Istio Pack

This is an Istio pack that ingests logs, extracts relevant data fields, and categorizes them based on HTTP status codes.

Edge Delta Pipeline Pack for Istio

Overview

The Istio pack processes incoming HTTP logs to provide detailed insights into service mesh traffic. It ingests logs, extracts relevant data fields, and categorizes them based on HTTP status codes, enabling targeted analysis of successes, client errors, and server errors. This pack also converts logs into metrics, offering comprehensive visibility into traffic patterns and facilitating proactive monitoring and performance optimization.

Pack Description

1. Data Ingestion

The data flow starts with the compound_input node. This node serves as the pipeline's entry point, receiving the incoming HTTP logs for processing.

2. Field Extraction

The logs move to the grok_extract_fields node, which is a Grok node. This node uses a specified pattern to extract fields such as timestamp, verb, request, response_code, etc., and structures them as attributes. This transformation makes the extracted fields easier to search, analyze, and visualize.
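The pack's actual grok expression is not shown here, but the extraction it performs can be illustrated with an equivalent Python regex using named capture groups (a simplified stand-in: the real grok pattern extracts additional fields beyond these):

```python
import re

# Simplified stand-in for the pack's grok pattern: captures the leading
# timestamp, HTTP verb, request path, protocol, and response code from an
# Istio-style access log line. (Illustrative only.)
ISTIO_ACCESS_RE = re.compile(
    r'\[(?P<timestamp>[^\]]+)\]\s+'
    r'"(?P<verb>\S+)\s+(?P<request>\S+)\s+(?P<protocol>[^"]+)"\s+'
    r'(?P<response_code>\d+)'
)

def extract_fields(line: str) -> dict:
    """Return the named capture groups as attributes, or {} if no match."""
    match = ISTIO_ACCESS_RE.match(line)
    return match.groupdict() if match else {}

sample = ('[2024-09-20T19:20:05.337Z] "DELETE /front-end/initiatives/'
          'communities/scale HTTP/2.0" 302 - via_upstream - "-" 21430 44638')
attrs = extract_fields(sample)
```

Structuring the captures as named attributes is what makes the downstream routing and metric nodes able to reference fields like item["attributes"]["response_code"] directly.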

3. Timestamp Transformation

The log_transform_timestamp node, a Log Transform node, updates each log entry by converting the timestamp attribute to Unix milliseconds and upserting the result into the item["timestamp"] field. This keeps the item's timestamp consistent with the original log timestamp.

  - name: log_transform_timestamp
    type: log_transform
    transformations:
      - field_path: item["timestamp"]
        operation: upsert
        value:
          convert_timestamp(item["attributes"]["timestamp"], "2006-01-02T15:04:05.999999Z",
          "Unix Milli")

4. Status Code Based Routing

The status_code_router node is a Route node. This node routes logs based on their HTTP response_code, directing them into different paths based on the range of status codes:

  • Success Path: A response_code between 200 and 299 marks the log as a success. It is routed to the success path, and because exit_if_matched: true is set, the remaining conditions are not evaluated.
  • Client Error Path: A response_code between 400 and 499 indicates a client-side error. The log is routed to the client_error path, and evaluation likewise stops once this condition matches.
  • Server Error Path: A response_code between 500 and 599 indicates a server-side error. The log is routed to the server_error path, and again no further conditions are evaluated after a match.

  - name: status_code_router
    type: route
    paths:
      - path: success
        condition:
          int(item["attributes"]["response_code"]) >= 200 && int(item["attributes"]["response_code"])
          <= 299
        exit_if_matched: true
      - path: client_error
        condition:
          int(item["attributes"]["response_code"]) >= 400 && int(item["attributes"]["response_code"])
          <= 499
        exit_if_matched: true
      - path: server_error
        condition:
          int(item["attributes"]["response_code"]) >= 500 && int(item["attributes"]["response_code"])
          <= 599
        exit_if_matched: true
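The routing above amounts to a first-match dispatch over status-code ranges; note that codes outside these ranges (for example, the 302 redirect in the sample input) match no path. A Python sketch of the same logic:

```python
from typing import Optional

def route_by_status(response_code: int) -> Optional[str]:
    """First-match routing over status-code ranges, mirroring the Route node.
    exit_if_matched: true means evaluation stops at the first matching path."""
    paths = [
        ("success", 200, 299),
        ("client_error", 400, 499),
        ("server_error", 500, 599),
    ]
    for name, low, high in paths:
        if low <= response_code <= high:
            return name  # exit_if_matched: stop at the first match
    return None  # e.g. 3xx codes fall through unrouted
```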

5. Log Pattern Identification

The logs pass through the log_to_pattern node, which is a Log to Pattern node. This node identifies common patterns within logs and generates a summary view of these patterns by clustering similar logs together.

  - name: log_to_pattern
    type: log_to_pattern
    num_of_clusters: 10
    samples_per_cluster: 5
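Edge Delta's clustering algorithm is not shown here, but the idea can be approximated by masking variable tokens (numbers, quoted strings) so that structurally similar logs collapse into one pattern key (a toy sketch under that assumption, not the actual implementation):

```python
import re
from collections import defaultdict

def to_pattern(line: str) -> str:
    """Collapse variable tokens so similar logs share one pattern key.
    A toy approximation of log-to-pattern clustering."""
    masked = re.sub(r'\d+', '<num>', line)          # numbers -> <num>
    masked = re.sub(r'"[^"]*"', '"<str>"', masked)  # quoted strings -> "<str>"
    return masked

def cluster(lines, samples_per_cluster=5):
    """Group log lines by pattern; report a count and a few samples each,
    echoing the samples_per_cluster setting above."""
    clusters = defaultdict(list)
    for line in lines:
        clusters[to_pattern(line)].append(line)
    return {pat: {"count": len(ls), "samples": ls[:samples_per_cluster]}
            for pat, ls in clusters.items()}

lines = [
    "GET /users/42 returned 200 in 13ms",
    "GET /users/7 returned 200 in 9ms",
    "POST /orders failed",
]
summary = cluster(lines)
```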

6. Log to Metric Conversion

Logs are then processed through the client_error_l2m, server_error_l2m, and success_l2m nodes, which are Log to Metric nodes. Each node tracks and reports metrics, such as bytes sent and log counts, for the status-code category routed to it.

  - name: client_error_l2m
    type: log_to_metric
    pattern: .*
    interval: 1m0s
    skip_empty_intervals: false
    only_report_nonzeros: false
    metric_name: http_client_errors
    dimension_groups:
      - field_dimensions:
          - item["attributes"]["downstream_remote_address"]
          - item["attributes"]["verb"]
          - item["attributes"]["request"]
          - string(item["attributes"]["response"])
        field_numeric_dimension: item["attributes"]["bytes_sent"]
        enabled_stats:
          - sum
      - field_dimensions:
          - item["attributes"]["downstream_remote_address"]
          - item["attributes"]["verb"]
          - item["attributes"]["request"]
          - string(item["attributes"]["response"])
        enabled_stats:
          - count

The pattern is set to .*, a catch-all regex that matches every log line. The node therefore processes, via its dimension_groups, all entries routed to it by the preceding status code routing logic, i.e. the log entries categorized as client errors.

The metrics are reported at a frequency defined by the interval, which is 1m0s (one minute). Metrics are aggregated over this interval.

The node configures two dimension groups for the metrics. Each group offers a different perspective on the data, capturing the downstream_remote_address, verb, request, and response attributes from the log entry.

  • The first dimension group uses bytes_sent as a numeric dimension to calculate the sum of bytes sent within the interval.
  • The second dimension group does not have a numeric dimension and instead counts occurrences to determine the number of client errors over the interval.

For bytes_sent, the node records the sum statistic to evaluate cumulative data sent in response to client error requests. For client error occurrences, it records a count, which gives the total number of client error logs processed during the interval.

The generated metrics are named with the http_client_errors prefix, providing clear context that they pertain to client-side HTTP errors.
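The aggregation described above can be sketched in Python: group one interval's log attributes by the shared dimension fields, computing a sum of bytes_sent in the first group and a count in the second (illustrative only; the metric key naming here is a hypothetical convention, while the field names follow the config above):

```python
from collections import defaultdict

# Dimension fields shared by both dimension groups in the config above.
DIMENSIONS = ("downstream_remote_address", "verb", "request", "response")

def aggregate_interval(logs):
    """Aggregate one interval of client-error logs, mirroring the two
    dimension groups: sum(bytes_sent) and count per dimension tuple."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for attrs in logs:
        key = tuple(str(attrs[d]) for d in DIMENSIONS)
        sums[key] += int(attrs["bytes_sent"])   # first group: sum stat
        counts[key] += 1                        # second group: count stat
    return {"http_client_errors.sum": dict(sums),
            "http_client_errors.count": dict(counts)}
```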

7. Output of Logs and Metrics

Finally, the logs and metrics are directed to appropriate compound_output nodes for storage or further processing. This ensures that logs associated with success, client_error, and server_error are captured for archival or inspection.

Sample Input

[2024-09-20T19:20:05.337Z] "DELETE /front-end/initiatives/communities/scale HTTP/2.0" 302 - via_upstream - "-" 21430 44638 10424 37972 "-" "Opera/10.32 (Windows 98; en-US) Presto/2.10.196 Version/10.00" "0d1109ca-38e0-40d0-85e1-914d65aed3f3" "httprecorder:8000" "76.166.226.4:63582" outbound|8000||httprecorder.example.svc.cluster.local 155.155.65.117:12592 57.95.234.30:37493 204.216.18.66:19328 - default
[2024-09-20T19:20:05.337Z] "PUT /communities/leading-edge/next-generation/orchestrate HTTP/2.0" 406 UO - - "-" 92933 66830 59461 29438 "-" "Opera/9.60 (Windows NT 6.2; en-US) Presto/2.9.243 Version/10.00" "6f72962e-d159-4d72-9314-2a266066b6d2" "httprecorder:8000" "98.238.186.243:16060" inbound|8000|| 217.225.0.137:63141 248.11.118.212:42801 25.111.31.212:3797 - default