Istio Pack
Edge Delta Pipeline Pack for Istio
Overview
The Istio pack processes incoming HTTP logs to provide detailed insights into service mesh traffic. It ingests logs, extracts relevant data fields, and categorizes them based on HTTP status codes, enabling targeted analysis of successes, client errors, and server errors. This pack also converts logs into metrics, offering comprehensive visibility into traffic patterns and facilitating proactive monitoring and performance optimization.
Pack Description
1. Data Ingestion
The data flow starts with the compound_input node. This node serves as the entry point into the pipeline, where it begins processing the incoming HTTP logs.
2. Field Extraction
The logs move to the grok_extract_fields node, which is a Grok node. This node uses a specified pattern to extract fields such as timestamp, verb, request, response_code, etc., and structures them as attributes. This transformation makes the extracted fields easier to search, analyze, and visualize.
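The pack's actual Grok pattern is not reproduced on this page. As a rough illustration of what the extraction does, the Python sketch below uses a regular expression with named groups to pull fields out of one line. It assumes the logs follow Istio's default Envoy access log format, which the sample input at the end of this page appears to match. The field names timestamp, verb, request, response_code, bytes_sent and downstream_remote_address come from this page; every other field name is a hypothetical label chosen for the sketch.

```python
import re

# Hypothetical stand-in for the pack's Grok pattern. Assumes Istio's default
# Envoy access log format; every field is captured as a string attribute.
ISTIO_ACCESS_LOG = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<protocol>[^"]+)" '
    r'(?P<response_code>\d{3}) (?P<response_flags>\S+) '
    r'(?P<response_code_details>\S+) (?P<termination_details>\S+) '
    r'"(?P<transport_failure_reason>[^"]*)" '
    r'(?P<bytes_received>\d+) (?P<bytes_sent>\d+) '
    r'(?P<duration>\S+) (?P<upstream_service_time>\S+) '
    r'"(?P<x_forwarded_for>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<request_id>[^"]*)" "(?P<authority>[^"]*)" '
    r'"(?P<upstream_host>[^"]*)" (?P<upstream_cluster>\S+) '
    r'(?P<upstream_local_address>\S+) (?P<downstream_local_address>\S+) '
    r'(?P<downstream_remote_address>\S+) '
    r'(?P<requested_server_name>\S+) (?P<route_name>\S+)'
)


def extract_fields(line: str) -> dict[str, str]:
    """Return the captured fields as attributes, or {} if the line doesn't match."""
    match = ISTIO_ACCESS_LOG.match(line)
    return match.groupdict() if match else {}


# The second sample input line from this page, split for readability.
line = (
    '[2024-09-20T19:20:05.337Z] "PUT /communities/leading-edge/next-generation/'
    'orchestrate HTTP/2.0" 406 UO - - "-" 92933 66830 59461 29438 "-" '
    '"Opera/9.60 (Windows NT 6.2; en-US) Presto/2.9.243 Version/10.00" '
    '"6f72962e-d159-4d72-9314-2a266066b6d2" "httprecorder:8000" '
    '"98.238.186.243:16060" inbound|8000|| 217.225.0.137:63141 '
    '248.11.118.212:42801 25.111.31.212:3797 - default'
)
attrs = extract_fields(line)
print(attrs["verb"], attrs["request"], attrs["response_code"], attrs["bytes_sent"])
```

Because every value is captured as a string, later steps convert types explicitly, which is why the routing conditions below wrap response_code in int().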
3. Timestamp Transformation
The log_transform_timestamp node, a Log Transform node, converts the timestamp attribute extracted by the Grok node to Unix milliseconds and upserts the result into the item["timestamp"] field. This keeps each item's timestamp consistent with the time recorded in the original log line.
```yaml
- name: log_transform_timestamp
  type: log_transform
  transformations:
    # Overwrite (or create) the item timestamp with the parsed log time.
    - field_path: item["timestamp"]
      operation: upsert
      value: convert_timestamp(item["attributes"]["timestamp"], "2006-01-02T15:04:05.999999Z", "Unix Milli")
```
4. Status Code Based Routing
The status_code_router node is a Route node. This node routes logs based on their HTTP response_code, directing them into different paths based on the range of status codes:
- Success path: if a log entry has a response_code in the range 200 to 299, it is considered a success. The log is routed to the success path, and evaluation stops there (exit_if_matched: true). Once the success condition matches, the other conditions are not evaluated.
- Client error path: if the response_code is between 400 and 499, it indicates a client-side error. The log is routed to the client_error path, and, as with the success path, no further conditions are evaluated once this condition matches.
- Server error path: if the response_code falls within the range 500 to 599, it indicates a server-side error. The log is routed to the server_error path, and, like the others, no further conditions are evaluated once this condition matches.
```yaml
- name: status_code_router
  type: route
  paths:
    # Bounds are inclusive; exit_if_matched stops evaluation at the first match.
    - path: success
      condition: int(item["attributes"]["response_code"]) >= 200 && int(item["attributes"]["response_code"]) <= 299
      exit_if_matched: true
    - path: client_error
      condition: int(item["attributes"]["response_code"]) >= 400 && int(item["attributes"]["response_code"]) <= 499
      exit_if_matched: true
    - path: server_error
      condition: int(item["attributes"]["response_code"]) >= 500 && int(item["attributes"]["response_code"]) <= 599
      exit_if_matched: true
```
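The routing decision can be modeled in a few lines of Python. This is a sketch of the conditions above, not the Route node's code; it assumes that a path without exit_if_matched would let evaluation continue to later paths, and the PATHS table and route() function are hypothetical names.

```python
# (path, inclusive lower bound, inclusive upper bound, exit_if_matched)
PATHS = [
    ("success", 200, 299, True),
    ("client_error", 400, 499, True),
    ("server_error", 500, 599, True),
]


def route(response_code: str) -> list[str]:
    """Return the paths a log with this response_code is sent to."""
    code = int(response_code)  # mirrors int(item["attributes"]["response_code"])
    matched = []
    for path, low, high, exit_if_matched in PATHS:
        if low <= code <= high:
            matched.append(path)
            if exit_if_matched:
                break  # skip the remaining conditions
    return matched


for code in ("204", "302", "406", "503"):
    print(code, route(code))
```

Because the three ranges do not overlap, exit_if_matched changes no routing outcome here; it only saves evaluating the remaining conditions. Note that a code outside all three ranges, such as the 302 in the first sample input line, matches none of the paths.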
5. Log Pattern Identification
The logs pass through the log_to_pattern node, which is a Log to Pattern node. This node identifies common patterns within logs and generates a summary view of these patterns by clustering similar logs together.
```yaml
- name: log_to_pattern
  type: log_to_pattern
  num_of_clusters: 10
  samples_per_cluster: 5
```
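This page does not describe the clustering algorithm the log_to_pattern node uses. As a loose illustration of the idea only, the sketch below swaps in a much simpler technique: it masks variable tokens (UUIDs, and numbers including IP:port pairs) to derive a pattern, groups lines that share a pattern, keeps at most samples_per_cluster example lines per group, and returns the num_of_clusters largest groups. The masking rules and the <uuid>/<num> placeholders are assumptions made for the sketch.

```python
import re

# Hypothetical masking rules; the UUID rule runs first so its digits
# are not masked piecemeal by the number rule.
MASKS = [
    (re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b"), "<uuid>"),
    (re.compile(r"\d+(?:\.\d+)*(?::\d+)?"), "<num>"),
]


def to_pattern(line: str) -> str:
    for regex, placeholder in MASKS:
        line = regex.sub(placeholder, line)
    return line


def cluster(lines, num_of_clusters=10, samples_per_cluster=5):
    """Group lines by pattern; return the largest groups with sample lines."""
    clusters = {}
    for line in lines:
        entry = clusters.setdefault(to_pattern(line), {"count": 0, "samples": []})
        entry["count"] += 1
        if len(entry["samples"]) < samples_per_cluster:
            entry["samples"].append(line)
    # Largest clusters first; sorted() is stable, so ties keep arrival order.
    ranked = sorted(clusters.items(), key=lambda kv: kv[1]["count"], reverse=True)
    return dict(ranked[:num_of_clusters])


# Hypothetical log lines for illustration.
logs = [
    "upstream 10.0.0.1:8000 reset after 12 ms",
    "upstream 10.0.0.7:8000 reset after 480 ms",
    "request 0d1109ca-38e0-40d0-85e1-914d65aed3f3 timed out",
]
for pattern, info in cluster(logs).items():
    print(info["count"], pattern)
```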
6. Log to Metric Conversion
Logs are then processed by the client_error_l2m, server_error_l2m, and success_l2m nodes, which are Log to Metric nodes. Each node reports metrics for the bytes sent and the log count of the logs in its status code category. For example, the client_error_l2m node is configured as follows:
```yaml
- name: client_error_l2m
  type: log_to_metric
  pattern: .*
  interval: 1m0s
  skip_empty_intervals: false
  only_report_nonzeros: false
  metric_name: http_client_errors
  dimension_groups:
    # Group 1: sum of bytes_sent per combination of the field dimensions.
    - field_dimensions:
        - item["attributes"]["downstream_remote_address"]
        - item["attributes"]["verb"]
        - item["attributes"]["request"]
        - string(item["attributes"]["response"])
      field_numeric_dimension: item["attributes"]["bytes_sent"]
      enabled_stats:
        - sum
    # Group 2: count of logs per combination of the same field dimensions.
    - field_dimensions:
        - item["attributes"]["downstream_remote_address"]
        - item["attributes"]["verb"]
        - item["attributes"]["request"]
        - string(item["attributes"]["response"])
      enabled_stats:
        - count
```
The pattern is set to .*, a catch-all regular expression that matches every log line the node receives. Because the preceding status code routing sends this node only the logs categorized as client errors, no further filtering is needed here, and each line is processed according to dimension_groups.
Metrics are aggregated and reported once per interval, which is set to 1m0s (one minute).
The node defines two dimension groups. Both break the metrics down by the same attributes from the log entry, downstream_remote_address, verb, request, and response; they differ in the statistic they compute:
- The first group uses bytes_sent as its numeric dimension and records the sum statistic, giving the total bytes sent in response to client-error requests during the interval.
- The second group has no numeric dimension and records the count statistic, giving the number of client-error logs processed during the interval.
The metric_name value, http_client_errors, is used as the prefix for the generated metrics, making it clear that they describe client-side HTTP errors.
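As a rough model of what client_error_l2m computes, the sketch below buckets client-error logs into one-minute windows and, for each window and each combination of the four field dimensions, sums bytes_sent (the first dimension group) and counts logs (the second). It assumes each log is a dict holding a Unix-millisecond timestamp and string attributes, as produced by the earlier steps on this page. It does not model skip_empty_intervals, only_report_nonzeros, or how Edge Delta names and emits the resulting metrics, and the log_to_metric function and sample values are hypothetical.

```python
from collections import defaultdict

INTERVAL_MS = 60_000  # interval: 1m0s
# The four field_dimensions shared by both dimension groups.
DIMENSIONS = ("downstream_remote_address", "verb", "request", "response")


def log_to_metric(logs):
    """Return (bytes_sent sums, log counts) keyed by (window start, dimensions)."""
    sums = defaultdict(int)    # dimension group 1: sum of bytes_sent
    counts = defaultdict(int)  # dimension group 2: count of logs
    for log in logs:
        window = log["timestamp"] // INTERVAL_MS * INTERVAL_MS
        attrs = log["attributes"]
        # str() mirrors string(item["attributes"]["response"]) in the config.
        key = (window, tuple(str(attrs[d]) for d in DIMENSIONS))
        sums[key] += int(attrs["bytes_sent"])
        counts[key] += 1
    return dict(sums), dict(counts)


# Hypothetical client-error logs: two identical requests in the same minute.
base = {"downstream_remote_address": "25.111.31.212:3797", "verb": "PUT",
        "request": "/orchestrate", "response": "406"}
logs = [
    {"timestamp": 1726860005337, "attributes": {**base, "bytes_sent": "66830"}},
    {"timestamp": 1726860031000, "attributes": {**base, "bytes_sent": "1000"}},
]
sums, counts = log_to_metric(logs)
print(sums)
print(counts)
```

Both logs fall in the window starting at 1726860000000 and share all four dimension values, so they collapse into a single series per statistic: one sum of bytes sent and one count.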
7. Output of Logs and Metrics
Finally, the logs and metrics are directed to appropriate compound_output nodes for storage or further processing. This ensures that logs associated with success, client_error, and server_error are captured for archival or inspection.
Sample Input
[2024-09-20T19:20:05.337Z] "DELETE /front-end/initiatives/communities/scale HTTP/2.0" 302 - via_upstream - "-" 21430 44638 10424 37972 "-" "Opera/10.32 (Windows 98; en-US) Presto/2.10.196 Version/10.00" "0d1109ca-38e0-40d0-85e1-914d65aed3f3" "httprecorder:8000" "76.166.226.4:63582" outbound|8000||httprecorder.example.svc.cluster.local 155.155.65.117:12592 57.95.234.30:37493 204.216.18.66:19328 - default
[2024-09-20T19:20:05.337Z] "PUT /communities/leading-edge/next-generation/orchestrate HTTP/2.0" 406 UO - - "-" 92933 66830 59461 29438 "-" "Opera/9.60 (Windows NT 6.2; en-US) Presto/2.9.243 Version/10.00" "6f72962e-d159-4d72-9314-2a266066b6d2" "httprecorder:8000" "98.238.186.243:16060" inbound|8000|| 217.225.0.137:63141 248.11.118.212:42801 25.111.31.212:3797 - default