NGINX Pack

This is an NGINX pack that ingests logs, extracts key fields, and organizes data to facilitate targeted analysis based on HTTP status codes.

Edge Delta Pipeline Pack for NGINX

Overview

The NGINX pack efficiently processes NGINX logs to offer detailed insights into web server activity and performance. By ingesting logs and extracting key fields, this pack organizes data to facilitate targeted analysis based on HTTP status codes, transforming this information into meaningful metrics. It captures both successful interactions and error occurrences, supporting comprehensive monitoring and enabling proactive identification and resolution of server or client-side issues.

Pack Description

1. Data Ingestion

The data flow starts with the compound_input node. This node serves as the pipeline's entry point, receiving the incoming NGINX logs for processing.

2. Field Extraction

Next, logs go through the grok_extract_attributes node, a Grok node. It uses the APACHE_COMBINED pattern to extract fields such as clientip, verb, and request, structuring them as log attributes.

  - name: grok_extract_attributes
    type: grok
    pattern: '%{APACHE_COMBINED}'

This transformation enables the extraction of meaningful data, simplifying search and analysis.
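To make the extraction concrete, here is a minimal Python sketch of what a combined-log-format Grok pattern extracts. The regex below is an illustrative approximation, not Edge Delta's Grok engine; the group names mirror the attribute names used throughout this pack.

```python
import re

# Approximation of the combined log format targeted by APACHE_COMBINED.
# Group names match the pack's attributes (clientip, verb, request, ...).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def extract_attributes(line: str) -> dict:
    """Return extracted fields as a flat attribute dict, or {} on no match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else {}
```

Applied to the first sample input at the end of this document, this yields clientip "127.0.0.1", verb "GET", request "/index.html", response "200", and bytes "1024".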

3. Timestamp Transformation

Logs then reach the log_transform_timestamp node, a Log Transform node. This node updates log entries by transforming the timestamp attribute into Unix Milliseconds format and upserting it as the item timestamp.

  - name: log_transform_timestamp
    type: log_transform
    transformations:
      - field_path: item["timestamp"]
        operation: upsert
        value: convert_timestamp(item["attributes"]["timestamp"], "02/Jan/2006:15:04:05 -0700", "Unix Milli")
      - field_path: item["attributes"]["timestamp"]
        operation: delete

This adjustment ensures precise timestamps when sorting and analyzing logs.
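The conversion above can be sketched in Python. The Go-style reference layout "02/Jan/2006:15:04:05 -0700" corresponds to strptime's "%d/%b/%Y:%H:%M:%S %z"; this sketch only illustrates the semantics of convert_timestamp, it is not the pack's implementation.

```python
from datetime import datetime

def to_unix_milli(ts: str) -> int:
    """Convert an NGINX access-log timestamp to Unix milliseconds."""
    # "%d/%b/%Y:%H:%M:%S %z" matches e.g. "17/Sep/2024:01:45:33 +0000"
    dt = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
    return int(dt.timestamp() * 1000)
```

For example, "17/Sep/2024:01:45:33 +0000" becomes 1726537533000.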

4. Status-Based Routing

Logs proceed to the status_code_router node, a Route node, which sorts logs based on HTTP status code.

  - name: status_code_router
    type: route
    paths:
      - path: success
        condition: int(item["attributes"]["response"]) >= 200 && int(item["attributes"]["response"]) <= 299
        exit_if_matched: true
      - path: client_error
        condition: int(item["attributes"]["response"]) >= 400 && int(item["attributes"]["response"]) <= 499
        exit_if_matched: true
      - path: server_error
        condition: int(item["attributes"]["response"]) >= 500 && int(item["attributes"]["response"]) <= 599
        exit_if_matched: true

Logs are grouped by HTTP status code range:

  • Logs with HTTP status codes indicating success (200-299) are routed to the success path and not evaluated for subsequent criteria.
  • Logs with HTTP status codes indicating client errors (400-499) are routed to the client_error path and not evaluated for subsequent criteria.
  • Logs with HTTP status codes indicating server errors (500-599) are routed to the server_error path and not evaluated for subsequent criteria.

This routing allows categorization into success, client error, or server error, aiding issue prioritization.
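The router's first-match behavior can be sketched as plain conditionals. This is an illustrative Python rendering of the route conditions above, where exit_if_matched means evaluation stops at the first matching path:

```python
def route(item: dict) -> str:
    """Return the path name for a log item, mirroring status_code_router."""
    status = int(item["attributes"]["response"])
    if 200 <= status <= 299:      # success path; exit_if_matched stops here
        return "success"
    if 400 <= status <= 499:
        return "client_error"
    if 500 <= status <= 599:
        return "server_error"
    return "unmatched"            # falls through to the pattern/other path
```

Note that codes such as 3xx redirects match none of the paths and continue to the unmatched branch of the pipeline.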

5. Log to Metric Conversion for Successful Responses

Logs identified as successful are processed by the success_l2m node, a Log to Metric node, which transforms them into metrics.

  - name: success_l2m
    type: log_to_metric
    pattern: .*
    interval: 1m0s
    skip_empty_intervals: false
    only_report_nonzeros: false
    metric_name: http_successes
    dimension_groups:
      - field_dimensions:
          - item["attributes"]["clientip"]
          - item["attributes"]["verb"]
          - item["attributes"]["request"]
          - string(item["attributes"]["response"])
        field_numeric_dimension: item["attributes"]["bytes"]
        enabled_stats:
          - sum
      - field_dimensions:
          - item["attributes"]["clientip"]
          - item["attributes"]["verb"]
          - item["attributes"]["request"]
          - string(item["attributes"]["response"])
        enabled_stats:
          - count

The first dimension group uses the log attributes clientip, verb, request, and response to categorize logs and sums the bytes attribute, providing a cumulative measure of data transfer associated with those attributes. The second dimension group employs the same set of attributes but instead counts the number of matching log items.
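Conceptually, one reporting interval of these two dimension groups behaves like the following Python sketch (an assumption-level illustration, not the node's implementation): both groups key on the same four attributes, one summing bytes and the other counting items.

```python
from collections import defaultdict

def aggregate(items):
    """Sum bytes and count items per (clientip, verb, request, response) key."""
    sums = defaultdict(int)    # first dimension group: sum of bytes
    counts = defaultdict(int)  # second dimension group: item count
    for item in items:
        a = item["attributes"]
        key = (a["clientip"], a["verb"], a["request"], str(a["response"]))
        sums[key] += int(a["bytes"])
        counts[key] += 1
    return sums, counts
```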

6. Log to Metric Conversion for Client Errors

Similarly, client error logs pass through the client_error_l2m node, which produces the http_client_errors metric using the same dimension groups.

  - name: client_error_l2m
    type: log_to_metric
    pattern: .*
    interval: 1m0s
    skip_empty_intervals: false
    only_report_nonzeros: false
    metric_name: http_client_errors
    dimension_groups:
      - field_dimensions:
          - item["attributes"]["clientip"]
          - item["attributes"]["verb"]
          - item["attributes"]["request"]
          - string(item["attributes"]["response"])
        field_numeric_dimension: item["attributes"]["bytes"]
        enabled_stats:
          - sum
      - field_dimensions:
          - item["attributes"]["clientip"]
          - item["attributes"]["verb"]
          - item["attributes"]["request"]
          - string(item["attributes"]["response"])
        enabled_stats:
          - count

7. Log to Metric Conversion for Server Errors

Server error logs are processed by the server_error_l2m node, which produces the http_server_errors metric for insight into backend problems.

  - name: server_error_l2m
    type: log_to_metric
    pattern: .*
    interval: 1m0s
    skip_empty_intervals: false
    only_report_nonzeros: false
    metric_name: http_server_errors
    dimension_groups:
      - field_dimensions:
          - item["attributes"]["clientip"]
          - item["attributes"]["verb"]
          - item["attributes"]["request"]
          - string(item["attributes"]["response"])
        field_numeric_dimension: item["attributes"]["bytes"]
        enabled_stats:
          - sum
      - field_dimensions:
          - item["attributes"]["clientip"]
          - item["attributes"]["verb"]
          - item["attributes"]["request"]
          - string(item["attributes"]["response"])
        enabled_stats:
          - count

8. Log to Pattern Transformation

Logs that match none of the routing paths pass through log_to_pattern, a Log to Pattern node, for pattern identification.

  - name: log_to_pattern
    type: log_to_pattern
    num_of_clusters: 10
    samples_per_cluster: 5
    reporting_frequency: 1m0s
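As a rough mental model of pattern clustering (a simplified sketch, not Edge Delta's algorithm), structurally similar lines can be grouped by masking variable tokens, with a bounded number of sample lines retained per cluster, analogous to the samples_per_cluster setting:

```python
import re
from collections import defaultdict

def cluster(lines, samples_per_cluster=5):
    """Group lines by a signature with numbers masked; keep a few samples each."""
    clusters = defaultdict(list)
    for line in lines:
        signature = re.sub(r'\d+', '<num>', line)  # mask variable tokens
        if len(clusters[signature]) < samples_per_cluster:
            clusters[signature].append(line)
    return clusters
```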

9. Output Handling

  • success_logs: Routes successful logs.
  • client_error_logs: Routes client error logs.
  • server_error_logs: Routes server error logs.
  • other_logs: Routes logs that matched no status-code path, so no data is dropped.
  • success_metrics_output: Routes metrics from success logs.
  • client_error_metrics_output: Routes client error metrics.
  • server_error_metrics_output: Routes server error metrics.
  • patterns_output: Routes patterns and samples.

Sample Input

127.0.0.1 - - [17/Sep/2024:01:45:33 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
192.168.1.1 - - [17/Sep/2024:01:45:34 +0000] "POST /api/v1/upload HTTP/1.1" 201 5123 "-" "curl/7.68.0"