AWS VPC JSON Pack

This pack processes AWS VPC Flow logs. It extracts fields with Grok, routes logs by status and action, rewrites timestamps, and generates metrics.

Edge Delta Pipeline Pack for AWS VPC Flow

Overview

The AWS VPC Flow pipeline ensures effective processing of VPC Flow log data, providing insights into network activity and security. It filters, structures, and classifies logs based on actions and status.

Pack Description

1. Data Ingestion

The data flow starts with the compound_input node. This node is the entry point into the pipeline, where incoming AWS VPC Flow logs begin processing.

2. Omitted Header Data Filtering

Next, logs flow into the omit_header_data node, which is a Regex Filter node. It removes all lines starting with “version” from the data flow.

  - name: omit_header_data
    type: regex_filter
    pattern: ^version
    negate: true

The pattern parameter defines a regular expression used to match log lines. The ^ symbol in the regex pattern is an anchor indicating the start of the line, so ^version matches any line that begins with the word “version”. The negate parameter is a boolean that, when set to true, inverts the filtering logic of the node. By default, the Regex Filter node would pass only those log entries that match the specified regex pattern. With negate: true, the node does the opposite: it blocks logs that match the pattern and passes those that do not match.
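The effect of negate can be sketched outside the agent in a few lines of Python. This is only an illustration of the filtering logic described above, not Edge Delta's implementation; the function name regex_filter and the header line in the sample are assumptions made for the example.

```python
import re


def regex_filter(lines, pattern, negate=False):
    """Keep lines that match `pattern`; with negate=True, keep the others."""
    compiled = re.compile(pattern)
    kept = []
    for line in lines:
        matched = compiled.search(line) is not None
        # negate flips the decision: matching lines are dropped, not kept.
        if matched != negate:
            kept.append(line)
    return kept


sample = [
    # Illustrative header line (assumed), followed by a record from Sample Input.
    "version account-id interface-id srcaddr dstaddr srcport dstport",
    "2 280344740806 eni-ebbe59f2434f879ab 49.73.80.248 169.239.122.38 "
    "32952 50456 76 92503 85054070 1726760496 1726799451 REJECT NODATA",
]

# Equivalent of pattern: ^version with negate: true
filtered = regex_filter(sample, r"^version", negate=True)
print(len(filtered))
```

With negate left at False, the same call would keep only the header line.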

3. Field Extraction

Once headers are omitted, logs move to the grok_extract_fields node, which is a Grok node. This node uses an Edge Delta-supplied pattern to extract fields from the log body, such as vpc_version, aws_account_id, and network_source_ip, and stores them as attributes. Structuring the log data this way makes the extracted fields easier to search, analyze, and visualize.

  - name: grok_extract_fields
    type: grok
    pattern:
      '%{INT:vpc_version} %{NOTSPACE:aws_account_id} (?:%{NOTSPACE:network_interface}|-)
      (?:%{NOTSPACE:network_source_ip}|-) (?:%{NOTSPACE:network_destination_ip}|-)
      (?:%{INT:network_client_port}|-) (?:%{INT:network_destination_port}|-) (?:%{NOTSPACE:network_protocol}|-)
      (?:%{INT:network_packet}|-) (?:%{INT:network_bytes_written}|-) %{INT:vpc_interval_start}
      %{INT:vpc_interval_end} (?:%{WORD:vpc_action}|-) %{WORD:vpc_status}.*'
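To see what the pattern captures, here is a rough Python approximation using named groups. The INT, NOTSPACE, and WORD definitions below are simplified stand-ins for the Grok patterns of the same name, and the helper names (required, optional, extract_fields) are hypothetical; the real node may differ in details such as how unmatched lines are handled.

```python
import re

# Simplified stand-ins for the Grok patterns (assumed equivalents).
INT = r"[+-]?\d+"
NOTSPACE = r"\S+"
WORD = r"\w+"


def required(name, sub):
    return rf"(?P<{name}>{sub})"


def optional(name, sub):
    # Mirrors (?:%{SUB:name}|-): capture the value, or accept a bare "-".
    return rf"(?:(?P<{name}>{sub})|-)"


FLOW_LOG_RE = re.compile(
    " ".join(
        [
            required("vpc_version", INT),
            required("aws_account_id", NOTSPACE),
            optional("network_interface", NOTSPACE),
            optional("network_source_ip", NOTSPACE),
            optional("network_destination_ip", NOTSPACE),
            optional("network_client_port", INT),
            optional("network_destination_port", INT),
            optional("network_protocol", NOTSPACE),
            optional("network_packet", INT),
            optional("network_bytes_written", INT),
            required("vpc_interval_start", INT),
            required("vpc_interval_end", INT),
            optional("vpc_action", WORD),
            required("vpc_status", WORD),
        ]
    )
    + ".*"
)


def extract_fields(body):
    """Return captured fields as a dict; groups that did not capture are omitted."""
    match = FLOW_LOG_RE.match(body)
    if match is None:
        return {}
    return {k: v for k, v in match.groupdict().items() if v is not None}


sample = (
    "2 072962679531 eni-34220ccf76615c43c "
    "49d7:6649:f2f:e3fb:f8c7:ba73:14d1:19b0 22f:4717:bcd1:cc0b:cc62:fba2:a5dc:1ab0 "
    "57320 19599 27 7904 62981471 1726760497 1726788476 ACCEPT SKIPDATA"
)
attrs = extract_fields(sample)
print(attrs["vpc_action"], attrs["vpc_status"])
```

Note that every value is captured as a string in this sketch, which is why later steps convert ports and counters explicitly.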

4. Status-Based Routing

The logs then flow to the skip_nodata node, a Route node. This node routes logs based on the vpc_status attribute: logs where vpc_status is not “NODATA” are routed to the next processing phase, while “NODATA” logs are sent on the default unmatched path to the other_logs output.

  - name: skip_nodata
    type: route
    paths:
      - path: all_non_nodata
        condition: item["attributes"]["vpc_status"] != "NODATA"
        exit_if_matched: true

The condition parameter uses the Common Expression Language (CEL) to evaluate log entries. It checks if item["attributes"]["vpc_status"] is not equal to “NODATA”. The exit_if_matched parameter is set to true, meaning that if this condition is met, the log entry is immediately routed through the specified path (all_non_nodata), and no further conditions are evaluated.

5. Timestamp Transformation

The log_transform_timestamp node, a Log Transform node, converts the vpc_interval_end attribute from Unix seconds to Unix milliseconds using the Edge Delta convert_timestamp macro and upserts the result into the item["timestamp"] field. This ensures that the log carries the timestamp recorded in the original flow log record, rather than the timestamp generated by Edge Delta when the agent ingested the log.

  - name: log_transform_timestamp
    type: log_transform
    transformations:
      - field_path: item["timestamp"]
        operation: upsert
        value:
          convert_timestamp(item["attributes"]["vpc_interval_end"], "Unix Second",
          "Unix Milli")

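The arithmetic behind the seconds-to-milliseconds conversion is simple, and a short Python sketch makes it concrete. The function name unix_seconds_to_millis is hypothetical and only mimics the "Unix Second" to "Unix Milli" case of the macro; it is not the macro itself. The input value is the vpc_interval_end field of the first record under Sample Input.

```python
from datetime import datetime, timezone


def unix_seconds_to_millis(value):
    """Convert a Unix timestamp in seconds (string or int) to milliseconds."""
    return int(value) * 1000


end_ms = unix_seconds_to_millis("1726788476")

# Interpreting the result as a UTC datetime is a quick sanity check.
as_utc = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
print(end_ms, as_utc.isoformat())
```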
6. Action-Based Routing

Logs are subsequently processed by the action_router node, another Route node. This node routes logs based on the vpc_action attribute:

  - name: action_router
    type: route
    paths:
      - path: rejected
        condition: item["attributes"]["vpc_action"] == "REJECT"
        exit_if_matched: true
      - path: accepted
        condition: item["attributes"]["vpc_action"] == "ACCEPT"
        exit_if_matched: true

Logs with vpc_action set to “REJECT” are routed to the rejected_l2m and rejected_logs nodes.
Logs with vpc_action set to “ACCEPT” are routed to the accepted_l2m and accepted_logs nodes.
Logs that do not match either condition are routed to the other_logs node.

Similar to the skip_nodata node, the action_router node uses the condition parameter to route logs based on CEL expressions. The paths parameter defines multiple routes, while exit_if_matched: true ensures that once a condition is met, no further conditions are evaluated for that log entry.
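The first-match behavior of both Route nodes can be modeled with an ordered list of paths. The sketch below is a simplification under stated assumptions: the route function, the tuple layout, and the "unmatched" default name are illustrative, and the idea that a log could continue to later paths when exit_if_matched is false is an assumption rather than something this pack relies on.

```python
def route(attributes, paths, default="unmatched"):
    """Return the names of the paths a log takes, evaluated in order."""
    matched = []
    for name, condition, exit_if_matched in paths:
        if condition(attributes):
            matched.append(name)
            if exit_if_matched:
                break  # stop evaluating the remaining conditions
    return matched or [default]


# (path name, condition, exit_if_matched), mirroring the two Route nodes.
STATUS_PATHS = [
    ("all_non_nodata", lambda a: a.get("vpc_status") != "NODATA", True),
]
ACTION_PATHS = [
    ("rejected", lambda a: a.get("vpc_action") == "REJECT", True),
    ("accepted", lambda a: a.get("vpc_action") == "ACCEPT", True),
]

print(route({"vpc_status": "NODATA"}, STATUS_PATHS))
print(route({"vpc_action": "REJECT"}, ACTION_PATHS))
print(route({"vpc_action": "ACCEPT"}, ACTION_PATHS))
print(route({}, ACTION_PATHS))
```

Using dict.get here means a log without a vpc_action attribute falls through to the default path instead of raising an error; how the agent treats a missing attribute is not covered by this sketch.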

This classification allows you to differentiate between accepted and rejected requests, which helps in isolating and investigating network security events. By routing logs based on specific actions, you can streamline your monitoring processes and focus on logs that are most relevant to network security, ensuring quicker identification and resolution of issues.

7. Log to Metric Conversion for Accepted Logs

Accepted logs pass through the accepted_l2m node, which is a Log to Metric node. This node reports the sum and count of network bytes written and of network packets, and the min, max, p95, p99, and count of the VPC interval duration (vpc_interval_end - vpc_interval_start).

  - name: accepted_l2m
    type: log_to_metric
    pattern: .*
    interval: 1m0s
    skip_empty_intervals: false
    only_report_nonzeros: false
    metric_name: vpc_accepted
    dimension_groups:
      - field_dimensions:
          - item["attributes"]["network_source_ip"]
          - item["attributes"]["network_destination_ip"]
          - string(int(item["attributes"]["network_client_port"]))
          - string(int(item["attributes"]["network_destination_port"]))
          - item["attributes"]["network_interface"]
        field_numeric_dimension: item["attributes"]["network_bytes_written"]
        enabled_stats:
          - sum
          - count
      - field_dimensions:
          - item["attributes"]["network_source_ip"]
          - item["attributes"]["network_destination_ip"]
          - string(int(item["attributes"]["network_client_port"]))
          - string(int(item["attributes"]["network_destination_port"]))
          - item["attributes"]["network_interface"]
        field_numeric_dimension: item["attributes"]["network_packet"]
        enabled_stats:
          - sum
          - count
      - field_dimensions:
          - item["attributes"]["network_source_ip"]
          - item["attributes"]["network_destination_ip"]
          - string(int(item["attributes"]["network_client_port"]))
          - string(int(item["attributes"]["network_destination_port"]))
          - item["attributes"]["network_interface"]
        field_numeric_dimension: item["attributes"]["vpc_interval_end"] - item["attributes"]["vpc_interval_start"]
        enabled_stats:
          - min
          - max
          - p95
          - p99
          - count

pattern: The regex pattern (.*) used to match log items in the body field. This node uses a catch-all pattern to process all incoming logs.
interval: Specifies the reporting interval for metrics, set to 1 minute. The node collects values for each interval before calculating and reporting metrics.
skip_empty_intervals: When set to false, it ensures metrics are reported even if no matching logs are found in an interval.
only_report_nonzeros: When set to false, it reports metrics even if the values are zero.
metric_name: Specifies the custom name for the resulting metric (vpc_accepted).
dimension_groups:
- field_dimensions: Defines dimensions to group the metrics by. These dimensions include network-specific fields such as network_source_ip, network_destination_ip, network_client_port, network_destination_port, and network_interface. Attributes network_client_port and network_destination_port are explicitly cast to integers before being converted to strings in the accepted_l2m node.
- field_numeric_dimension: Specifies a numeric field within the payload that contributes to the metric value, e.g., network_bytes_written or network_packet.
- enabled_stats: A list of statistics to be calculated and reported for the specified field_numeric_dimension.
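As a rough mental model of one dimension group, the following Python sketch groups logs by the five dimensions and computes sum and count for a numeric field. It covers a single reporting interval only, and the names DIMENSIONS, dimension_key, and aggregate, as well as the sample IP addresses and byte counts, are made up for illustration; the real node's windowing and output format are not modeled.

```python
from collections import defaultdict

DIMENSIONS = [
    "network_source_ip",
    "network_destination_ip",
    "network_client_port",
    "network_destination_port",
    "network_interface",
]


def dimension_key(attributes):
    """Build the grouping key; ports go through int then str, like string(int(...))."""
    key = []
    for name in DIMENSIONS:
        value = attributes[name]
        if name.endswith("_port"):
            value = str(int(value))
        key.append(value)
    return tuple(key)


def aggregate(logs, numeric_field):
    """Compute sum and count of `numeric_field` per dimension key for one interval."""
    stats = defaultdict(lambda: {"sum": 0, "count": 0})
    for attributes in logs:
        bucket = stats[dimension_key(attributes)]
        bucket["sum"] += int(attributes[numeric_field])
        bucket["count"] += 1
    return dict(stats)


# Hypothetical logs sharing the same dimensions within one interval.
base = {
    "network_source_ip": "10.0.0.1",
    "network_destination_ip": "10.0.0.2",
    "network_client_port": "57320",
    "network_destination_port": "443",
    "network_interface": "eni-example",
}
logs = [
    {**base, "network_bytes_written": "100"},
    {**base, "network_bytes_written": "250"},
]
bytes_stats = aggregate(logs, "network_bytes_written")
print(bytes_stats)
```

Each distinct combination of dimension values produces its own series, so high-cardinality fields such as source IP and client port can generate many series per interval.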

8. Log to Metric Conversion for Rejected Logs

Similarly, rejected logs are processed by the rejected_l2m node, also a Log to Metric node. It reports the sum and count of network bytes written and of network packets, and the min, max, p95, p99, and count of the VPC interval duration (vpc_interval_end - vpc_interval_start). This facilitates monitoring and alerting on rejected network interactions, which can indicate security threats or misconfigurations.

  - name: rejected_l2m
    type: log_to_metric
    pattern: .*
    interval: 1m0s
    skip_empty_intervals: false
    only_report_nonzeros: false
    metric_name: vpc_rejected
    dimension_groups:
      - field_dimensions:
          - item["attributes"]["network_source_ip"]
          - item["attributes"]["network_destination_ip"]
          - string(int(item["attributes"]["network_client_port"]))
          - string(int(item["attributes"]["network_destination_port"]))
          - item["attributes"]["network_interface"]
        field_numeric_dimension: item["attributes"]["network_bytes_written"]
        enabled_stats:
          - sum
          - count
      - field_dimensions:
          - item["attributes"]["network_source_ip"]
          - item["attributes"]["network_destination_ip"]
          - string(int(item["attributes"]["network_client_port"]))
          - string(int(item["attributes"]["network_destination_port"]))
          - item["attributes"]["network_interface"]
        field_numeric_dimension: item["attributes"]["network_packet"]
        enabled_stats:
          - sum
          - count
      - field_dimensions:
          - item["attributes"]["network_source_ip"]
          - item["attributes"]["network_destination_ip"]
          - string(int(item["attributes"]["network_client_port"]))
          - string(int(item["attributes"]["network_destination_port"]))
          - item["attributes"]["network_interface"]
        field_numeric_dimension: item["attributes"]["vpc_interval_end"] - item["attributes"]["vpc_interval_start"]
        enabled_stats:
          - min
          - max
          - p95
          - p99
          - count

pattern: The regex pattern (.*) used to match log items in the body field. This node uses a catch-all pattern to process all incoming logs.
interval: Specifies the reporting interval for metrics, set to 1 minute. The node collects values for each interval before calculating and reporting metrics.
skip_empty_intervals: When set to false, it ensures metrics are reported even if no matching logs are found in an interval.
only_report_nonzeros: When set to false, it reports metrics even if the values are zero.
metric_name: Specifies the custom name for the resulting metric (vpc_rejected).
dimension_groups:
- field_dimensions: Defines dimensions to group the metrics by. These dimensions include network-specific fields such as network_source_ip, network_destination_ip, network_client_port, network_destination_port, and network_interface. Attributes network_client_port and network_destination_port are explicitly cast to integers before being converted to strings in the rejected_l2m node.
- field_numeric_dimension: Specifies a numeric field within the payload that contributes to the metric value, e.g., network_bytes_written or network_packet.
- enabled_stats: A list of statistics to be calculated and reported for the specified field_numeric_dimension.
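The third dimension group uses a computed value, the interval duration, with min, max, p95, p99, and count. The sketch below shows one way such statistics could be derived. The nearest-rank percentile method is an assumption chosen for simplicity, since the source does not say how the node computes percentiles, and the function names are hypothetical. The sample values are the start and end fields of the REJECT record under Sample Input.

```python
def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile (assumed method) over a non-empty sorted list."""
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def duration_stats(logs):
    """Stats over vpc_interval_end - vpc_interval_start, in seconds."""
    durations = sorted(
        int(a["vpc_interval_end"]) - int(a["vpc_interval_start"]) for a in logs
    )
    if not durations:
        return {"count": 0}
    return {
        "min": durations[0],
        "max": durations[-1],
        "p95": nearest_rank(durations, 95),
        "p99": nearest_rank(durations, 99),
        "count": len(durations),
    }


rejected = [{"vpc_interval_start": "1726760496", "vpc_interval_end": "1726799451"}]
stats = duration_stats(rejected)
print(stats)
```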

9. Output of Accepted Logs

Accepted logs are captured by the accepted_logs node, a compound_output node. Retaining these logs ensures that accepted network interactions are well-documented and can be analyzed for usage patterns or troubleshooting.

10. Output of Rejected Logs

Rejected logs are directed to the rejected_logs node, another compound_output node. Isolating these logs helps you to quickly identify and investigate potential security issues or policy violations.

11. Output of All Other Logs

Logs that do not meet the specific conditions for acceptance or rejection are routed to the other_logs node, another compound_output node, as are the “NODATA” logs from the skip_nodata node. This ensures comprehensive log collection, retaining logs that may require further custom analysis.

12. Output of Accepted Metrics

Metrics from accepted logs are output through the accepted_metrics_output node. This aggregation enables you to monitor the performance and usage patterns of accepted network interactions over time.

13. Output of Rejected Metrics

Metrics from rejected logs are captured by the rejected_metrics_output node. This allows ongoing monitoring of rejection patterns, which can help identify and address underlying issues or security concerns.

Release Notes

See Get Updates for details about how to upgrade your deployed packs.

Version 1.1 - November 21, 2024

Introduced explicit casting for network attributes network_client_port and network_destination_port to integers before converting them to strings in both the accepted_l2m and rejected_l2m nodes. This change addresses occasional CEL evaluation errors that arose due to type mismatches, ensuring smooth and accurate computation of metrics.

Version 1.0 - November 7, 2024

Initial release.

Sample Input

2 072962679531 eni-34220ccf76615c43c 49d7:6649:f2f:e3fb:f8c7:ba73:14d1:19b0 22f:4717:bcd1:cc0b:cc62:fba2:a5dc:1ab0 57320 19599 27 7904 62981471 1726760497 1726788476 ACCEPT SKIPDATA

2 280344740806 eni-ebbe59f2434f879ab 49.73.80.248 169.239.122.38 32952 50456 76 92503 85054070 1726760496 1726799451 REJECT NODATA