AWS VPC JSON Pack
Edge Delta Pipeline Pack for AWS VPC Flow
Overview
The AWS VPC Flow pipeline ensures effective processing of VPC Flow log data, providing insights into network activity and security. It filters, structures, and classifies logs based on actions and status.
Pack Description
1. Data Ingestion
The data flow starts with the compound_input node, which is a Compound Input node. This node is the entry point into the pipeline, where processing of the incoming AWS VPC Flow logs begins.
2. Omitted Header Data Filtering
Next, logs flow into the omit_header_data node, which is a Regex Filter node. It removes all lines starting with “version” from the data flow.
- name: omit_header_data
  type: regex_filter
  pattern: ^version
  negate: true
The pattern parameter defines a regular expression used to match log lines. The ^ symbol in the regex pattern is an anchor indicating the start of the line, so ^version matches any line that begins with the word “version”. The negate parameter is a boolean that, when set to true, inverts the filtering logic of the node. By default, the Regex Filter node would pass only those log entries that match the specified regex pattern. With negate: true, the node does the opposite: it blocks logs that match the pattern and passes those that do not match.
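The negate behavior described above can be sketched in Python. This is only an illustration of the filtering logic, not Edge Delta's implementation: the function name regex_filter and the header line in the example are assumptions made for the sketch.

```python
import re


def regex_filter(lines, pattern, negate=False):
    """Keep lines matching `pattern`; with negate=True, keep the non-matching lines."""
    compiled = re.compile(pattern)
    kept = []
    for line in lines:
        matched = compiled.search(line) is not None
        # Without negate, matching lines pass. With negate, the logic is inverted.
        if matched != negate:
            kept.append(line)
    return kept


lines = [
    # Hypothetical header line, assumed here to start with "version".
    "version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status",
    # Data line taken from the Sample Input section.
    "2 280344740806 eni-ebbe59f2434f879ab 49.73.80.248 169.239.122.38 32952 50456 76 92503 85054070 1726760496 1726799451 REJECT NODATA",
]

filtered = regex_filter(lines, r"^version", negate=True)
```

With negate set to true, only the data line remains in filtered; with the default, only the header line would pass.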
3. Field Extraction
Once headers are omitted, logs move to the grok_extract_fields node, which is a Grok node. This node uses an Edge Delta supplied pattern to extract fields in the body such as vpc_version, aws_account_id, network_source_ip, etc. and structure them as attributes. By transforming unstructured log data into structured data, this node makes these extracted fields easier to search, analyze, and visualize.
- name: grok_extract_fields
  type: grok
  pattern:
    '%{INT:vpc_version} %{NOTSPACE:aws_account_id} (?:%{NOTSPACE:network_interface}|-)
    (?:%{NOTSPACE:network_source_ip}|-) (?:%{NOTSPACE:network_destination_ip}|-)
    (?:%{INT:network_client_port}|-) (?:%{INT:network_destination_port}|-) (?:%{NOTSPACE:network_protocol}|-)
    (?:%{INT:network_packet}|-) (?:%{INT:network_bytes_written}|-) %{INT:vpc_interval_start}
    %{INT:vpc_interval_end} (?:%{WORD:vpc_action}|-) %{WORD:vpc_status}.*'
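To see what this pattern extracts, the following Python sketch approximates it with named groups. The expansions used for INT, NOTSPACE, and WORD are simplified assumptions about the standard Grok definitions, and the helper names (required, optional, extract_fields) are hypothetical. Extracted values stay as strings, as in a plain Grok capture.

```python
import re

# Simplified stand-ins for the Grok primitives (assumed, not the exact Grok definitions).
INT = r"[+-]?\d+"
NOTSPACE = r"\S+"
WORD = r"\w+"


def required(name, expr):
    """A field that must be present, like %{EXPR:name}."""
    return f"(?P<{name}>{expr})"


def optional(name, expr):
    """A field that may be a bare dash, like (?:%{EXPR:name}|-)."""
    return f"(?:(?P<{name}>{expr})|-)"


FIELDS = [
    required("vpc_version", INT),
    required("aws_account_id", NOTSPACE),
    optional("network_interface", NOTSPACE),
    optional("network_source_ip", NOTSPACE),
    optional("network_destination_ip", NOTSPACE),
    optional("network_client_port", INT),
    optional("network_destination_port", INT),
    optional("network_protocol", NOTSPACE),
    optional("network_packet", INT),
    optional("network_bytes_written", INT),
    required("vpc_interval_start", INT),
    required("vpc_interval_end", INT),
    optional("vpc_action", WORD),
    required("vpc_status", WORD),
]

# Fields are separated by single spaces; trailing content is ignored via ".*".
VPC_FLOW_RE = re.compile(" ".join(FIELDS) + ".*")


def extract_fields(body):
    """Return the extracted attributes as a dict of strings, or None if the line does not match."""
    match = VPC_FLOW_RE.match(body)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items() if value is not None}


# Second line from the Sample Input section.
sample = "2 280344740806 eni-ebbe59f2434f879ab 49.73.80.248 169.239.122.38 32952 50456 76 92503 85054070 1726760496 1726799451 REJECT NODATA"
attributes = extract_fields(sample)
```

For the sample line, attributes contains entries such as vpc_action set to "REJECT" and vpc_status set to "NODATA".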
4. Status-Based Routing
The logs then flow to the skip_nodata node, a Route node. This node routes logs based on the vpc_status attribute: logs where vpc_status is not “NODATA” are routed to the next processing phase, while “NODATA” logs are sent on the default unmatched path to the other_logs output.
- name: skip_nodata
  type: route
  paths:
    - path: all_non_nodata
      condition: item["attributes"]["vpc_status"] != "NODATA"
      exit_if_matched: true
The condition parameter uses the Common Expression Language (CEL) to evaluate log entries. It checks if item["attributes"]["vpc_status"] is not equal to “NODATA”. The exit_if_matched parameter is set to true, meaning that if this condition is met, the log entry is immediately routed through the specified path (all_non_nodata), and no further conditions are evaluated.
5. Timestamp Transformation
The log_transform_timestamp node, a Log Transform node, converts the vpc_interval_end attribute from Unix seconds to Unix milliseconds using the Edge Delta convert_timestamp macro and upserts the result into the item["timestamp"] field. This ensures that the log carries its original timestamp rather than the one generated by Edge Delta when the agent ingested the log.
- name: log_transform_timestamp
  type: log_transform
  transformations:
    - field_path: item["timestamp"]
      operation: upsert
      value:
        convert_timestamp(item["attributes"]["vpc_interval_end"], "Unix Second",
        "Unix Milli")
6. Action-Based Routing
Logs are subsequently processed by the action_router node, another Route node. This node routes logs based on the vpc_action attribute:
- name: action_router
  type: route
  paths:
    - path: rejected
      condition: item["attributes"]["vpc_action"] == "REJECT"
      exit_if_matched: true
    - path: accepted
      condition: item["attributes"]["vpc_action"] == "ACCEPT"
      exit_if_matched: true
- Logs with vpc_action set to “REJECT” are routed to the rejected_l2m and rejected_logs nodes.
- Logs with vpc_action set to “ACCEPT” are routed to the accepted_l2m and accepted_logs nodes.
- Logs that do not match either condition are routed to the other_logs node.
Similar to the skip_nodata node, the action_router node uses the condition parameter to route logs based on CEL expressions. The paths parameter defines multiple routes, while exit_if_matched: true ensures that once a condition is met, no further conditions are evaluated for that log entry.
This classification allows you to differentiate between accepted and rejected requests, which helps in isolating and investigating network security events. By routing logs based on specific actions, you can streamline your monitoring processes and focus on logs that are most relevant to network security, ensuring quicker identification and resolution of issues.
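The routing logic of the action_router node can be sketched in Python as an ordered list of paths evaluated top to bottom. This is a simplified model rather than Edge Delta's implementation: the route function, the "unmatched" label, and the assumption that evaluation continues to later paths when exit_if_matched is false are all illustrative.

```python
def route(item, paths):
    """Return the names of the paths an item is sent to, or ["unmatched"] if none match."""
    taken = []
    for path in paths:
        if path["condition"](item):
            taken.append(path["path"])
            # Stop evaluating further paths once a match with exit_if_matched is found.
            if path.get("exit_if_matched", False):
                break
    return taken or ["unmatched"]


# Python equivalents of the two CEL conditions in the action_router configuration.
ACTION_ROUTER_PATHS = [
    {
        "path": "rejected",
        "condition": lambda item: item["attributes"].get("vpc_action") == "REJECT",
        "exit_if_matched": True,
    },
    {
        "path": "accepted",
        "condition": lambda item: item["attributes"].get("vpc_action") == "ACCEPT",
        "exit_if_matched": True,
    },
]

rejected = route({"attributes": {"vpc_action": "REJECT"}}, ACTION_ROUTER_PATHS)
accepted = route({"attributes": {"vpc_action": "ACCEPT"}}, ACTION_ROUTER_PATHS)
other = route({"attributes": {}}, ACTION_ROUTER_PATHS)
```

In this model an item with no vpc_action falls through both conditions and takes the unmatched path, which corresponds to the other_logs output in the pack.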
7. Log to Metric Conversion for Accepted Logs
Accepted logs pass through the accepted_l2m node, which is a Log to Metric node. This node reports the sum and count of network bytes written and of network packets, along with the min, max, p95, p99, and count of the VPC interval duration (end - start).
- name: accepted_l2m
  type: log_to_metric
  pattern: .*
  interval: 1m0s
  skip_empty_intervals: false
  only_report_nonzeros: false
  metric_name: vpc_accepted
  dimension_groups:
    - field_dimensions:
        - item["attributes"]["network_source_ip"]
        - item["attributes"]["network_destination_ip"]
        - string(int(item["attributes"]["network_client_port"]))
        - string(int(item["attributes"]["network_destination_port"]))
        - item["attributes"]["network_interface"]
      field_numeric_dimension: item["attributes"]["network_bytes_written"]
      enabled_stats:
        - sum
        - count
    - field_dimensions:
        - item["attributes"]["network_source_ip"]
        - item["attributes"]["network_destination_ip"]
        - string(int(item["attributes"]["network_client_port"]))
        - string(int(item["attributes"]["network_destination_port"]))
        - item["attributes"]["network_interface"]
      field_numeric_dimension: item["attributes"]["network_packet"]
      enabled_stats:
        - sum
        - count
    - field_dimensions:
        - item["attributes"]["network_source_ip"]
        - item["attributes"]["network_destination_ip"]
        - string(int(item["attributes"]["network_client_port"]))
        - string(int(item["attributes"]["network_destination_port"]))
        - item["attributes"]["network_interface"]
      field_numeric_dimension: item["attributes"]["vpc_interval_end"] - item["attributes"]["vpc_interval_start"]
      enabled_stats:
        - min
        - max
        - p95
        - p99
        - count
- pattern: The regex pattern (.*) used to match log items in the body field. This node uses a catch-all pattern to process all incoming logs.
- interval: Specifies the reporting interval for metrics, set to 1 minute. The node collects values for each interval before calculating and reporting metrics.
- skip_empty_intervals: When set to false, it ensures metrics are reported even if no matching logs are found in an interval.
- only_report_nonzeros: When set to false, it reports metrics even if the values are zero.
- metric_name: Specifies the custom name for the resulting metric (vpc_accepted).
- dimension_groups:
  - field_dimensions: Defines dimensions to group the metrics by. These dimensions include network-specific fields such as network_source_ip, network_destination_ip, network_client_port, network_destination_port, and network_interface. Attributes network_client_port and network_destination_port are explicitly cast to integers before being converted to strings in the accepted_l2m node.
  - field_numeric_dimension: Specifies a numeric field within the payload that contributes to the metric value, e.g., network_bytes_written or network_packet.
  - enabled_stats: A list of statistics to be calculated and reported for the specified field_numeric_dimension.
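As a rough illustration of how a dimension group with sum and count could aggregate values within one interval, consider the Python sketch below. It is not the Log to Metric node's implementation: the function names and the example attribute values are hypothetical, and interval handling is left out.

```python
from collections import defaultdict


def dimension_key(attrs):
    """Build the grouping key from the five field_dimensions, casting ports as the pack does."""
    return (
        attrs["network_source_ip"],
        attrs["network_destination_ip"],
        str(int(attrs["network_client_port"])),
        str(int(attrs["network_destination_port"])),
        attrs["network_interface"],
    )


def sum_and_count(items, numeric_field):
    """Aggregate sum and count of `numeric_field` per dimension key for one interval."""
    stats = defaultdict(lambda: {"sum": 0.0, "count": 0})
    for item in items:
        attrs = item["attributes"]
        bucket = stats[dimension_key(attrs)]
        bucket["sum"] += float(attrs[numeric_field])
        bucket["count"] += 1
    return dict(stats)


# Hypothetical attribute values, shaped like the output of the Grok node (all strings).
base = {
    "network_source_ip": "10.0.0.1",
    "network_destination_ip": "10.0.0.2",
    "network_client_port": "443",
    "network_destination_port": "8080",
    "network_interface": "eni-1",
}
items = [
    {"attributes": dict(base, network_bytes_written="100")},
    {"attributes": dict(base, network_bytes_written="250")},
]

bytes_stats = sum_and_count(items, "network_bytes_written")
```

Both example items share the same dimension values, so they land in a single bucket with a sum of 350.0 and a count of 2.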
8. Log to Metric Conversion for Rejected Logs
Similarly, rejected logs are processed by the rejected_l2m node, also a Log to Metric node. It reports the sum and count of network bytes written and of network packets, along with the min, max, p95, p99, and count of the VPC interval duration (end - start). This facilitates monitoring and alerting on rejected network interactions, which can indicate security threats or misconfigurations.
- name: rejected_l2m
  type: log_to_metric
  pattern: .*
  interval: 1m0s
  skip_empty_intervals: false
  only_report_nonzeros: false
  metric_name: vpc_rejected
  dimension_groups:
    - field_dimensions:
        - item["attributes"]["network_source_ip"]
        - item["attributes"]["network_destination_ip"]
        - string(int(item["attributes"]["network_client_port"]))
        - string(int(item["attributes"]["network_destination_port"]))
        - item["attributes"]["network_interface"]
      field_numeric_dimension: item["attributes"]["network_bytes_written"]
      enabled_stats:
        - sum
        - count
    - field_dimensions:
        - item["attributes"]["network_source_ip"]
        - item["attributes"]["network_destination_ip"]
        - string(int(item["attributes"]["network_client_port"]))
        - string(int(item["attributes"]["network_destination_port"]))
        - item["attributes"]["network_interface"]
      field_numeric_dimension: item["attributes"]["network_packet"]
      enabled_stats:
        - sum
        - count
    - field_dimensions:
        - item["attributes"]["network_source_ip"]
        - item["attributes"]["network_destination_ip"]
        - string(int(item["attributes"]["network_client_port"]))
        - string(int(item["attributes"]["network_destination_port"]))
        - item["attributes"]["network_interface"]
      field_numeric_dimension: item["attributes"]["vpc_interval_end"] - item["attributes"]["vpc_interval_start"]
      enabled_stats:
        - min
        - max
        - p95
        - p99
        - count
- pattern: The regex pattern (.*) used to match log items in the body field. This node uses a catch-all pattern to process all incoming logs.
- interval: Specifies the reporting interval for metrics, set to 1 minute. The node collects values for each interval before calculating and reporting metrics.
- skip_empty_intervals: When set to false, it ensures metrics are reported even if no matching logs are found in an interval.
- only_report_nonzeros: When set to false, it reports metrics even if the values are zero.
- metric_name: Specifies the custom name for the resulting metric (vpc_rejected).
- dimension_groups:
  - field_dimensions: Defines dimensions to group the metrics by. These dimensions include network-specific fields such as network_source_ip, network_destination_ip, network_client_port, network_destination_port, and network_interface. Attributes network_client_port and network_destination_port are explicitly cast to integers before being converted to strings in the rejected_l2m node.
  - field_numeric_dimension: Specifies a numeric field within the payload that contributes to the metric value, e.g., network_bytes_written or network_packet.
  - enabled_stats: A list of statistics to be calculated and reported for the specified field_numeric_dimension.
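The third dimension group uses a computed value (interval end minus interval start) with min, max, p95, p99, and count. The Python sketch below shows one way such statistics could be derived for a set of logs in one interval. It is illustrative only: the nearest-rank percentile method is an assumption, since the pack does not state how the node computes percentiles, and the function names are hypothetical. Grouping by dimensions is omitted for brevity.

```python
import math


def nearest_rank(sorted_values, percent):
    """Nearest-rank percentile over a non-empty sorted list (assumed method)."""
    rank = max(1, math.ceil(percent / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def duration_stats(items):
    """Compute min, max, p95, p99, and count of (vpc_interval_end - vpc_interval_start)."""
    durations = sorted(
        int(item["attributes"]["vpc_interval_end"]) - int(item["attributes"]["vpc_interval_start"])
        for item in items
    )
    if not durations:
        return None
    return {
        "min": durations[0],
        "max": durations[-1],
        "p95": nearest_rank(durations, 95),
        "p99": nearest_rank(durations, 99),
        "count": len(durations),
    }


# Interval values from the REJECT line in the Sample Input section.
rejected_items = [
    {"attributes": {"vpc_interval_start": "1726760496", "vpc_interval_end": "1726799451"}},
]

stats = duration_stats(rejected_items)
```

With a single log in the interval, every statistic equals that log's duration of 38955 seconds and the count is 1.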
9. Output of Accepted Logs
Accepted logs are captured by the accepted_logs node, a compound_output node. Retaining these logs ensures that accepted network interactions are well-documented and can be analyzed for usage patterns or troubleshooting.
10. Output of Rejected Logs
Rejected logs are directed to the rejected_logs node, another compound_output node. Isolating these logs helps you to quickly identify and investigate potential security issues or policy violations.
11. Output of All Other Logs
Logs that do not meet the specific conditions for acceptance or rejection are routed to the other_logs node, another compound_output node. This ensures a comprehensive log collection, retaining logs that may require further custom analysis.
12. Output of Accepted Metrics
Metrics from accepted logs are output through the accepted_metrics_output node. This aggregation enables you to monitor the performance and usage patterns of accepted network interactions over time.
13. Output of Rejected Metrics
Metrics from rejected logs are captured by the rejected_metrics_output node. This allows ongoing monitoring of rejection patterns, which can help identify and address underlying issues or security concerns.
Release Notes
See Get Updates for details about how to upgrade your deployed packs.
Version 1.1 - November 21, 2024
- Introduced explicit casting for network attributes network_client_port and network_destination_port to integers before converting them to strings in both the accepted_l2m and rejected_l2m nodes. This change addresses occasional CEL evaluation errors that arose due to type mismatches, ensuring smooth and accurate computation of metrics.
Version 1.0 - November 7, 2024
- Initial release.
Sample Input
2 072962679531 eni-34220ccf76615c43c 49d7:6649:f2f:e3fb:f8c7:ba73:14d1:19b0 22f:4717:bcd1:cc0b:cc62:fba2:a5dc:1ab0 57320 19599 27 7904 62981471 1726760497 1726788476 ACCEPT SKIPDATA
2 280344740806 eni-ebbe59f2434f879ab 49.73.80.248 169.239.122.38 32952 50456 76 92503 85054070 1726760496 1726799451 REJECT NODATA