Edge Delta Deduplicate Node
3 minute read
Overview
The Dedup node is used to deduplicate identical logs discovered within a specified interval. Identical logs are those with the same body, attributes, and resources. This node is useful in scenarios where you want to reduce data volume.
Duplicate logs may come about as a result of parallel pipeline design or from systems upstream of the agent.
If duplicates are found, one instance will pass with a log_count
attribute for the number of replicas dropped. The count attribute counts the number of dropped logs - two identical logs will result in a single log with a log_count
attribute of 1
. The timestamp
is ignored so identical logs with different timestamps but within the same interval will be rolled up into a single log.
Example Configuration
In this example, identical logs within a 2 minute interval will be dropped. An attribute, http_status_code
, is ignored. This means variation in this field will not prevent rolling up. In addition, the log_count
field is renamed to dedup_log_count
.
- name: dedup_processor
type: dedup
interval: 2m
excluded_field_paths:
- item.attributes.http_status_code
count_field_path: item.attributes.dedup_log_count
Example Input
{
"id": "5a8fa619-54c6-4e99-98b4-e207827d7f7f",
"timestamp": 1732507791786,
"severity_text": "",
"body": "User login attempt",
"resource": {
...
},
"attributes": {
"user_id": "56789",
"http_status_code": "200"
}
}
{
"id": "c3ad2a78-af53-47e5-8407-c3956ff2c9a1",
"timestamp": 1732507791798,
"severity_text": "",
"body": "User login attempt",
"resource": {
...
},
"attributes": {
"user_id": "56789",
"http_status_code": "401"
}
}
Example Output
{
"id": "c3ad2a78-af53-47e5-8407-c3956ff2c9a1",
"timestamp": 1732507791798,
"severity_text": "",
"body": "User login attempt",
"resource": {
...
},
"attributes": {
"user_id": "56789",
"http_status_code": "200",
"dedup_log_count": 1
}
}
A single log instance is passed and enriched with a dedup_log_count
attribute indicating the number of log instances that were rolled up into the single item.
Required Parameters
name
A descriptive name for the node. This is the name that will appear in Visual Pipelines and you can reference this node in the YAML using the name. It must be unique across all nodes. It is a YAML list element so it begins with a -
and a space followed by the string. It is a required parameter for all nodes.
nodes:
- name: <node name>
type: <node type>
type: dedup
The type
parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.
nodes:
- name: <node name>
type: <node type>
Optional Parameters
count_field_path
The count_field_path
parameter specifies the name of the attribute field that will contain the integer for the number of logs that were rolled up. You specify it as a string. It is optional and the default is log_count
.
- name: dedup_processor
type: dedup
count_field_path: item.attributes.dedup_log_count
excluded_field_paths
The excluded_field_paths
parameter specifies the fields that should not be evaluated for variation. This means that even if these fields are different, the log might be rolled up. You specify one or more items as a list and it is optional. By default, the timestamp is excluded while the body field cannot be excluded.
- name: dedup_processor
type: dedup
excluded_field_paths:
- item.attributes.http_status_code
interval
The interval
parameter defines the window in which the node evaluates logs for duplicates. If one identical log falls within each interval it will not be dropped and rolled up. It is specified as a duration, the default is 30s and it is optional.
- name: dedup_processor
type: dedup
interval: 2m