Edge Delta Deduplicate Logs Processor
Overview
The deduplicate logs processor is used to deduplicate identical logs discovered within a specified interval. Identical logs are those with the same body, attributes, and resources. This node is useful in scenarios where you want to reduce data volume.
Duplicate logs may come about as a result of parallel pipeline design or from systems upstream of the agent.
If duplicates are found, one instance passes through with a log_count attribute recording the number of replicas dropped. The count attribute counts only the dropped logs: two identical logs result in a single log with a log_count attribute of 1. The timestamp is ignored, so identical logs with different timestamps that fall within the same interval are rolled up into a single log.
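For illustration, here is a hypothetical rollup (the log body and attribute values below are invented for this sketch): three identical logs arriving within one interval are reduced to a single log, with the two dropped copies recorded in log_count.

# Input: three identical logs within the same interval
- body: "connection refused"
  attributes: {service: "api"}
- body: "connection refused"
  attributes: {service: "api"}
- body: "connection refused"
  attributes: {service: "api"}
# Output: one log passes; the two dropped duplicates are counted
- body: "connection refused"
  attributes: {service: "api", log_count: 2}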
For detailed instructions on how to use multiprocessors, see Use Multiprocessors.
Configuration
In this example, all the logs in the input pane are duplicates, so they are compacted into a single log with an additional count field, attributes["log_count"], showing the number of duplicates detected during the interval (100 in the case of live capture).

This configuration generates the following YAML:
- name: Multi Processor_db04
  type: sequence
  processors:
  - type: dedup
    metadata: '{"id":"AZERA9gNqbzo6gJsKeQs9","type":"dedup","name":"Deduplicate Logs"}'
    data_types:
    - log
    interval: 1m0s
    count_field_path: attributes["log_count"]
Options
condition
The condition parameter contains a conditional phrase of an OTTL statement. It restricts operation of the processor to only those data items where the condition is met; data items that do not match the condition pass through without processing. You configure it in the interface and an OTTL condition is generated. It is optional. You can select one of the following operators:
Operator | Name | Description | Example |
---|---|---|---|
== | Equal to | Returns true if both values are exactly the same | attributes["status"] == "OK" |
!= | Not equal to | Returns true if the values are not the same | attributes["level"] != "debug" |
> | Greater than | Returns true if the left value is greater than the right | attributes["duration_ms"] > 1000 |
>= | Greater than or equal | Returns true if the left value is greater than or equal to the right | attributes["score"] >= 90 |
< | Less than | Returns true if the left value is less than the right | attributes["load"] < 0.75 |
<= | Less than or equal | Returns true if the left value is less than or equal to the right | attributes["retries"] <= 3 |
matches | Regex match | Returns true if the string matches a regular expression | isMatch(attributes["name"], ".*\\.name$") |
It is defined in YAML as follows:
- name: _multiprocessor
  type: sequence
  processors:
  - type: <processor type>
    condition: attributes["request"]["path"] == "/json/view"
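As a sketch applied to this processor (the path value /healthz is an assumption for illustration), a condition can restrict deduplication to high-volume health-check logs while all other logs pass through untouched:

- type: dedup
  # Only logs matching this condition are deduplicated; others pass through unprocessed.
  condition: attributes["request"]["path"] == "/healthz"
  interval: 1m0s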
interval
The interval parameter defines the window in which the node evaluates logs for duplicates. If identical logs each fall in separate intervals, they are not dropped and rolled up together. It is specified as a duration; the default is 30s and it is optional.
It is defined in YAML as follows:
- type: dedup
  metadata: '{"id":"boLCEZZhdslDr2G6TFSp2","type":"dedup","name":"Deduplicate Logs"}'
  interval: 1m0s
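As a hypothetical timeline (timestamps and body invented for illustration), with interval: 1m0s the first two logs share a window and are rolled up, while the third falls into a new window and passes as a separate log:

# 00:00:10  "disk full"   \ same 1m window: rolled up into one log, log_count: 1
# 00:00:40  "disk full"   /
# 00:01:30  "disk full"   -> next window: passes as a separate log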
Count Field
The count_field_path parameter specifies the name of the attribute field that will contain the integer count of logs that were rolled up. You specify it in the tool and it is defined for you in YAML as a string. It is optional and the default is log_count.
It is defined in YAML as follows:
- type: dedup
  metadata: '{"id":"boLCEZZhdslDr2G6TFSp2","type":"dedup","name":"Deduplicate Logs"}'
  count_field_path: <path to field>
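For example (the attribute name duplicate_count is an assumption for this sketch), to write the rollup count to a custom attribute instead of the default:

- type: dedup
  interval: 30s
  # The dropped-duplicate count is written here instead of the default log_count.
  count_field_path: attributes["duplicate_count"]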
Excluded Fields
The excluded_field_paths parameter specifies the fields that should not be evaluated for variation: even if these fields differ, the logs can still be rolled up. You specify one or more items in the tool, which generates a list in the YAML; it is optional. By default the timestamp is excluded, while the body field cannot be excluded.
It is defined in YAML as follows:
- type: dedup
  metadata: '{"id":"boLCEZZhdslDr2G6TFSp2","type":"dedup","name":"Deduplicate Logs"}'
  excluded_field_paths:
  - attributes["level"]
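With the configuration above, two logs that differ only in attributes["level"] (the body and level values below are invented for illustration) would still be treated as duplicates:

- body: "cache miss"
  attributes: {level: "info"}
- body: "cache miss"
  attributes: {level: "warn"}
# -> one log passes with log_count: 1; the difference in level is ignored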
Final
The final parameter specifies whether successfully processed data items should continue to subsequent processors within the same multiprocessor node. Data items that the processor fails to process are passed to the next processor in the node regardless of this setting. You select the slider in the tool, which specifies it for you in the YAML as a Boolean. The default is false and it is optional.
It is defined in YAML as follows:
- name: multiprocessor
  type: sequence
  processors:
  - type: <processor type>
    final: true
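As a sketch combining this setting with the dedup type (the second processor is a placeholder), final: true stops successfully deduplicated logs from reaching later processors in the same node:

- name: multiprocessor
  type: sequence
  processors:
  - type: dedup
    interval: 1m0s
    final: true
  - type: <processor type>  # skipped for logs the dedup processor handled successfully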