Edge Delta Log to Pattern Node
Overview
The Log to Pattern Node finds patterns in logs and groups (clusters) them by similarity. It takes the body field of a log item, runs a clustering algorithm, and creates cluster patterns and samples based on the node definition. You can have multiple clustering definitions.
- incoming_data_types: log
- outgoing_data_types: cluster_pattern_and_sample
Example Configuration
nodes:
- name: log_to_pattern_test
  type: log_to_pattern
  num_of_clusters: 15
  samples_per_cluster: 1
  reporting_frequency: 30s
  throttle_limit_per_sec: 200
Input Logs
2024-09-18T08:16:44Z WARN k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=inventory nodeID=node1 "User deleted account but has active subscription"
Pattern Output:
{
  "_type": "cluster_pattern_and_sample",
  "resource": {
    ...
  },
  "start_timestamp": 1726647963430,
  "timestamp": 1726647963480,
  "_pattern": "* WARN * k*s namespace name=inventory nodeID=node* User deleted account but has active subscription",
  "_pattern_count": 1,
  "_sample": "2024-09-18T08:16:44Z WARN k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=inventory nodeID=node1 \"User deleted account but has active subscription\"",
  "_sentiment_score": 0
}
The pattern can be found in the Patterns explorer. It has been classified as neutral.
Required Parameters
name
A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node in the YAML. It must be unique across all nodes. It is a YAML list element, so it begins with a - and a space, followed by the string. It is a required parameter for all nodes.
nodes:
- name: <node name>
  type: <node type>
type: log_to_pattern
The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.
nodes:
- name: <node name>
  type: <node type>
Optional Parameters
group_by
The group_by parameter defines a list of expressions (CEL or Go) used to aggregate clustering items into buckets. It is specified as a list of strings and is optional. If it is not set, items are grouped by their source.
nodes:
- name: <node name>
  type: log_to_pattern
  group_by:
  - item["resource"]["service.name"]
You can create a custom facet in the Pattern Explorer for this dimension.
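As a rough illustration of the list form, the sketch below supplies two expressions instead of one. The node name and the second expression, item["resource"]["k8s.namespace.name"], are assumptions made for this example; substitute fields that actually exist on your log items.

```yaml
nodes:
- name: log_to_pattern_by_service   # hypothetical node name
  type: log_to_pattern
  group_by:
  - item["resource"]["service.name"]
  # Assumed field, shown only to illustrate a second list entry
  - item["resource"]["k8s.namespace.name"]
```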
num_of_clusters
The num_of_clusters parameter defines the maximum number of clusters kept at run-time per input. It is specified as an integer greater than zero, defaults to 15, and is optional.
nodes:
- name: <node name>
  type: log_to_pattern
  num_of_clusters: <integer greater than 0>
reporting_frequency
The reporting_frequency parameter defines how often the cluster patterns and cluster samples are posted to the destination nodes. It is specified as a duration, defaults to 3 minutes, and is optional.
nodes:
- name: <node name>
  type: log_to_pattern
  reporting_frequency: <duration>
Note: Bear in mind the relationship between the node's reporting frequency and the x-axis interval in the Patterns Explorer, which is 1 minute. A reporting frequency of 3 minutes results in no data for two intervals in the explorer, followed by an aggregation of the past three minutes. This may be suitable for most use cases, but you can set the reporting frequency to less than 1 minute to reduce graph variation and identify exact moments of anomalous behavior.
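To make the note concrete, here is a minimal sketch that reports every 30 seconds, so that each 1-minute interval in the explorer receives data. The node name is a placeholder; the 30s value follows the duration format used in the example configuration at the top of this page.

```yaml
nodes:
- name: log_to_pattern_frequent   # hypothetical node name
  type: log_to_pattern
  # Shorter than the explorer's 1-minute x-axis interval
  reporting_frequency: 30s
```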
retire_period
The retire_period parameter specifies an inactivity period for a pattern. If a pattern is not observed during that period, it is retired. It is specified as a duration longer than 1 minute, defaults to 10 minutes, and is optional.
nodes:
- name: <node name>
  type: log_to_pattern
  retire_period: <duration>
samples_per_cluster
The samples_per_cluster parameter defines how many text messages are kept in each cluster, with new messages replacing old ones. It is specified as an integer, defaults to 1, and is optional.
nodes:
- name: <node name>
  type: log_to_pattern
  samples_per_cluster: <integer>
throttle_limit_per_sec
The throttle_limit_per_sec parameter limits the number of logs clustered per second, per source. It is specified as an integer, defaults to 200, and is optional.
nodes:
- name: <node name>
  type: log_to_pattern
  throttle_limit_per_sec: <integer>
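Putting the optional parameters together, the following sketch sets each one to its documented default. The node name is a placeholder, and writing the durations as 3m and 10m is an assumption based on the 30s format in the example configuration; check the accepted duration syntax before relying on it.

```yaml
nodes:
- name: log_to_pattern_defaults   # hypothetical node name
  type: log_to_pattern
  # group_by is omitted, so items are grouped by their source
  num_of_clusters: 15          # documented default
  reporting_frequency: 3m      # default of 3 minutes; "3m" notation is assumed
  retire_period: 10m           # default of 10 minutes; "10m" notation is assumed
  samples_per_cluster: 1       # documented default
  throttle_limit_per_sec: 200  # documented default
```

Because every value matches its default, this node should behave the same as one that declares only name and type; listing the values explicitly mainly serves as a starting point for tuning.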