Edge Delta Log to Pattern Node

Identify log patterns using a clustering algorithm.

Overview

The Log to Pattern Node finds patterns in logs, and then groups (or clusters) these patterns based on similarities. It takes the body field of a log item (by default), runs a clustering algorithm, and creates cluster patterns and samples based on the node definition. You can have multiple clustering definitions.

See Create Patterns from Logs for more information.
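As a minimal sketch of multiple clustering definitions, assuming each definition is a separate log_to_pattern node in the same pipeline, the following configures two nodes. The node names are hypothetical, and the second node uses the field_path parameter described under Optional Parameters to cluster a custom attribute rather than the body.

```yaml
nodes:
  # Hypothetical node: clusters the default body field.
  - name: body_patterns
    type: log_to_pattern
  # Hypothetical node: clusters a custom attribute instead of the body.
  - name: payload_patterns
    type: log_to_pattern
    field_path: item["attributes"]["payload"]
```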

Example Configuration

nodes:
- name: log_to_pattern_test
  type: log_to_pattern
  num_of_clusters: 15
  samples_per_cluster: 1
  reporting_frequency: 30s
  throttle_limit_per_sec: 200

Input Logs

2024-09-18T08:16:44Z WARN k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=inventory nodeID=node1 "User deleted account but has active subscription"

Pattern Output:

{
  "_type": "cluster_pattern_and_sample",
  "resource": {
    ...
  },
  "start_timestamp": 1726647963430,
  "timestamp": 1726647963480,
  "_pattern": "* WARN * k*s namespace name=inventory nodeID=node* User deleted account but has active subscription",
  "_pattern_count": 1,
  "_sample": "2024-09-18T08:16:44Z WARN k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=inventory nodeID=node1 \"User deleted account but has active subscription\"",
  "_sentiment_score": 0
}

The pattern can be found in the Patterns Explorer, where it has been classified as neutral.

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node in the YAML. It must be unique across all nodes. Because it is a YAML list element, it begins with a - and a space, followed by the string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: log_to_pattern

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

Optional Parameters

drain_tree_depth

The drain_tree_depth parameter determines the depth of the drain tree used for pattern identification. The drain tree organizes logs into a structured hierarchy, and the depth determines how many levels the tree can have before logs are grouped into a pattern. A deeper tree means finer-grained classification, while a shallower tree leads to broader grouping. Increasing the value therefore creates more granular patterns (higher specificity) and helps differentiate logs with subtle differences, but it consumes more memory because the tree is larger. It should be set between 4 and 15. It is specified as an integer with a default of 7, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    drain_tree_depth: 8

See Create Patterns from Logs for more information.

drain_tree_max_child

The drain_tree_max_child parameter controls the maximum number of child nodes that each node in the drain tree can have. The number of child nodes per level defines how logs are distributed before they are merged into a pattern. A higher value allows more detailed branching, while a lower value forces early merging of patterns. Increasing the value will yield more similar patterns at the expense of memory. It should be between 50 and 200. It is specified as an integer with a default of 100, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    drain_tree_max_child: 110

See Create Patterns from Logs for more information.

field_path

The field_path parameter defines the location of the value to cluster. It is specified as a bracket-notation string and is optional. If it is not specified, the body field is used by default.

nodes:
  - name: <node name>
    type: log_to_pattern
    field_path: item["attributes"]["payload"]

group_by

The group_by parameter is used to define a list of expressions (CEL or Go) that will be used for aggregating clustering items in buckets. It is specified as a list of strings and is optional. If it is not set, items are grouped by their source.

nodes:
  - name: <node name>
    type: log_to_pattern
    group_by:
    - item["resource"]["service.name"]

You can create a custom facet in the Patterns Explorer for this dimension.
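The following sketch groups patterns by two dimensions instead of one. The node name is hypothetical, and the second expression assumes that your items carry a k8s.namespace.name resource field; substitute whichever fields your items actually contain.

```yaml
nodes:
  - name: patterns_by_service_and_namespace
    type: log_to_pattern
    group_by:
      # Taken from the example above.
      - item["resource"]["service.name"]
      # Assumed field, shown for illustration only.
      - item["resource"]["k8s.namespace.name"]
```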

num_of_clusters

The num_of_clusters parameter defines the maximum number of clusters kept at run-time per input. It is specified as an integer greater than zero with a default of 15, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    num_of_clusters: <integer greater than 0>

reporting_frequency

The reporting_frequency parameter is used to define the frequency at which the cluster pattern and cluster samples are posted to the destination nodes. It is specified as a duration with a default of 3 minutes and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    reporting_frequency: <duration>

Note: Bear in mind the relationship between the reporting frequency in the node and the x-axis interval in the Patterns Explorer, which is 1 minute. A reporting frequency of 3 minutes results in no data for two intervals in the explorer, followed by an aggregation of the past three minutes. This may be suitable for most use cases, but you can set the reporting frequency to less than 1 minute to reduce graph variation and identify exact moments of anomalous behavior.
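As an illustration of the note above, a reporting frequency below the explorer's 1-minute interval might look like the following. The node name is hypothetical; the 30s value uses the same duration format as the example configuration at the top of this page.

```yaml
nodes:
  - name: frequent_pattern_reports
    type: log_to_pattern
    # Report twice per 1-minute explorer interval (default is 3 minutes).
    reporting_frequency: 30s
```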

retire_period

The retire_period parameter specifies an inactivity period for a pattern. If a pattern is not observed during that period, it is retired. It is specified as a duration longer than 1 minute with a default of 10 minutes, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    retire_period: <duration>
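A minimal sketch that keeps inactive patterns for longer than the default. The node name is hypothetical, and the 30m value assumes that durations accept a minute suffix in the same way the example configuration uses 30s for seconds.

```yaml
nodes:
  - name: long_lived_patterns
    type: log_to_pattern
    # Assumed duration syntax: retire a pattern after 30 minutes of inactivity.
    retire_period: 30m
```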

samples_per_cluster

The samples_per_cluster parameter is used to define how many text messages will be kept in each cluster, with new messages replacing old ones. It is specified as an integer with a default value of 1 and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    samples_per_cluster: <integer>

similarity_threshold

The similarity_threshold parameter defines how similar a new log entry must be to an existing pattern before it is grouped into that pattern. It determines whether a log is merged with an existing pattern or forms a new one. Increasing the value will yield more similar patterns at the expense of memory. It should be between 0.0 and 1.0. It is expressed as a double with a default of 0.5, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    similarity_threshold: 0.6

See Create Patterns from Logs for more information.
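The drain_tree_depth, drain_tree_max_child, and similarity_threshold parameters can be combined in one node. The sketch below leans toward finer-grained patterns while staying inside the documented ranges; the node name and the specific values are illustrative assumptions, not recommendations.

```yaml
nodes:
  - name: fine_grained_patterns
    type: log_to_pattern
    # Deeper tree (range 4 to 15, default 7).
    drain_tree_depth: 10
    # More children per tree node (range 50 to 200, default 100).
    drain_tree_max_child: 150
    # Higher similarity required before merging (range 0.0 to 1.0, default 0.5).
    similarity_threshold: 0.7
```

According to the parameter descriptions, each of these increases comes at the expense of memory, so it may be worth changing one value at a time and checking the result.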

throttle_limit_per_sec

The throttle_limit_per_sec parameter is used to limit the number of logs being clustered per second and per source. It is specified as an integer with a default value of 200 and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    throttle_limit_per_sec: <integer>
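As a sketch with concrete values in place of the placeholders, the following raises the cluster, sample, and throttle limits above their defaults for a busy source. The node name and the chosen values are hypothetical.

```yaml
nodes:
  - name: high_volume_patterns
    type: log_to_pattern
    # Keep up to 30 clusters per input (default 15).
    num_of_clusters: 30
    # Keep 3 sample messages per cluster (default 1).
    samples_per_cluster: 3
    # Cluster up to 500 logs per second per source (default 200).
    throttle_limit_per_sec: 500
```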