Edge Delta Log to Pattern Node

Identify log patterns using a clustering algorithm.

Overview

The Log to Pattern Node finds patterns in logs and groups (clusters) them by similarity. It takes the body field of each log item, runs a clustering algorithm on it, and creates cluster patterns and samples according to the node definition. You can have multiple clustering definitions.
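
Multiple clustering definitions could be sketched as follows, on the assumption that each definition is a separate log_to_pattern node with its own unique name. The node names and values are hypothetical and chosen only for illustration.

nodes:
  # Hypothetical first definition: fewer clusters, reported every 30 seconds.
  - name: log_to_pattern_coarse
    type: log_to_pattern
    num_of_clusters: 10
    reporting_frequency: 30s
  # Hypothetical second definition: more clusters, more samples kept per cluster.
  - name: log_to_pattern_detailed
    type: log_to_pattern
    num_of_clusters: 30
    samples_per_cluster: 3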

Example Configuration

nodes:
- name: log_to_pattern_test
  type: log_to_pattern
  num_of_clusters: 15
  samples_per_cluster: 1
  reporting_frequency: 30s
  throttle_limit_per_sec: 200

Input Logs

2024-09-18T08:16:44Z WARN k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=inventory nodeID=node1 "User deleted account but has active subscription"

Pattern Output

{
  "_type": "cluster_pattern_and_sample",
  "resource": {
    ...
  },
  "start_timestamp": 1726647963430,
  "timestamp": 1726647963480,
  "_pattern": "* WARN * k*s namespace name=inventory nodeID=node* User deleted account but has active subscription",
  "_pattern_count": 1,
  "_sample": "2024-09-18T08:16:44Z WARN k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=inventory nodeID=node1 \"User deleted account but has active subscription\"",
  "_sentiment_score": 0
}

The pattern can be found in the Patterns Explorer. It has been classified as neutral.

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node elsewhere in the YAML. It must be unique across all nodes. Because it is a YAML list element, it begins with a - and a space, followed by the string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: log_to_pattern

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

Optional Parameters

group_by

The group_by parameter defines a list of expressions (CEL or Go) used to aggregate clustering items into buckets. It is specified as a list of strings and is optional. If it is not set, items are grouped by their source.

nodes:
  - name: <node name>
    type: log_to_pattern
    group_by:
    - item["resource"]["service.name"]

You can create a custom facet in the Patterns Explorer for this dimension.
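
As a sketch of grouping on more than one dimension, the list might hold two expressions. The second field, k8s.namespace.name, is an assumption borrowed from the sample log shown earlier; whether it is available under item["resource"] depends on your data.

nodes:
  - name: <node name>
    type: log_to_pattern
    group_by:
    # Expression from the example above.
    - item["resource"]["service.name"]
    # Assumed resource field, for illustration only.
    - item["resource"]["k8s.namespace.name"]

With a configuration like this, items would presumably be bucketed by the combination of both values rather than by source, though that combined behavior is an assumption to verify against your own pipeline.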

num_of_clusters

The num_of_clusters parameter defines the maximum number of clusters kept at run time per input. It is specified as an integer greater than zero, with a default of 15, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    num_of_clusters: <integer greater than 0>

reporting_frequency

The reporting_frequency parameter defines how often cluster patterns and cluster samples are posted to the destination nodes. It is specified as a duration, with a default of 3 minutes, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    reporting_frequency: <duration>

Note: Bear in mind the relationship between the node's reporting frequency and the x-axis interval in the Patterns Explorer, which is 1 minute. A reporting frequency of 3 minutes results in no data for two intervals in the explorer, followed by an aggregation of the past three minutes. This may be suitable for most use cases, but you can set reporting_frequency to less than 1 minute to reduce graph variation and identify exact moments of anomalous behavior.
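
To make the note concrete with illustrative (not measured) numbers: at a reporting frequency of 3 minutes, a pattern that occurs 10 times per minute would show up in the explorer as 0, 0, 30 across three consecutive 1-minute intervals. A minimal sketch of a shorter setting follows; it reuses the 30s value from the example configuration, so each 1-minute interval would receive two reports.

nodes:
  - name: <node name>
    type: log_to_pattern
    # Shorter than the explorer's 1-minute interval, so no interval is left empty.
    reporting_frequency: 30s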

retire_period

The retire_period parameter specifies an inactivity period for a pattern: if a pattern is not observed during that period, it is retired. It is specified as a duration larger than 1 minute, with a default of 10 minutes, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    retire_period: <duration>

samples_per_cluster

The samples_per_cluster parameter defines how many sample messages are kept in each cluster, with new messages replacing old ones. It is specified as an integer, with a default value of 1, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    samples_per_cluster: <integer>

throttle_limit_per_sec

The throttle_limit_per_sec parameter limits the number of logs clustered per second, per source. It is specified as an integer, with a default value of 200, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    throttle_limit_per_sec: <integer>
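
Pulling the optional parameters together, a node that spells out every documented default might look like the following sketch. The node name is hypothetical, and the duration literals 3m and 10m assume the same unit-suffix form as the 30s used in the example configuration.

nodes:
  - name: log_to_pattern_defaults  # hypothetical name
    type: log_to_pattern
    num_of_clusters: 15            # documented default
    reporting_frequency: 3m        # documented default of 3 minutes (assumed literal)
    retire_period: 10m             # documented default of 10 minutes (assumed literal)
    samples_per_cluster: 1         # documented default
    throttle_limit_per_sec: 200    # documented default

Because group_by is omitted here, items are grouped by their source, as described above.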