Edge Delta Log to Pattern Node

Identify log patterns using a clustering algorithm.

Overview

The Log to Pattern Node finds patterns in logs and then groups (clusters) them based on similarity. It takes the body field of each log item, runs a clustering algorithm based on Ragel and a Drain tree, and creates cluster patterns and samples according to the node definition. You can have multiple clustering definitions. For more information, see Clustered Invariants.
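
As a minimal sketch of multiple clustering definitions, assuming each definition is its own log_to_pattern node, the fragment below declares two nodes with different settings. The node names are hypothetical, and the values are illustrative rather than recommended.

```yaml
nodes:
  # Hypothetical node: many clusters, one sample each (the default), reported every 30 seconds.
  - name: log_to_pattern_broad
    type: log_to_pattern
    num_of_clusters: 100
    reporting_frequency: 30s
  # Hypothetical node: fewer clusters, more samples kept per cluster.
  - name: log_to_pattern_detailed
    type: log_to_pattern
    num_of_clusters: 20
    samples_per_cluster: 5
```

Each node needs its own unique name, as with any other node in the pipeline.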

Example Configuration

nodes:
  - name: log_to_pattern_test
    type: log_to_pattern
    num_of_clusters: 100
    samples_per_cluster: 20
    reporting_frequency: 30s
    throttle_limit_per_sec: 200

Input Logs

2023-04-01T12:00:00Z INFO k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=edgedelta nodeID=node1 "User login successful"
2023-04-01T12:01:00Z ERROR k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=edgedelta nodeID=node1 "Database connection failed"
2023-04-01T12:02:00Z WARN k8s.pod.name=api-deployment-d79fab72249c k8s.namespace.name=edgedelta nodeID=node1 "Memory usage high on container"

Resulting Pattern

On the Logs - Patterns page, the graph shows a negative pattern. It is one of two patterns discovered in the three logs.

The patterns table lists the two patterns discovered and indicates the sentiment.

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node in the YAML. It must be unique across all nodes. Because it is a YAML list element, it begins with a - and a space, followed by the string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: log_to_pattern

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

Optional Parameters

num_of_clusters

The num_of_clusters parameter defines the maximum number of clusters kept at run time per input. It is specified as an integer greater than zero. It is optional and defaults to 15.

nodes:
  - name: <node name>
    type: log_to_pattern
    num_of_clusters: <integer greater than 0>

reporting_frequency

The reporting_frequency parameter defines how often the cluster patterns and cluster samples are posted to the output nodes. It is specified as a duration. It is optional and defaults to 3 minutes.

nodes:
  - name: <node name>
    type: log_to_pattern
    reporting_frequency: <duration>

retire_period

The retire_period parameter specifies an inactivity period for a pattern. A pattern that is not observed during that period is retired. It is specified as a duration longer than 1 minute. It is optional and defaults to 10 minutes.

nodes:
  - name: <node name>
    type: log_to_pattern
    retire_period: <duration>

samples_per_cluster

The samples_per_cluster parameter defines how many sample log messages are kept in each cluster, with new messages replacing old ones. It is specified as an integer. It is optional and defaults to 1.

nodes:
  - name: <node name>
    type: log_to_pattern
    samples_per_cluster: <integer>

throttle_limit_per_sec

The throttle_limit_per_sec parameter limits the number of logs clustered per second, per source. It is specified as an integer. It is optional and defaults to 200.

nodes:
  - name: <node name>
    type: log_to_pattern
    throttle_limit_per_sec: <integer>
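
Putting the optional parameters together, a node that sets every optional parameter to its documented default might look like the sketch below. The node name log_to_pattern_defaults is hypothetical, and the duration strings 3m and 10m assume the same unit-suffix format as the 30s value in the example configuration.

```yaml
nodes:
  - name: log_to_pattern_defaults   # hypothetical node name
    type: log_to_pattern
    num_of_clusters: 15             # documented default
    reporting_frequency: 3m         # documented default of 3 minutes; "3m" format is assumed
    retire_period: 10m              # documented default of 10 minutes; "10m" format is assumed
    samples_per_cluster: 1          # documented default
    throttle_limit_per_sec: 200     # documented default
```

Because these are the defaults, leaving out all five keys should produce the same behavior. In practice, only the values that differ from the defaults need to be listed.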