Edge Delta Log to Pattern Node

Identify log patterns using a clustering algorithm.

Overview

The Log to Pattern node finds patterns in logs and groups (clusters) them by similarity. It takes the body field of each log item, runs a clustering algorithm based on Ragel and a Drain tree, and creates cluster patterns and samples according to the node definition. A pipeline can contain multiple clustering definitions. For more information, see Clustered Invariants.

Example Configuration

nodes:
  - name: k8s-clustering
    type: log_to_pattern
    num_of_clusters: 100
    samples_per_cluster: 20
    reporting_frequency: 30s
    retire_period: 10m
    throttle_limit_per_sec: 200
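
The overview notes that you can have multiple clustering definitions. A minimal sketch of that, assuming each definition is expressed as its own log_to_pattern node, might look like the following. The node names and values are hypothetical, and links between nodes are omitted.

nodes:
  - name: app-clustering          # hypothetical name; must be unique
    type: log_to_pattern
    num_of_clusters: 100
  - name: audit-clustering        # hypothetical name; must be unique
    type: log_to_pattern
    num_of_clusters: 20
    samples_per_cluster: 5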

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node elsewhere in the YAML. It must be unique across all nodes. Because each node is a YAML list element, the line begins with a hyphen and a space, followed by the name. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: log_to_pattern

The type parameter specifies the type of node being configured. It is a string chosen from a closed list of node types; for this node the value is log_to_pattern. It is a required parameter.

nodes:
  - name: <node name>
    type: log_to_pattern

Optional Parameters

num_of_clusters

The num_of_clusters parameter defines the maximum number of clusters kept at run time per input. It is specified as an integer greater than zero with a default of 15, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    num_of_clusters: <integer greater than 0>

reporting_frequency

The reporting_frequency parameter defines how often cluster patterns and cluster samples are posted to the output nodes. It is specified as a duration with a default of 3 minutes, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    reporting_frequency: <duration>

retire_period

The retire_period parameter specifies an inactivity period for a pattern: if a pattern is not observed during that period, it is retired. It is specified as a duration longer than 1 minute with a default of 10 minutes, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    retire_period: <duration>

samples_per_cluster

The samples_per_cluster parameter defines how many sample messages are kept in each cluster, with new messages replacing old ones. It is specified as an integer with a default of 1, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    samples_per_cluster: <integer>
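
As a rough sizing sketch, assuming the two limits simply multiply (an inference from the descriptions of num_of_clusters and samples_per_cluster, not a documented guarantee), the node below would hold at most 50 x 4 = 200 sample messages per input. The node name and values are illustrative only.

nodes:
  - name: sizing-example          # hypothetical node name
    type: log_to_pattern
    num_of_clusters: 50           # at most 50 clusters kept per input
    samples_per_cluster: 4        # at most 4 samples kept per cluster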

throttle_limit_per_sec

The throttle_limit_per_sec parameter limits the number of logs clustered per second, per source. It is specified as an integer with a default of 200, and it is optional.

nodes:
  - name: <node name>
    type: log_to_pattern
    throttle_limit_per_sec: <integer>
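
Putting the optional parameters together, the sketch below writes out every default stated above explicitly, which should be equivalent to omitting them. The node name is hypothetical, and the duration strings 3m and 10m assume the same format as the 30s and 10m values in the example configuration.

nodes:
  - name: defaults-example        # hypothetical node name
    type: log_to_pattern
    num_of_clusters: 15           # default
    reporting_frequency: 3m       # default (3 minutes)
    retire_period: 10m            # default (10 minutes); must be longer than 1 minute
    samples_per_cluster: 1        # default
    throttle_limit_per_sec: 200   # default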