##### Cluster

Find patterns in logs.


A Processor is based on regex monitoring logic that analyzes your log data as it’s created. Processors also extract monitoring KPIs from log data. Once configured, processors will populate the Anomalies and Metrics pages.

See the instructions for configuring an agent.

Edge Delta calculates similarities between invariants for clustering purposes.

When a new log passes through the pipeline:

- Variants are identified via a proprietary Ragel FSM-based tokenization process.
- The identified variants are stripped from the log and replaced with wildcards.
- The remaining invariant components are compared to existing pattern sets to calculate similarities, so that the invariants can be transformed and clustered into structured log messages.
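The variant-stripping step above can be illustrated with a minimal sketch. The actual tokenizer is proprietary and FSM-based, so the regular expressions, the `VARIANT_PATTERNS` list, and the `mask_variants` helper below are stand-in assumptions for illustration only, not Edge Delta's implementation.

```python
import re

# Hypothetical variant patterns (assumptions for illustration only).
# Order matters: more specific patterns run before the generic integer pattern.
VARIANT_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),           # hexadecimal values
    re.compile(r"\b\d+\b"),                      # plain integers
]


def mask_variants(log_line: str, wildcard: str = "*") -> str:
    """Replace every detected variant in a log line with a wildcard."""
    for pattern in VARIANT_PATTERNS:
        log_line = pattern.sub(wildcard, log_line)
    return log_line


print(mask_variants("Connected to 10.0.0.12 port 8080 after 35 ms"))
# Connected to * port * after * ms
```

What remains after masking is the invariant part of the message, which is the input to the similarity calculation.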

There are 2 ways to calculate similarities:

- Drain
- Levenshtein distance

Drain is the default log parsing algorithm used to cluster logs. This algorithm is based on a parse tree with a fixed depth, which guides the log group search process. Fixing the depth helps to avoid a deep and unbalanced tree.

When a new raw log message arrives, Edge Delta processes the message with the Ragel FSM-based tokenization process. Then, Edge Delta searches for a log group through the nodes of the tree, based on the token prefix.

If a suitable log group is found, then Edge Delta also calculates the similarity between the log message and the log event stored in the log group. If the similarity rate is above a certain threshold, then the log message is matched with the log event stored in that log group.

If no suitable log group is found, or the similarity rate is below the threshold, a new log group is created based on the log message. To accelerate this process, Edge Delta uses a parse tree with a fixed depth, and nodes with a fixed number of children, to guide the log group search. This limits the number of log groups that a raw log message needs to be compared to.
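The search-then-match flow described above can be sketched as follows. This is a simplified illustration of a Drain-style lookup, not Edge Delta's implementation: the `DrainSketch` class, the `prefix_depth` and `similarity_threshold` parameters, and the position-wise similarity measure are all assumptions made for the example. The tree is flattened into a dictionary keyed by token count plus the first few tokens, which plays the role of a fixed-depth path.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass
class LogGroup:
    template: list  # tokens of the representative log event
    count: int = 1


class DrainSketch:
    """Simplified fixed-depth lookup (hypothetical, for illustration)."""

    def __init__(self, prefix_depth=2, similarity_threshold=0.5):
        self.prefix_depth = prefix_depth
        self.similarity_threshold = similarity_threshold
        self.leaves = {}  # fixed-depth path -> list of LogGroup

    def _leaf_key(self, tokens):
        # Path through the tree: token count, then the first prefix tokens.
        return (len(tokens), *tokens[: self.prefix_depth])

    @staticmethod
    def _similarity(template, tokens):
        # Fraction of positions where the tokens are identical.
        same = sum(1 for t, s in zip(template, tokens) if t == s)
        return same / max(len(tokens), 1)

    def add(self, message):
        tokens = message.split()
        groups = self.leaves.setdefault(self._leaf_key(tokens), [])
        best = max(groups, key=lambda g: self._similarity(g.template, tokens),
                   default=None)
        if best and self._similarity(best.template, tokens) >= self.similarity_threshold:
            # Match: generalize the template where the tokens differ.
            best.template = [t if t == s else WILDCARD
                             for t, s in zip(best.template, tokens)]
            best.count += 1
            return best
        group = LogGroup(template=tokens)  # no match: start a new log group
        groups.append(group)
        return group
```

For example, after adding `"Connection from * closed"` and `"Connection from * opened"`, both messages land in one group whose template is `Connection from * *`, while `"Disk usage at *"` follows a different path and starts its own group. Only the groups stored under the same path are ever compared, which is the point of the fixed-depth tree.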

Because Edge Delta uses a Drain parse tree that clusters logs by common prefix, it can merge clusters by using their shared ancestors in the tree. The merge level determines how many levels Edge Delta goes up in the tree.
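One way to picture the merge level is to treat each cluster as a path of prefix tokens and group clusters that share the ancestor a given number of levels up. The sketch below is an assumption about how such a merge could work; the `merge_by_ancestor` function and the tuple-based path representation are hypothetical and not taken from the product.

```python
from collections import defaultdict


def merge_by_ancestor(cluster_paths, merge_level):
    """Group cluster paths by the ancestor found `merge_level` levels up."""
    merged = defaultdict(list)
    for path in cluster_paths:
        # Drop the last `merge_level` path elements, stopping at the root.
        ancestor = path[: max(len(path) - merge_level, 0)]
        merged[ancestor].append(path)
    return dict(merged)


paths = [
    ("4", "Connection", "from"),
    ("4", "Connection", "to"),
    ("4", "Disk", "usage"),
]
print(merge_by_ancestor(paths, 1))
```

With a merge level of 0, every cluster stays separate. With a merge level of 1, the two `Connection` clusters merge under their common parent. With a merge level of 2, all three clusters merge under the token-count node.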

Levenshtein distance is a string metric that measures the difference between 2 sequences. The Levenshtein distance between 2 words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.
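The definition above corresponds to a standard dynamic-programming computation. The following is a textbook two-row implementation, shown only to make the metric concrete; the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ch_a != ch_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Turning `kitten` into `sitting` takes two substitutions (`k` to `s`, `e` to `i`) and one insertion (`g`), so the distance is 3.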

When a new raw log message arrives, Edge Delta processes the message with the Ragel FSM-based tokenization process.

Then, Edge Delta uses the Levenshtein distance algorithm to calculate similarities between tokens. If there is a similarity above a certain threshold, then Edge Delta will determine that these logs belong to the same log group.

The similarity calculation is based on the minimum number of operations required to make 2 tokens the same. If the number of required operations is below a certain threshold, then the 2 tokens are considered similar and are placed in the same log group. Otherwise, a new log group is created based on the log message.
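The grouping decision above can be sketched as a threshold check on edit distance. The sketch makes several assumptions that are not stated in the source: the distance is computed over each message's token list (one token counts as one edit), the threshold is a hypothetical `max_edits` parameter, and each group is represented by the tokens of its first message.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,                     # insertion
                previous[j] + 1,                        # deletion
                previous[j - 1] + (item_a != item_b),   # substitution
            ))
        previous = current
    return previous[-1]


def assign_group(groups, message, max_edits=1):
    """Add message to the first close-enough group, or start a new one.

    groups is a list of (representative_tokens, member_messages) pairs.
    max_edits is a hypothetical threshold chosen for this example.
    """
    tokens = message.split()
    for representative, members in groups:
        if levenshtein(representative, tokens) <= max_edits:
            members.append(message)
            return representative
    groups.append((tokens, [message]))
    return tokens


groups = []
assign_group(groups, "user * logged in")
assign_group(groups, "user * logged out")        # one substitution: same group
assign_group(groups, "disk usage at * percent")  # too many edits: new group
print(len(groups))  # 2
```

A lower `max_edits` value produces more, tighter groups, while a higher value merges more messages into fewer groups.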
