# Edge Delta Log Patterns


## Pattern Detection

The Edge Delta agent uses a proprietary algorithm to automatically detect repeated patterns in log messages *on the edge*. This allows it to optimize data by reporting patterns and their frequency rather than streaming the full log messages. Variant values within a pattern are expressed as a wildcard (`*`).

## Log to Pattern

Log patterns are detected by pipeline nodes configured on an Edge Delta agent. The Log to Pattern node tracks and reports the most frequently occurring patterns for the default or a specified interval, and it sends the specified number of full log samples for each pattern.
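As a rough illustration of this reporting behavior, the sketch below counts patterns within one interval and keeps a few raw samples for each. It is not the node's implementation: the function name `summarize_patterns` and the parameters `top_n` and `samples_per_pattern` are hypothetical, and the input is assumed to be pairs of an already-detected pattern and its raw log.

```python
from collections import Counter, defaultdict


def summarize_patterns(pattern_log_pairs, top_n=2, samples_per_pattern=1):
    """Return the top_n most frequent patterns with counts and raw samples.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    counts = Counter()
    samples = defaultdict(list)
    for pattern, raw_log in pattern_log_pairs:
        counts[pattern] += 1
        # Keep only the first few full logs as samples for each pattern.
        if len(samples[pattern]) < samples_per_pattern:
            samples[pattern].append(raw_log)
    return [
        {"pattern": pattern, "count": count, "samples": samples[pattern]}
        for pattern, count in counts.most_common(top_n)
    ]


# One reporting interval's worth of (pattern, raw log) pairs.
window = [
    ("user * logged in", "user alice logged in"),
    ("user * logged in", "user bob logged in"),
    ("disk * is full", "disk sda1 is full"),
]
report = summarize_patterns(window, top_n=1)
```

Here `report` holds a single entry for `user * logged in` with a count of 2 and one full log sample.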

## Sentiment Analysis

Every pattern detected by the Edge Delta agent is further analyzed to check for negative sentiment. Negative sentiment is determined by checking for the presence of specific keywords in the pattern (e.g. `error`, `exception`, `fail`). Some keywords, such as `debug`, are considered neutralizing because they automatically offset negative keyword matches in the pattern.

The negative and neutralizing keywords used in sentiment analysis can be configured in the Pipeline Settings for an account. They are applied to all agents within the account.
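The keyword check can be pictured with a minimal sketch like the one below. It assumes case-insensitive substring matching and that any neutralizing keyword cancels all negative matches; the actual matching rules are not described here, and the function name `has_negative_sentiment` is hypothetical. The keyword sets contain only the examples mentioned above.

```python
# Keyword lists taken from the examples in the text; real lists are configurable.
NEGATIVE_KEYWORDS = {"error", "exception", "fail"}
NEUTRALIZING_KEYWORDS = {"debug"}


def has_negative_sentiment(pattern,
                           negative=NEGATIVE_KEYWORDS,
                           neutralizing=NEUTRALIZING_KEYWORDS):
    """Return True if the pattern looks negative (illustrative assumption only)."""
    text = pattern.lower()
    # Assumption: one neutralizing keyword offsets every negative match.
    if any(word in text for word in neutralizing):
        return False
    return any(word in text for word in negative)


negative_hit = has_negative_sentiment("connection to * failed with error *")
neutralized = has_negative_sentiment("DEBUG retry after error *")
neutral = has_negative_sentiment("user * logged in")
```

With these assumptions, only the first pattern is flagged as negative; the second contains `error` but is offset by `debug`.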

## Pattern Visualization

Log patterns detected by the Edge Delta agent can be viewed in the Edge Delta web app as well as in third-party observability tools.

### Edge Delta Web App

The **Logs - Patterns** page shows a Negative Patterns pane listing all patterns with negative sentiment.

It also shows an All Patterns pane listing patterns regardless of sentiment.

Top Patterns is a table of the most frequently occurring patterns and their associated statistics.

### 3rd Party Tools

Patterns can also be viewed in any streaming destination that accepts log data.

## Anomaly Detection in Log Patterns

Once pattern data is sent to the Edge Delta backend, it can be further analyzed for anomalies. See our article on anomaly detection for more details.

## Clustered Invariants

The Edge Delta Log to Pattern node calculates similarities between invariants for clustering purposes.

When a new log passes through the pipeline:

- Variants are identified via a proprietary Ragel FSM-based tokenization process.
- The identified variants are stripped from the log and replaced with wildcards.
- The remaining invariant components are compared to existing pattern sets. Their similarity determines how the invariants are transformed and clustered into structured log messages.
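The first two steps above can be sketched as follows. The tokenizer is proprietary, so this example swaps in whitespace splitting and a regular expression as a stand-in: the `VARIANT` expression (IPv4 addresses, hex IDs, and numbers) and the function name `to_pattern` are assumptions made for illustration.

```python
import re

# Hypothetical variant detector: IPv4 addresses, hex IDs, and plain numbers.
# This stands in for the proprietary tokenizer, which is not shown here.
VARIANT = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}|0x[0-9a-fA-F]+|\d+(?:\.\d+)?")


def to_pattern(log_line):
    """Replace tokens that look like variants with the `*` wildcard."""
    tokens = log_line.split()
    return " ".join("*" if VARIANT.fullmatch(token) else token for token in tokens)


pattern = to_pattern("Connected to 10.0.0.7 in 35 ms")
# pattern == "Connected to * in * ms"
```

The invariant tokens (`Connected`, `to`, `in`, `ms`) are what remain for the similarity comparison.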

There are 2 ways to calculate similarities:

- Drain
- Levenshtein distance

### Drain

Drain is the default log parsing algorithm used to cluster logs. This algorithm is based on a parse tree with a fixed depth that guides the log group search process. The fixed depth helps to avoid a deep and unbalanced tree.

When a new raw log message arrives, Edge Delta processes the message with the Ragel FSM-based tokenization process. Then, Edge Delta searches for a log group through the nodes of the tree, based on the token prefix.

If a suitable log group is found, Edge Delta calculates the similarity between the log message and the log event stored in the log group. If the similarity exceeds a certain threshold, the log message is matched with the log event stored in that log group.

If not, a new log group is created based on the log message. To accelerate this process, Edge Delta uses a parse tree with a fixed depth, and nodes with a fixed number of children, to guide the log group search. This limits the number of log groups that a raw log message needs to be compared to.

Because Edge Delta uses a Drain parse tree that clusters logs by common prefix, it can easily merge clusters through their shared ancestors in the tree. The merge level determines how many levels up the tree Edge Delta goes when merging.
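To make the search-and-match flow more concrete, here is a heavily simplified Drain-style sketch. It is not Edge Delta's implementation: the class name `DrainSketch`, the parameters `prefix_depth` and `similarity_threshold`, and the similarity measure (the fraction of positions with identical tokens) are all assumptions. The tree is keyed first by token count and then by a fixed number of prefix tokens, and it omits child limits and merge levels.

```python
class DrainSketch:
    """Simplified, illustrative Drain-style clusterer (assumptions throughout)."""

    def __init__(self, prefix_depth=2, similarity_threshold=0.5):
        self.prefix_depth = prefix_depth
        self.similarity_threshold = similarity_threshold
        self.tree = {}  # token count -> prefix token -> ... -> log groups

    def _leaf_groups(self, tokens):
        # Fixed-depth walk: the token count first, then the first few tokens.
        node = self.tree.setdefault(len(tokens), {})
        for token in tokens[: self.prefix_depth]:
            # Tokens containing digits are treated as variable, as in Drain.
            key = "*" if any(ch.isdigit() for ch in token) else token
            node = node.setdefault(key, {})
        # None cannot collide with a token key, so it marks the leaf's group list.
        return node.setdefault(None, [])

    @staticmethod
    def _similarity(template, tokens):
        matches = sum(1 for a, b in zip(template, tokens) if a == b)
        return matches / len(tokens)

    def add(self, log_line):
        """Match the log to a group or create one; return the group's template."""
        tokens = log_line.split()
        if not tokens:
            return ""
        groups = self._leaf_groups(tokens)
        best = max(groups, key=lambda g: self._similarity(g, tokens), default=None)
        if best is not None and self._similarity(best, tokens) > self.similarity_threshold:
            # Positions that differ become wildcards in the stored template.
            for i, token in enumerate(tokens):
                if best[i] != token:
                    best[i] = "*"
            return " ".join(best)
        groups.append(tokens)
        return " ".join(tokens)


parser = DrainSketch()
first = parser.add("Connected to 10.0.0.7 in 35 ms")
second = parser.add("Connected to 10.0.0.9 in 12 ms")
# second == "Connected to * in * ms"
```

Because both messages share the token count and the prefix `Connected to`, the second message is compared only against the groups in that one leaf, which is the point of the fixed-depth tree.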

### Levenshtein Distance

Levenshtein distance is a string metric that measures the difference between 2 sequences. The Levenshtein distance between 2 words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.

When a new raw log message arrives, Edge Delta processes the message with the Ragel FSM-based tokenization process.

Then, Edge Delta uses the Levenshtein distance algorithm to calculate similarities between tokens. If there is a similarity beyond a certain threshold, then Edge Delta will determine that these logs belong to the same log group.

The similarity calculation is based on the minimum number of operations required to make 2 tokens the same. If the number of required operations is below a certain threshold, the 2 tokens are considered similar and are placed in the same log group. Otherwise, a new log group is created based on the log message.
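The distance itself can be computed with the standard dynamic-programming approach, as in the sketch below. The `levenshtein` function is the textbook algorithm; the grouping helper `same_group` and its `edit_threshold` default are hypothetical, since the actual threshold value is not given here.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def same_group(token_a, token_b, edit_threshold=3):
    """Hypothetical grouping rule: fewer edits than the threshold means similar."""
    return levenshtein(token_a, token_b) < edit_threshold


distance = levenshtein("kitten", "sitting")    # 3
similar = same_group("timeout", "timeouts")    # True, one insertion
different = same_group("timeout", "refused")   # False
```

A fixed edit count is the simplest reading of the threshold; a real system might instead normalize by token length, which this sketch does not attempt.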