Edge Delta Anomaly Detection

Edge Delta automatically detects anomalies in observability data, in individual agents as well as in aggregate on the backend. Site Reliability Engineers (SREs) and developer teams can receive alerts about anomalous behavior and see views designed to help with root cause analysis. This helps reduce the time needed to detect and resolve incidents.

Anomalies in Log Patterns

Once log patterns are streamed to the Edge Delta backend, monitors can be configured to detect anomalous behavior and trigger alerts to one or more notification channels.

Edge Delta provides two types of monitors for detecting anomalies in log patterns: the Skyline Pattern monitor and the Pattern Check monitor.

Skyline Pattern Monitor

The Skyline Pattern monitor uses a proprietary ‘skyline’ algorithm to detect unusual spikes in logs with negative sentiment. Log patterns for a particular source (e.g. a Kubernetes namespace or controller) are analyzed in aggregate, and an alert will be triggered if there is an unusual spike in either the total number of log messages with negative sentiment, or the number of unique negative patterns detected.

The algorithm is tuned to reduce false positives by accounting for repeated patterns (e.g. logs that result from a daily/weekly/monthly batch job) as well as normal fluctuations in log volume (e.g. increased traffic to a website during daytime hours).
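The skyline algorithm itself is proprietary, so the following is only a generic sketch of how a seasonal baseline can suppress false positives of the kind described above. It assumes that counts of negative-sentiment logs are bucketed by a recurring time slot (for example, the same hour of the week) and that a spike is judged against the median of earlier counts in that slot. The function name `is_spike` and the `min_deviation` parameter are hypothetical and are not part of Edge Delta.

```python
from statistics import median


def is_spike(history, current, min_deviation=3.0):
    """Return True if `current` is unusually high for its seasonal slot.

    history: counts seen in the same recurring slot (e.g. same hour of
             the week) during earlier periods.
    current: count observed in the slot being evaluated.
    This is an illustrative median/MAD check, not Edge Delta's algorithm.
    """
    if len(history) < 3:
        return False  # too little data to form a baseline
    base = median(history)
    # Median absolute deviation: a spread estimate robust to outliers.
    mad = median(abs(x - base) for x in history)
    scale = max(mad, 1.0)  # avoid dividing by zero on flat history
    return (current - base) / scale > min_deviation


# A weekly batch job makes ~100 negative logs normal for its slot...
batch_slot = [100, 110, 95, 105]
# ...while the same volume in a normally quiet slot stands out.
quiet_slot = [2, 3, 2, 4]
print(is_spike(batch_slot, 108), is_spike(quiet_slot, 100))
```

Because each slot is compared only with its own history, a recurring batch job or a daytime traffic peak raises the baseline for that slot instead of raising an alert.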

Example anomaly detected by a Skyline Pattern monitor.

Pattern Check Monitor

The Pattern Check monitor performs a similar analysis to the Skyline Pattern monitor, but at the level of an individual pattern. It is useful for detecting spikes in individual patterns with negative sentiment, whether the pattern is new or already known.

Example anomaly detected by a Pattern Check monitor.

Anomalies in Metrics

After performing logs-to-metrics conversion, Edge Delta can detect anomalies in the data collected by individual agents as well as in data aggregated from multiple agents.

Agent Processor Alerts

The Edge Delta agent can be configured to track the value of log metrics over time, detect anomalous values, and alert you if it finds any.

For instance, you may want to be alerted if there is an unusually high frequency of log messages containing ERROR or EXCEPTION. To determine whether the frequency is unusually high, the agent calculates an anomaly score between 0 and 100 using a proprietary algorithm. If the score exceeds a defined threshold during a given interval, the metric value is considered anomalous.

  - name: error-monitoring
    trigger_thresholds:
      anomaly_probability_percentage: 95
    retention: 12h0m0s
    pattern: (?i)error
  - name: exception-monitoring
    trigger_thresholds:
      anomaly_probability_percentage: 95
    retention: 12h0m0s
    pattern: (?i)exception

Depending on configuration, an alert may be sent (typically a Slack message or an email), and an anomaly capture may occur. An anomaly capture sends the raw logs from around the time of the anomaly to the Edge Delta backend and/or a third-party streaming destination.
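The agent's scoring algorithm is proprietary, so the sketch below only illustrates the general idea of turning a metric value into a 0 to 100 score and comparing it with a threshold such as 95. It assumes a normal distribution fitted to recent metric values and uses the cumulative probability as the score. That assumption, and the names `anomaly_score` and `is_anomalous`, are hypothetical and do not describe the agent's implementation.

```python
from statistics import NormalDist, mean, stdev


def anomaly_score(history, value):
    """Return a score from 0 to 100 for `value` given recent `history`.

    The score is the percentile of `value` under a normal distribution
    fitted to `history` (an assumption made for illustration only).
    """
    if len(history) < 2:
        return 0.0  # no baseline yet
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any increase is maximally surprising.
        return 100.0 if value > history[0] else 0.0
    return 100.0 * NormalDist(mean(history), sigma).cdf(value)


def is_anomalous(history, value, threshold=95):
    """Flag the value when its score exceeds the configured threshold."""
    return anomaly_score(history, value) > threshold


# Matches of (?i)error per interval over the retention window.
recent_counts = [10, 12, 11, 9, 10, 11, 12, 10]
print(is_anomalous(recent_counts, 11), is_anomalous(recent_counts, 30))
```

Scoring against history rather than a fixed count means the same threshold of 95 can be reused for a noisy pattern and a quiet one.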

Metrics Anomalies in the Edge Delta Web App

Metrics-based anomalies can be viewed in the Edge Delta web app on the Insights screen.

Metrics-based anomalies, grouped by processor rule.

Click Investigate for an anomaly to bring up an Investigation view, which provides details about why an alert was triggered as well as contextual information such as related logs and metric values leading up to the anomaly.

Anomaly detected by the exception-monitoring processor.
Log pattern which matched the processor criteria and contributed to anomaly score.
Metric value at the time of anomaly.
Anomaly score exceeding threshold of 95.

Metrics Alert Monitors

Since many production services run across multiple hosts, it is often useful to collect metric values in aggregate from all hosts, analyze them, and trigger alerts if a threshold is exceeded.

A metrics alert monitor can be configured to trigger when the aggregated metric value or anomaly score from many agent instances exceeds a defined threshold. Click Create Alert on the Metrics view in the web app to define a metrics alert monitor.

Create Alert command to set up a Metrics Alert monitor.
Metric alert based on Anomaly Score.

When the threshold defined in a metrics alert monitor is exceeded, a notification is sent (via email, Slack, PagerDuty, etc.) with a link to an Investigation view, similar to the one for a processor-detected metric anomaly.
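To make the aggregate check concrete, here is a minimal sketch of a threshold applied to values summed across hosts. It assumes each agent reports `(interval, host, value)` samples and that aggregation is a plain sum per interval. The data shape and the function name `intervals_over_threshold` are hypothetical, and the backend may aggregate differently.

```python
from collections import defaultdict


def intervals_over_threshold(samples, threshold):
    """Return the sorted intervals whose summed value exceeds `threshold`.

    samples: iterable of (interval, host, value) tuples, one per agent
             report. Values are summed across hosts per interval.
    """
    totals = defaultdict(float)
    for interval, _host, value in samples:
        totals[interval] += value
    return sorted(t for t, total in totals.items() if total > threshold)


samples = [
    ("12:00", "host-a", 40), ("12:00", "host-b", 35),
    ("12:05", "host-a", 70), ("12:05", "host-b", 60),
]
# No single host exceeds 100, but the 12:05 total (130) does.
print(intervals_over_threshold(samples, threshold=100))
```

The example shows why aggregation matters: a problem spread evenly across hosts can stay under any per-host threshold while the service as a whole is clearly over it.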

Correlated Signal Monitor

Similar to the Skyline Pattern monitor for log patterns, the Correlated Signal monitor checks for instances where an unusually high number of metrics-based anomalies was detected.

These anomalies may have been triggered by different processors and/or originate from different hosts, but the aggregate behavior is considered anomalous compared to a known baseline.
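As a rough illustration of this kind of check, the sketch below counts anomalies in one time window regardless of which processor or host raised them, and compares that count with a baseline built from earlier windows. The mean-plus-three-standard-deviations rule, the record shape, and the function name `correlated_signal` are assumptions for illustration, not the monitor's documented behavior.

```python
from collections import Counter
from statistics import mean, stdev


def correlated_signal(events, window, baseline_counts, sigmas=3.0):
    """Check whether `window` holds unusually many anomalies overall.

    events: iterable of (window, processor, host) anomaly records.
    baseline_counts: anomaly counts from earlier windows (at least two).
    Returns (is_anomalous, anomalies per processor in the window).
    """
    in_window = [(p, h) for w, p, h in events if w == window]
    limit = mean(baseline_counts) + sigmas * stdev(baseline_counts)
    by_processor = Counter(p for p, _h in in_window)
    return len(in_window) > limit, by_processor


events = [
    ("t0", "error-monitoring", "host-a"),
    ("t1", "error-monitoring", "host-a"),
    ("t1", "error-monitoring", "host-b"),
    ("t1", "exception-monitoring", "host-a"),
    ("t1", "exception-monitoring", "host-b"),
    ("t1", "exception-monitoring", "host-c"),
]
baseline = [1, 2, 1, 0, 2, 1]
flagged, grouped = correlated_signal(events, "t1", baseline)
print(flagged, dict(grouped))
```

Grouping the flagged window by processor (or by processor and host) gives the same kind of breakdown that the alert views present.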

Correlated signal alert, grouped by processor.
The same correlated signal alert, grouped by processor and host.

Click Create Alert on the Insights screen to configure a Correlated Signal monitor.