Edge Delta automatically detects anomalies in observability data, in individual agents as well as in aggregate on the backend. Site Reliability Engineers (SREs) and developer teams can receive alerts about anomalous behavior and see views designed to help with root cause analysis. This helps reduce the time needed to detect and resolve incidents.
Anomalies in Log Patterns
Edge Delta provides two types of monitors for detecting anomalies in log patterns, the Skyline Pattern monitor and the Pattern Check monitor.
Skyline Pattern Monitor
The Skyline Pattern monitor uses a proprietary 'skyline' algorithm to detect unusual spikes in logs with negative sentiment. Log patterns for a particular source (e.g. a Kubernetes namespace or controller) are analyzed in aggregate, and an alert will be triggered if there is an usual spike in either the total number of log messages with negative sentiment, or the number of unique negative patterns detected.
The algorithm is tuned to reduce false positives by accounting for repeated patterns (e.g. logs that result from a daily/weekly/monthly batch job) as well as normal fluctuations in log volume (e.g. increased traffic to a website during daytime hours).
Pattern Check Monitor
The Pattern Check monitor performs a similar analysis to the Skyline Pattern monitor, but at the level of an individual pattern. It is useful for detecting spikes in individual patterns with negative sentiment, from both new as well as existing patterns.
Anomalies in Metrics
After performing logs to metrics conversion, Edge Delta is able to detect anomalies in the data collected by individual agents as well as in data aggregated from multiple agents.
Agent Processor Alerts
The Edge Delta agent can be configured to track the value of log metrics over time, detect anomalous values, and alert you if it finds any.
For instance, you may want to alerted if there is unusually high frequency of log messages containing
EXCEPTION. To determine if the frequency is unusually high, the agent calculates an anomaly score between 0 to 100 using a proprietary algorithm, and if the metric value exceeds a defined threshold during a given interval, the value will be considered anomalous.
Depending on configuration, an alert may be sent (typically a Slack message or an email), and an anomaly capture may occur, resulting in raw logs around the time of the anomaly being sent to the Edge Delta backend and/or a 3rd party streaming destination.
Metrics Anomalies in the Edge Delta Web App
Metrics-based anomalies can be viewed in the Edge Delta web app on the Insights screen.
Click Investigate for an anomaly to bring up an Investigation view, which provides details about why an alert was triggered as well as contextual information such as related logs and metric values leading up to the anomaly.
Metrics Alert Monitors
Since many production services run across multiple hosts, it is often useful to collect metric values in aggregate from all hosts, analyze them, and trigger alerts if a threshold is exceeded.
A metrics alert monitor can be configured to trigger when the aggregated metric value or anomaly score from many agent instances exceeds a defined threshold. Click Create Alert on the Metrics view in the web app to define a metric alert.
When the threshold defined in a metrics alert monitor is exceeded, a notification is sent (via email, Slack, Pager Duty, etc.) with a link to an Investigation view, similar to that for a processor-detected metric anomaly.
Correlated Signal Monitor
These anomalies may have been triggered by different processors and/or originate from different hosts, but the aggregate behavior is considered anomalous compared to a known baseline.
Click Create Alert on the Insights screen to configure a correlated Signal monitor.