Trigger a Metric Alert with Edge Delta

Configure an alert on a metric threshold in 5 minutes.

Overview

The Threshold node in Edge Delta triggers alerts when metrics derived from log data meet specific conditions. You define a threshold, and the node fires when a metric exceeds or falls below it, so teams are alerted promptly to issues that need attention. In an observability pipeline that monitors web server health, a Threshold node can detect abnormal error rates, resource usage spikes, or performance anomalies.

This scenario continues from the log to metric how-to. Imagine an e-commerce platform where maintaining a good user experience is vital. A Threshold node could be configured to watch for an unusual increase in HTTP 5xx errors in the NGINX server logs. When the error count per minute reaches a predefined threshold, the node fires an alert to signal that immediate investigation is needed. Catching these spikes early helps maintain service stability and resolve issues before they lead to user dissatisfaction or revenue loss.

In this scenario, you have configured a Log to Metric node to count 5xx logs. After considering key performance indicators and service level agreements, you want to configure a Threshold node to alert on an elevated count of NGINX 5xx errors, which indicate server-side issues. The pipeline handles 2,300 NGINX logs per minute. If the node encounters 23 or more 5xx errors per minute (1% of all NGINX logs), it should send a signal to a Webhook Output node. The Webhook Output node, in turn, dispatches a custom alert to a notification application, prompting the team to act swiftly.
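
The threshold of 23 follows directly from the traffic volume and the 1% target. The arithmetic can be sketched as below; the function name `alert_threshold` is an illustrative assumption and is not part of Edge Delta.

```python
import math


def alert_threshold(logs_per_minute: int, error_ratio: float) -> int:
    """Return the smallest whole error count per minute that reaches the ratio."""
    # Round first so floating-point noise cannot push an exact result
    # (such as 23.0) up to the next integer when taking the ceiling.
    return math.ceil(round(logs_per_minute * error_ratio, 9))


# 1% of 2,300 NGINX logs per minute, as in the scenario above.
print(alert_threshold(2300, 0.01))  # 23
```

If traffic volume changes, the same calculation gives the count to enter in the Threshold node for the new rate.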

This is a sample of the generated logs:

Mar 25 17:22:43.190 ed_parallel 188.70.110.238 - - [25/03/2024:15:18:39 +0000] "PUT /utilize HTTP/1.0" 401 79083 "https://www.corporateapplications.com/other/enterprise" "Mozilla/5.0 (Windows 95) AppleWebKit/5311 (KHTML, like Gecko) Chrome/37.0.857.0 Mobile Safari/5311"
Mar 25 17:22:43.190 ed_parallel 207.1.189.215 - - [25/03/2024:15:18:39 +0000] "GET /empower HTTP/1.0" 503 3310 "http://www.forwardclicks-and-mortar.org/platforms/architect/orchestrate" "Mozilla/5.0 (Windows 98; Win 9x 4.90) AppleWebKit/5310 (KHTML, like Gecko) Chrome/37.0.884.0 Mobile Safari/5310"
Mar 25 17:22:43.190 ed_parallel 118.81.25.202 - abernathy5566 [25/03/2024:15:18:39 +0000] "PUT /next-generation HTTP/1.1" 200 28015 "http://www.globalsystems.info/engineer/optimize/vortals/synergize" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/5311 (KHTML, like Gecko) Chrome/38.0.827.0 Mobile Safari/5311"
Mar 25 17:22:43.190 ed_parallel 160.112.186.178 - - [25/03/2024:15:18:39 +0000] "GET /communities/networks/revolutionary/interactive HTTP/1.0" 501 13828 "https://www.internationalclicks-and-mortar.name/systems/productize/eyeballs" "Opera/8.18 (X11; Linux x86_64; en-US) Presto/2.10.336 Version/13.00"
Mar 25 17:22:43.190 ed_parallel 184.126.183.51 - - [25/03/2024:15:18:39 +0000] "PATCH /magnetic/wireless/paradigms HTTP/2.0" 201 58295 "https://www.chiefdeliverables.io/infrastructures" "Mozilla/5.0 (Windows 98; en-US; rv:1.9.0.20) Gecko/1906-13-11 Firefox/37.0"
Mar 25 17:22:43.190 ed_parallel 168.148.156.147 - - [25/03/2024:15:18:39 +0000] "PATCH /reinvent/cross-media HTTP/1.1" 401 35738 "http://www.nationalevolve.biz/scalable/engage" "Mozilla/5.0 (Macintosh; PPC Mac OS X 10_9_4 rv:2.0) Gecko/1900-18-08 Firefox/37.0"
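
To make the "5xx" classification concrete, the following minimal sketch counts 5xx responses in the sample above. It only illustrates what is being counted; it is not how the Log to Metric node is implemented, and the regular expression and function names are assumptions. The sample lines are abridged: the referrer and user agent are replaced with `...`.

```python
import re

# Matches the quoted request line and captures the status code that follows it.
STATUS_RE = re.compile(r'"[A-Z]+ \S+ HTTP/[\d.]+" (\d{3}) ')


def is_5xx(line: str) -> bool:
    """True when the log line carries a 5xx HTTP status code."""
    match = STATUS_RE.search(line)
    return match is not None and match.group(1).startswith("5")


def count_5xx(lines) -> int:
    """Count the lines whose status code is in the 5xx range."""
    return sum(1 for line in lines if is_5xx(line))


SAMPLE = [
    'Mar 25 17:22:43.190 ed_parallel 188.70.110.238 - - [25/03/2024:15:18:39 +0000] "PUT /utilize HTTP/1.0" 401 79083 ...',
    'Mar 25 17:22:43.190 ed_parallel 207.1.189.215 - - [25/03/2024:15:18:39 +0000] "GET /empower HTTP/1.0" 503 3310 ...',
    'Mar 25 17:22:43.190 ed_parallel 118.81.25.202 - abernathy5566 [25/03/2024:15:18:39 +0000] "PUT /next-generation HTTP/1.1" 200 28015 ...',
    'Mar 25 17:22:43.190 ed_parallel 160.112.186.178 - - [25/03/2024:15:18:39 +0000] "GET /communities/networks/revolutionary/interactive HTTP/1.0" 501 13828 ...',
    'Mar 25 17:22:43.190 ed_parallel 184.126.183.51 - - [25/03/2024:15:18:39 +0000] "PATCH /magnetic/wireless/paradigms HTTP/2.0" 201 58295 ...',
    'Mar 25 17:22:43.190 ed_parallel 168.148.156.147 - - [25/03/2024:15:18:39 +0000] "PATCH /reinvent/cross-media HTTP/1.1" 401 35738 ...',
]

print(count_5xx(SAMPLE))  # 2 (the 503 and 501 responses)
```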

Prerequisites

To trigger alerts, you need an Edge Delta account with an agent configuration already created. This is the configuration in which you will create the alert. You also need a Log to Metric node configured to count 5xx logs, and a webhook URL for a notification application such as Slack, Teams, Discord, PagerDuty, or Zapier.

Configure the Threshold Node

Create a Threshold node that evaluates the metrics generated by the count_5xx Log to Metric node and issues an alert if the 5xx_per_minute.count metric is 23 or higher.

  1. Click Add Processor.
  2. Expand Analytics and select Threshold.
  3. Enter 23 5xx per minute, or another suitable name, as the node name.
  4. Enter the following CEL macro as the filter:
     item.name == "5xx_per_minute.count"
  5. Enter the following Condition:
     value >= 23
  6. Click OK.
  7. Connect the count_5xx Log to Metric node’s output to the 23 5xx per minute Threshold node’s input.
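
The filter and condition work as two stages: the filter selects which metric items the node looks at, and the condition decides whether a selected item raises an alert. The sketch below mimics that logic in Python using the "23 or higher" rule stated above. It is an illustration only, not Edge Delta's implementation, and the `MetricItem` type and `should_alert` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MetricItem:
    """Hypothetical stand-in for a metric emitted by the Log to Metric node."""
    name: str
    value: float


def should_alert(item: MetricItem,
                 metric_name: str = "5xx_per_minute.count",
                 limit: float = 23) -> bool:
    # Stage 1, the filter: ignore metrics with any other name.
    if item.name != metric_name:
        return False
    # Stage 2, the condition: alert at 23 or higher.
    return item.value >= limit


print(should_alert(MetricItem("5xx_per_minute.count", 22)))  # False
print(should_alert(MetricItem("5xx_per_minute.count", 23)))  # True
print(should_alert(MetricItem("other_metric.count", 500)))   # False
```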

Configure the Webhook Output Node

Create a Webhook Output node that sends a notification whenever an alert is received from the Threshold node.

  1. Click Add Output.
  2. Expand Trigger and select Webhook Output.
  3. Specify the webhook endpoint.
  4. Enter a suppression window of 20m. This prevents the notification channel from being flooded for 20 minutes after receiving the first alert.
  5. Expand Advanced Settings and enter the following Payload:
     {"text": "More than 1% of NGINX traffic is a 5xx error in the past minute."}
  6. Click OK.
  7. Connect the Threshold node’s output to the Webhook node.
  8. Click Review Changes.
  9. Click Deploy Changes.
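
To illustrate how a 20-minute suppression window and the JSON payload fit together, here is a rough Python sketch. It assumes the window starts at the first delivered alert and drops any alert that arrives before it expires; the exact behavior of the Webhook Output node may differ. The `SuppressionWindow` class and `build_request` function are hypothetical, and the URL in the usage note is a placeholder.

```python
import json
import urllib.request

PAYLOAD = {"text": "More than 1% of NGINX traffic is a 5xx error in the past minute."}


class SuppressionWindow:
    """Let one alert through, then drop alerts until the window expires."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self._last_sent = None  # time (in seconds) of the last delivered alert

    def allow(self, now: float) -> bool:
        if self._last_sent is not None and now - self._last_sent < self.seconds:
            return False  # still inside the window: suppress
        self._last_sent = now
        return True


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the payload."""
    body = json.dumps(PAYLOAD).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


window = SuppressionWindow(20 * 60)  # 20m expressed in seconds
for t in (0, 60, 1199, 1200):
    print(t, window.allow(t))
# 0 True
# 60 False
# 1199 False
# 1200 True
```

Passing the result of `build_request("https://example.com/hook")` to `urllib.request.urlopen` would deliver the payload; the sketch stops short of that so it can run without network access.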

If the threshold is met, a notification is sent to the application consuming the webhook.