ED Watcher

Monitor Edge Delta agent health with ED Watcher, a companion component that detects out-of-memory events, crash loops, evictions, and scheduling failures.

Overview

ED Watcher is a companion component that monitors the health of the Edge Delta agent and reports issues as time series metrics. It detects conditions such as out-of-memory events, crash loops, pod evictions, and scheduling failures that may affect agent availability.

ED Watcher is disabled by default and must be explicitly enabled.

This feature requires Edge Delta agent v2.13.0 or later.

Deployment modes

ED Watcher supports two deployment modes depending on your environment.

Kubernetes (sidecar)

In Kubernetes deployments, ED Watcher runs as a sidecar container in the Edge Delta agent pod, alongside the agent container. It monitors the agent container and the pod’s scheduling status.

To enable ED Watcher via Helm:

helm upgrade edgedelta edgedelta/edgedelta \
  --set watcherProps.enabled=true

Host (standalone process)

For host-based deployments (Linux, macOS), ED Watcher runs as a standalone process that monitors the Edge Delta agent process. If the agent process is not found, ED Watcher reports the ed.agent.process_not_found metric.
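
To confirm by hand what ED Watcher would see on a host, you can check the process list yourself. The following is a minimal sketch, not ED Watcher's actual detection logic. The helper name agent_process_status and the process name edgedelta are assumptions for illustration, so substitute the name your installation uses.

```shell
# Hypothetical helper: reads a `ps -A -o comm=` listing on stdin and reports
# whether a process with the given name appears in it.
agent_process_status() {
  name="$1"
  # Strip any leading directory so full executable paths also match the bare name.
  if awk -v n="$name" '{ sub(/^.*\//, ""); if ($0 == n) found = 1 } END { exit !found }'; then
    echo "running"
  else
    echo "process_not_found"
  fi
}

# Usage (the process name "edgedelta" is an assumption):
#   ps -A -o comm= | agent_process_status edgedelta
```

The helper reads the listing from stdin rather than calling ps itself, so the same check works on both Linux and macOS output.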

Metrics

ED Watcher emits the following time series metrics:

| Metric | Description |
|---|---|
| ed.agent.oom | Agent terminated due to out-of-memory conditions |
| ed.agent.crashloop | Agent is in a crash loop (repeated restarts) |
| ed.agent.evicted | Agent pod was evicted from the node |
| ed.agent.failed_scheduling | Agent pod could not be scheduled on any node |
| ed.agent.process_not_found | Agent process is not running (host deployments only) |

These metrics are available in the Edge Delta platform and can be used to create monitors and alerts for agent health.

Configuration

ED Watcher does not require pipeline configuration. It operates independently of the agent’s pipeline and reports metrics directly.

Helm values

| Parameter | Type | Default | Description |
|---|---|---|---|
| watcherProps.enabled | bool | false | Enable ED Watcher as a sidecar container |
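
If you keep Helm settings in a values file rather than passing --set flags, the same parameter can be written as nested YAML. The sketch below generates such a file. The helper name write_watcher_values and the file name watcher-values.yaml are illustrative, not part of the Edge Delta chart.

```shell
# Hypothetical helper: writes a values file that enables ED Watcher.
# The nested keys are the YAML form of --set watcherProps.enabled=true.
write_watcher_values() {
  cat > "$1" <<'EOF'
watcherProps:
  enabled: true
EOF
}

# Usage (release and chart names follow the helm command shown earlier):
#   write_watcher_values watcher-values.yaml
#   helm upgrade edgedelta edgedelta/edgedelta -f watcher-values.yaml
```

Pass this file together with any values files your release already uses, so that the upgrade does not drop your other settings.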

Troubleshooting

ED Watcher not reporting metrics

  • Verify that watcherProps.enabled=true is set in your Helm values
  • Check that the ED Watcher container is running in the agent pod: kubectl get pods -l app=edgedelta -o jsonpath='{.items[0].spec.containers[*].name}'
  • Review ED Watcher container logs: kubectl logs <pod-name> -c ed-watcher
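
The container check above can be scripted. This is a minimal sketch that assumes the sidecar container is named ed-watcher, as in the logs command above. The helper name has_watcher_container is hypothetical.

```shell
# Hypothetical helper: succeeds if "ed-watcher" appears in a space-separated
# list of container names, as printed by the jsonpath query above.
has_watcher_container() {
  for c in $1; do
    [ "$c" = "ed-watcher" ] && return 0
  done
  return 1
}

# Usage (needs cluster access; inspects only the first matching pod):
#   names=$(kubectl get pods -l app=edgedelta -o jsonpath='{.items[0].spec.containers[*].name}')
#   has_watcher_container "$names" && echo "sidecar present" || echo "sidecar missing"
```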

False positive crash loop detection

ED Watcher monitors agent restarts. During configuration changes that trigger agent reloads, brief restart sequences may be reported as crash loops. These are expected during deployments and configuration updates.
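
To judge whether a reported crash loop lines up with a rollout, it can help to compare the restart counts Kubernetes records for the pod before and after the change. The sketch below sums those counts. The helper name total_restarts is hypothetical, and this is a manual cross-check rather than how ED Watcher itself decides.

```shell
# Hypothetical helper: sums the space-separated restart counts
# printed by the jsonpath query shown in the usage comment.
total_restarts() {
  total=0
  for n in $1; do
    total=$((total + n))
  done
  echo "$total"
}

# Usage (needs cluster access; <pod-name> is a placeholder):
#   counts=$(kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].restartCount}')
#   total_restarts "$counts"
```

A total that stops increasing once the deployment or configuration update has finished suggests the reported crash loop was transient.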