Monitoring and Visibility

Monitor agent health, throughput, and performance metrics across your Edge Delta pipelines and agents.

Overview

Edge Delta provides continuous monitoring of agent health, throughput, and performance across your entire infrastructure. Built-in telemetry inputs capture operational data from every agent, enabling real-time detection of issues and proactive capacity planning.

Agent Health Monitoring

Edge Delta agents emit health telemetry that provides visibility into operational status. Built-in health inputs include:

ed_component_health: Component-level health status
ed_node_health: Node-level health metrics
ed_agent_stats: Agent performance statistics
ed_pipeline_io_stats: Input/output throughput data

Each agent sends a heartbeat to the Edge Delta backend every minute, so connectivity issues, crashes, or configuration problems can be detected at roughly one-minute granularity.

Health Indicators

Health indicators show agent state:

State	Description
Healthy	Agent is running and processing data normally
Warning	Performance degradation or partial failures detected
Critical	Agent is down or experiencing severe issues
Unknown	No recent heartbeat received
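To make the four states concrete, here is a minimal sketch of how a state could be derived from heartbeat age and error rate. It is not Edge Delta's implementation: the function name `classify_agent`, the three-missed-heartbeats cutoff, and both error-rate thresholds are assumptions chosen for illustration. Only the one-minute heartbeat interval comes from the text above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# From the text: agents send a heartbeat every minute.
HEARTBEAT_INTERVAL = timedelta(minutes=1)

# Assumptions (hypothetical values, not documented Edge Delta behavior):
UNKNOWN_AFTER = 3 * HEARTBEAT_INTERVAL  # treat three missed heartbeats as "no recent heartbeat"
WARNING_ERROR_RATE = 0.001              # 0.1% of events failing
CRITICAL_ERROR_RATE = 0.05              # 5% of events failing


def classify_agent(last_heartbeat: Optional[datetime],
                   now: datetime,
                   error_rate: float) -> str:
    """Map heartbeat age and error rate onto Healthy/Warning/Critical/Unknown."""
    # No heartbeat at all, or a stale one, means the state cannot be known.
    if last_heartbeat is None or now - last_heartbeat > UNKNOWN_AFTER:
        return "Unknown"
    if error_rate >= CRITICAL_ERROR_RATE:
        return "Critical"
    if error_rate >= WARNING_ERROR_RATE:
        return "Warning"
    return "Healthy"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(classify_agent(now - timedelta(seconds=30), now, 0.0002))
    print(classify_agent(now - timedelta(minutes=10), now, 0.0))
```

Checking heartbeat freshness first keeps a silent agent from being reported as Healthy on the basis of stale metrics.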

Pipeline Dashboard

The Pipeline Dashboard provides:

Pipeline overview with visual status of all pipelines at a glance
Individual agent status with deployment details
Deployment status to track agent versions and configuration state
Heartbeat monitoring with minute-by-minute agent availability checks

Throughput Monitoring

Track data volume and processing rates across all pipeline stages:

Metric	Description	Use Case
Input Rate	Events/sec ingested by sources	Capacity planning
Processing Rate	Events/sec through processors	Performance tuning
Output Rate	Events/sec sent to destinations	Destination health
Drop Rate	Events/sec filtered or dropped	Filter effectiveness
Backpressure	Queue depth and latency	Flow control

Pipeline I/O statistics show the flow through each stage. For example, a production logs pipeline might show:

45,000 events/sec input
12,000 events/sec filtered (26.7%)
33,000 events/sec processed
28,000 events/sec enriched
28,000 events/sec output (62.2% of input, a 37.8% reduction)
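The percentages for this example can be checked with a few lines of arithmetic. The sketch below uses the rates listed above; the dictionary keys and the helper names `pct` and `summarize` are hypothetical and do not correspond to fields in `ed_pipeline_io_stats`.

```python
def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Rates from the example pipeline, in events/sec.
stages = {
    "input": 45_000,
    "filtered": 12_000,
    "processed": 33_000,
    "enriched": 28_000,
    "output": 28_000,
}


def summarize(stages: dict) -> dict:
    """Compute filter share, retained share, and overall reduction."""
    inp = stages["input"]
    out = stages["output"]
    return {
        "filtered_pct": pct(stages["filtered"], inp),   # share of input filtered out
        "output_pct_of_input": pct(out, inp),           # share of input that is delivered
        "reduction_pct": pct(inp - out, inp),           # share of input removed overall
    }


if __name__ == "__main__":
    print(summarize(stages))
    # {'filtered_pct': 26.7, 'output_pct_of_input': 62.2, 'reduction_pct': 37.8}
```

The retained share and the reduction always sum to 100%, which is a quick sanity check when reading a data-reduction figure.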

These metrics enable teams to:

Identify bottlenecks in processing pipelines
Validate filter effectiveness and data reduction
Detect anomalies in traffic patterns
Optimize resource allocation

Performance Metrics

Monitor resource utilization and processing efficiency. Agent performance metrics include:

CPU usage: Per-agent utilization and trends
Memory usage: Heap allocation and garbage collection
Disk I/O: Buffer usage for output queuing
Network: Egress bandwidth to destinations
Latency: End-to-end processing latency by node

Each processor node reports individual performance metrics including:

Events processed per second
Processing latency (P50, P95, P99)
Error rate and retry statistics
Cache hit rates for stateful processors

For example, an agent might show CPU at 245m/500m (49%), memory at 512MB/1GB (51%), processing at 12,500 events/sec, latency at P95=45ms and P99=120ms, and error rate at 0.02%.
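As an illustration of where figures like these come from, the sketch below derives latency percentiles from raw samples and a utilization percentage from a usage/limit pair. It uses the nearest-rank percentile method, which is one common definition; how Edge Delta computes its percentiles is not stated here, so treat the method and the function names `percentile` and `utilization_pct` as assumptions.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile requires at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks are not disturbed by float rounding.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def utilization_pct(used: float, limit: float) -> int:
    """Usage as a whole-number percentage of the configured limit."""
    return round(100 * used / limit)


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))  # synthetic samples: 1 ms .. 100 ms
    for p in (50, 95, 99):
        print(f"P{p} = {percentile(latencies_ms, p)} ms")
    print(f"CPU = {utilization_pct(245, 500)}%")  # 245m of a 500m limit
```

Tail percentiles such as P99 need enough samples to be meaningful: with fewer than 100 samples, nearest-rank P99 is simply the maximum.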

Monitoring Strategy Best Practices

Establish observability practices that scale with your pipelines:

Establish Baselines

Measure normal throughput and latency
Track resource utilization patterns
Document expected behavior

Define SLOs

Typical targets for pipeline SLOs include:

99.9% agent availability
P99 processing latency under 200ms
Error rate below 0.1%
Zero data loss
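Targets like these are easiest to enforce when they are written down as explicit checks. The following is a minimal sketch, assuming you already collect the four inputs per evaluation window; the `WindowStats` fields and the `slo_violations` function are hypothetical names, and "zero data loss" is interpreted here as every ingested event being either delivered or intentionally filtered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    """Measurements for one evaluation window (hypothetical structure)."""
    availability_pct: float   # share of heartbeats received
    p99_latency_ms: float     # P99 processing latency
    error_rate_pct: float     # failed events as a percentage of processed events
    events_in: int            # events ingested
    events_accounted: int     # events delivered plus events intentionally filtered


def slo_violations(stats: WindowStats) -> List[str]:
    """Return a description of each SLO target the window misses."""
    violations = []
    if stats.availability_pct < 99.9:
        violations.append(f"availability {stats.availability_pct}% is below 99.9%")
    if stats.p99_latency_ms >= 200:
        violations.append(f"P99 latency {stats.p99_latency_ms}ms is not under 200ms")
    if stats.error_rate_pct >= 0.1:
        violations.append(f"error rate {stats.error_rate_pct}% is not below 0.1%")
    if stats.events_accounted < stats.events_in:
        lost = stats.events_in - stats.events_accounted
        violations.append(f"{lost} events unaccounted for")
    return violations


if __name__ == "__main__":
    window = WindowStats(99.95, 120.0, 0.02, 1_000_000, 1_000_000)
    print(slo_violations(window) or "all SLO targets met")
```

Returning the full list of violations, rather than stopping at the first, makes a single report usable for triage.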

Alert on Trends

Reduce alert fatigue by alerting on trends rather than spikes:

Use rate-of-change alerts
Apply moving averages
Set appropriate thresholds
Configure meaningful alert windows
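One way to combine these ideas is to compare the average of the most recent window against the average of the window before it, and alert only when the change exceeds a threshold. The sketch below is a generic illustration, not an Edge Delta monitor definition; the function name `trend_alert`, the window size, and the 50% threshold are assumptions.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def trend_alert(values: Sequence[float], window: int, max_change_pct: float) -> bool:
    """Alert when the latest window's average moves more than max_change_pct
    away from the previous window's average."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < 2 * window:
        return False  # not enough history to compare two full windows
    baseline = mean(values[-2 * window:-window])
    recent = mean(values[-window:])
    if baseline == 0:
        return recent != 0
    change_pct = abs(recent - baseline) / baseline * 100
    return change_pct > max_change_pct


if __name__ == "__main__":
    spike = [100] * 9 + [300]            # one outlier in an otherwise steady series
    sustained = [100] * 5 + [200] * 5    # throughput doubles and stays there
    print(trend_alert(spike, window=5, max_change_pct=50))
    print(trend_alert(sustained, window=5, max_change_pct=50))
```

With a five-sample window, the single spike raises the recent average by only 40% and stays below the threshold, while the sustained shift is a 100% change and fires. Larger windows suppress more noise at the cost of slower detection.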

Related Documentation

Pipeline Dashboard - View pipeline health and status
Self Telemetry Source - Configure agent self-telemetry
Reducing Agent Resource Consumption - Performance tuning
Flow Control - Manage data volume dynamically
Anomaly Detection - Detect anomalies in telemetry patterns
