Reducing Agent Resource Consumption
Overview
Edge Delta agents are designed to be lightweight and efficient, but resource consumption can vary significantly based on enabled features, pipeline complexity, and data volume. This guide helps you optimize agent resource usage while understanding the trade-offs involved.
Typical Resource Consumption
Under normal operation with standard telemetry pipelines:
- CPU: 0.2-0.5 vCPU per agent (per node)
- Memory: 500MB-1GB per agent
- Pipeline Memory Multiplier: ~2.4x (e.g., 50GB/day data volume ≈ 120GB in pipeline memory)
These baseline metrics increase when additional features like eBPF-based sources or live capture are enabled, or when processing high-cardinality data.
When to Optimize
Consider optimizing agent resources when:
- Agents are the largest resource consumers in your stack
- You’re experiencing memory pressure or CPU throttling
- You need to reduce costs in large-scale deployments
- Regulatory or operational requirements limit resource allocation
- Agents are causing OOMKills or performance degradation
High-Impact Optimizations
1. Disable eBPF-Based Sources
Resource Impact: ⭐⭐⭐⭐⭐ (Highest impact - significant reduction)
The Kubernetes Trace source (k8s_trace_input) and Kubernetes Service Map source (k8s_traffic_input) use eBPF to capture network-level telemetry. While powerful, these features consume substantial CPU and memory resources.
Configuration:
Disable eBPF globally via Helm:
helm upgrade edgedelta edgedelta/edgedelta \
--set tracerProps.enabled=false \
-n edgedelta
Alternatively, remove the source nodes from your pipeline configuration if you only want to disable specific eBPF functionality while keeping other tracer features.
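If you take the pipeline route, the nodes to remove are the eBPF-backed inputs named above. A minimal sketch of how they might appear in your pipeline configuration (node names are illustrative; only the type values come from this guide):
nodes:
  - name: k8s_trace             # illustrative name; the type identifies the eBPF source
    type: k8s_trace_input       # remove to disable eBPF trace capture
  - name: k8s_traffic
    type: k8s_traffic_input     # remove to disable the Service Map source
Removing only one of the two lets you keep, for example, the Service Map while dropping trace capture.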
Trade-offs:
- ✅ Major reduction in CPU and memory usage
- ✅ Reduced GC pressure and improved stability
- ❌ Lose Service Map visualization
- ❌ Cannot capture eBPF-based traces
- ❌ No automatic network traffic monitoring
When to Disable:
- You don’t need service-to-service traffic visibility
- You’re using alternative APM/tracing solutions
- Resource constraints outweigh observability benefits
- Non-Kubernetes environments (eBPF sources only work in Kubernetes)
Verification:
# Check if tracer is disabled
kubectl get daemonset -n edgedelta -o yaml | grep -A 5 "tracerProps"
# Verify no eBPF sources in pipeline
kubectl exec -n edgedelta <pod-name> -- grep -E "k8s_trace_input|k8s_traffic_input" /edgedelta/config.yml
2. Disable Live Capture in Production
Resource Impact: ⭐⭐⭐⭐ (High impact - 15-20% reduction in high-volume environments)
Live Capture enables real-time pipeline debugging and data preview in the Edge Delta UI. While invaluable during development, it consumes resources by caching data items in memory and performing JSON marshaling operations.
Resource Cost:
- Memory: 15-20% overhead from in-memory caching of captured items
- CPU: JSON marshaling cost for serialization
- Volume Dependency: Impact scales with data volume processed by the agent
Configuration:
Set the environment variable via Helm:
helm upgrade edgedelta edgedelta/edgedelta \
--set env[0].name=ED_DISABLE_LIVE_CAPTURE \
--set env[0].value="1" \
-n edgedelta
Note: If combining with other environment variables (like GOMEMLIMIT), use sequential indices: env[0], env[1], env[2], etc. See the production optimization example below.
Or using a values file:
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
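If you later combine this with other environment variables such as GOMEMLIMIT (see the Go Memory Limit section below), the values-file form avoids the index bookkeeping noted above; for example:
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
  - name: GOMEMLIMIT
    value: "1800MiB"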
Trade-offs:
- ✅ 15-20% reduction in CPU and memory in high-volume scenarios
- ✅ Reduced network egress to Edge Delta backend
- ✅ Eliminates real-time data sampling concerns
- ❌ Cannot use in-stream debugging features
- ❌ No live data preview when building processors
- ❌ Harder to troubleshoot pipeline behavior in production
When to Disable:
- Production environments with stable, tested pipelines
- High-volume environments (>50GB/day per agent)
- Security/compliance requirements prohibit real-time sampling
- Network policies restrict outbound data transmission
- Resource constraints are critical
When to Keep Enabled:
- Development and staging environments
- Actively building and testing new pipelines
- Troubleshooting data processing issues
- Need AI-powered processor recommendations
Verification:
# Check environment variable is set
kubectl get pods -n edgedelta -o jsonpath='{.items[0].spec.containers[0].env[?(@.name=="ED_DISABLE_LIVE_CAPTURE")].value}'
# Should return: 1
3. Optimize Self-Telemetry Cardinality
Resource Impact: ⭐⭐⭐ (Medium impact)
The Self Telemetry source generates metrics about agent health and pipeline statistics. In version 2.5.0, increased cardinality in self-telemetry metrics can impact resource usage.
Configuration:
If you notice high self-telemetry volume, consider:
- Filtering or sampling self-telemetry metrics before forwarding
- Aggregating metrics at higher intervals
- Disabling specific metric types if not needed
nodes:
  - name: ed_self_telemetry_input
    type: ed_self_telemetry_input
    enable_health_metrics: true
    enable_agent_stats_metrics: false # Disable if not needed
Trade-offs:
- ✅ Reduced metric cardinality and memory usage
- ❌ Less granular visibility into agent performance
- ❌ May impact troubleshooting capabilities
Pipeline-Level Optimizations
Beyond disabling features, optimize how your pipeline processes data:
4. Design Efficient Pipelines
Follow best practices from Designing Efficient Pipelines:
- Reuse extracted values: Extract once with regex, reuse multiple times
- Minimize regex operations: Regex is the most computationally expensive CEL macro
- Filter early: Drop unwanted data before expensive transformations
- Avoid overlapping conditions: Prevent duplicate processing
5. Follow Processor Best Practices
Apply recommendations from Processor Best Practices:
- Use mutually exclusive conditions in multi-processor nodes
- Avoid overloading single processor nodes (limit to 2-3 extract/aggregate chains)
- Use name == conditions to tightly scope aggregate metrics
- Implement effective sampling to reduce volume
6. Optimize Data Routing
- Sample aggressively: Use Sample Processor early in pipeline
- Filter unused telemetry: Remove logs/metrics you don’t need
- Aggregate before forwarding: Reduce destination ingestion costs
- Use consistent hashing: For gateway pipelines, ensures efficient routing
Kubernetes Resource Configuration
Set Appropriate Resource Limits
Configure Kubernetes resource requests and limits based on your workload. See Helm Values for full details.
Conservative (minimal features, low volume):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=100m \
--set resources.requests.memory=256Mi \
--set resources.limits.cpu=500m \
--set resources.limits.memory=512Mi \
-n edgedelta
Standard (typical production):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=200m \
--set resources.requests.memory=512Mi \
--set resources.limits.cpu=1000m \
--set resources.limits.memory=2Gi \
-n edgedelta
High-Volume (eBPF enabled, high throughput):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=500m \
--set resources.requests.memory=1Gi \
--set resources.limits.cpu=2000m \
--set resources.limits.memory=4Gi \
-n edgedelta
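These profiles can also be kept in a values file rather than passed as --set flags; for example, the standard production profile maps to:
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi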
Go Memory Limit
Set GOMEMLIMIT to help the Go runtime manage memory more efficiently:
helm upgrade edgedelta edgedelta/edgedelta \
--set env[0].name=GOMEMLIMIT \
--set env[0].value=1800MiB \
-n edgedelta
Note: If you’ve already configured other environment variables (such as ED_DISABLE_LIVE_CAPTURE), adjust the array index accordingly (e.g., use env[1] if env[0] is already in use). Set GOMEMLIMIT to roughly 90% of the container’s memory limit.
This helps prevent OOM kills by allowing the garbage collector to run more aggressively as memory approaches the limit.
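As a concrete pairing (a sketch; adjust both numbers to your own deployment), a 2Gi container memory limit corresponds to a GOMEMLIMIT of roughly 1800MiB:
resources:
  limits:
    memory: 2Gi
env:
  - name: GOMEMLIMIT
    value: "1800MiB"  # ~90% of the 2Gi (2048MiB) container limit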
Monitoring and Profiling
Identify Resource Bottlenecks
Use profiling with pprof to identify resource hotspots:
- Enable profiling in the Edge Delta UI for specific agents
- Look for spikes tied to:
- Metric extraction and aggregation
- eBPF-based sources (Service Map, K8s Trace)
- High-cardinality metric generation
- Frequent garbage collection cycles
Key Metrics to Monitor
Watch these self-telemetry metrics in the Metrics Explorer:
- ed.agent.memory.usage: Memory consumption per agent
- ed.agent.cpu.usage: CPU usage percentage
- ed.agent.gc.duration: Garbage collection frequency and duration
- ed.pipeline.*.throughput: Data volume per pipeline component
Signs of Resource Pressure
- High GC frequency: Indicates memory pressure
- CPU throttling: Agent hitting CPU limits
- OOMKills: Memory limits too low or memory leak
- Increasing memory over time: Potential memory leak or unbounded buffering
Decision Framework
Use this matrix to decide which optimizations to apply:
Feature | Resource Impact | Use Case | Disable When… |
---|---|---|---|
eBPF Sources (k8s_trace, k8s_traffic) | ⭐⭐⭐⭐⭐ Very High | Service mesh visibility, network monitoring | Using external APM, resource-constrained, no service map needed |
Live Capture | ⭐⭐⭐⭐ High (15-20% in high-volume) | Pipeline development, debugging | Production with stable pipelines, high-volume (>50GB/day), compliance restrictions |
Self-Telemetry Stats | ⭐⭐⭐ Medium | Agent health monitoring, troubleshooting | Minimal observability needs, external agent monitoring |
Complex Processors | ⭐⭐⭐ Medium | Advanced transformations, enrichment | Can simplify logic, use downstream processing |
High-Cardinality Metrics | ⭐⭐ Low-Medium | Detailed analytics, fine-grained monitoring | Can aggregate, acceptable to lose granularity |
Example: Production Optimization
Here’s a complete Helm command for a resource-optimized production deployment:
helm upgrade edgedelta edgedelta/edgedelta -i \
--version v1.17.0 \
--set secretApiKey.value=<your-api-key> \
--set tracerProps.enabled=false \
--set env[0].name=ED_DISABLE_LIVE_CAPTURE \
--set env[0].value="1" \
--set env[1].name=GOMEMLIMIT \
--set env[1].value=1800MiB \
--set resources.requests.cpu=200m \
--set resources.requests.memory=512Mi \
--set resources.limits.cpu=1000m \
--set resources.limits.memory=2Gi \
-n edgedelta --create-namespace
This configuration:
- Disables eBPF sources for major resource savings
- Disables live capture for production stability
- Sets Go memory limit for better GC behavior
- Configures conservative resource limits
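For repeatable rollouts, the same deployment can be captured in a values file and applied with helm upgrade -f values.yaml (a sketch mirroring the flags above; substitute your own API key):
secretApiKey:
  value: <your-api-key>
tracerProps:
  enabled: false
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
  - name: GOMEMLIMIT
    value: "1800MiB"
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi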
Expected Results:
- CPU: ~0.2-0.3 vCPU per agent (vs 0.5+ with all features)
- Memory: ~512MB-1GB per agent (vs 1.5-2GB with all features)
- 40-60% overall resource reduction compared to default configuration
Troubleshooting
Agents Still Using High Resources
- Profile with pprof: Identify actual bottlenecks
- Check pipeline complexity: Review processor configuration
- Examine data volume: May need additional sampling/filtering
- Review destination health: Backpressure can cause buffering
- Check for memory leaks: Increasing memory over time indicates issues
Verification Commands
# Check current resource usage
kubectl top pods -n edgedelta
# View agent configuration
kubectl exec -n edgedelta <pod-name> -- cat /edgedelta/config.yml
# Check for OOMKills
kubectl get pods -n edgedelta -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
# View environment variables
kubectl describe pod -n edgedelta <pod-name> | grep -A 20 "Environment:"