Reducing Agent Resource Consumption

Learn how to optimize Edge Delta agent resource consumption through feature configuration, pipeline design, and deployment settings.

Overview

Edge Delta agents are designed to be lightweight and efficient, but resource consumption can vary significantly based on enabled features, pipeline complexity, and data volume. This guide helps you optimize agent resource usage while understanding the trade-offs involved.

Typical Resource Consumption

Under normal operation with standard telemetry pipelines:

  • CPU: 0.2-0.5 vCPU per agent (per node)
  • Memory: 500MB-1GB per agent
  • Pipeline Memory Multiplier: ~2.4x (e.g., 50GB/day data volume ≈ 120GB in pipeline memory)

These baseline metrics increase when additional features like eBPF-based sources or live capture are enabled, or when processing high-cardinality data.

When to Optimize

Consider optimizing agent resources when:

  • Agents are the largest resource consumers in your stack
  • You’re experiencing memory pressure or CPU throttling
  • You need to reduce costs in large-scale deployments
  • Regulatory or operational requirements limit resource allocation
  • Agents are causing OOMKills or performance degradation

High-Impact Optimizations

1. Disable eBPF-Based Sources (Kubernetes Only)

Resource Impact: Highest impact - disabling these sources yields the largest single reduction in CPU and memory

Note: eBPF-based sources are only available in Kubernetes deployments. If you’re running Edge Delta on Linux virtual machines or bare metal, these sources are not applicable to your environment.

The Kubernetes Trace source (k8s_trace_input) and Kubernetes Service Map source (k8s_traffic_input) use eBPF to capture network-level telemetry. While powerful, these features consume substantial CPU and memory resources.

Configuration (Kubernetes only):

Disable eBPF globally via Helm:

helm upgrade edgedelta edgedelta/edgedelta \
  --set tracerProps.enabled=false \
  -n edgedelta

Alternatively, remove the source nodes from your pipeline configuration if you only want to disable specific eBPF functionality while keeping other tracer features.
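
If you take the pipeline route, the nodes to remove are the ones declared with the k8s_trace_input and k8s_traffic_input types. As a rough sketch (only name and type are shown; any other fields depend on your configuration), they appear in the pipeline YAML roughly like this:

# eBPF-based source nodes to delete from the pipeline configuration (sketch)
nodes:
- name: k8s_trace_input
  type: k8s_trace_input
- name: k8s_traffic_input
  type: k8s_traffic_input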

Trade-offs:

  • Major reduction in CPU and memory usage
  • Reduced GC pressure and improved stability
  • Lose Service Map visualization
  • Cannot capture eBPF-based traces
  • No automatic network traffic monitoring

When to Disable:

  • You don’t need service-to-service traffic visibility
  • You’re using alternative APM/tracing solutions
  • Resource constraints outweigh observability benefits
  • N/A for Linux environments - eBPF sources only work in Kubernetes

Verification:

# Check if tracer is disabled
kubectl get daemonset -n edgedelta -o yaml | grep -A 5 "tracerProps"

# Verify no eBPF sources in pipeline
kubectl exec -n edgedelta <pod-name> -- grep -E "k8s_trace_input|k8s_traffic_input" /edgedelta/config.yml

2. Disable Live Capture in Production

Resource Impact: High impact - 15-20% reduction in high-volume environments

Live Capture enables real-time pipeline debugging and data preview in the Edge Delta UI. While invaluable during development, it consumes resources by caching data items in memory and performing JSON marshaling operations.

Resource Cost:

  • Memory: 15-20% overhead from in-memory caching of captured items
  • CPU: JSON marshaling cost for serialization
  • Volume Dependency: Impact scales with data volume processed by the agent

Configuration:

For Kubernetes deployments, set the environment variable via Helm:

helm upgrade edgedelta edgedelta/edgedelta \
  --set env[0].name=ED_DISABLE_LIVE_CAPTURE \
  --set env[0].value="1" \
  -n edgedelta

Or using a values file:

env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"

For Linux deployments, set the environment variable in your service configuration:

# For systemd services, add to /etc/systemd/system/edgedelta.service
Environment="ED_DISABLE_LIVE_CAPTURE=1"

# Or export before running the agent
export ED_DISABLE_LIVE_CAPTURE=1
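
After editing the unit file, reload systemd and restart the agent so the change takes effect (assuming the service is named edgedelta, as in the examples in this guide):

# Apply the updated unit file and restart the agent
sudo systemctl daemon-reload
sudo systemctl restart edgedelta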

Trade-offs:

  • 15-20% reduction in CPU and memory in high-volume scenarios
  • Reduced network egress to Edge Delta backend
  • Eliminates real-time data sampling concerns
  • Cannot use in-stream debugging features
  • No live data preview when building processors
  • Harder to troubleshoot pipeline behavior in production

When to Disable:

  • Production environments with stable, tested pipelines
  • High-volume environments (>50GB/day per agent)
  • Security/compliance requirements prohibit real-time sampling
  • Network policies restrict outbound data transmission
  • Resource constraints are critical

When to Keep Enabled:

  • Development and staging environments
  • Actively building and testing new pipelines
  • Troubleshooting data processing issues
  • Need AI-powered processor recommendations

Verification:

# Check environment variable is set
kubectl get pods -n edgedelta -o jsonpath='{.items[0].spec.containers[0].env[?(@.name=="ED_DISABLE_LIVE_CAPTURE")].value}'

# Should return: 1
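
On Linux, you can confirm the variable reached the agent by checking the unit configuration and the running process environment (the service name and pgrep pattern are assumptions based on the examples above):

# Check the systemd unit's environment settings
systemctl show edgedelta --property=Environment

# Check the environment of the running agent process
sudo cat /proc/$(pgrep -o -f edgedelta)/environ | tr '\0' '\n' | grep ED_DISABLE_LIVE_CAPTURE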

3. Optimize Self-Telemetry Cardinality

Resource Impact: Medium impact

The Self Telemetry source generates metrics about agent health and pipeline statistics. In version 2.5.0, self-telemetry metrics have increased cardinality, which can raise resource usage.

Configuration:

If you notice high self-telemetry volume, consider:

  • Filtering or sampling self-telemetry metrics before forwarding
  • Aggregating metrics at higher intervals
  • Disabling specific metric types if not needed

For example, to keep health metrics but disable agent statistics metrics:

nodes:
- name: ed_self_telemetry_input
  type: ed_self_telemetry_input
  enable_health_metrics: true
  enable_agent_stats_metrics: false  # Disable if not needed

Trade-offs:

  • Reduced metric cardinality and memory usage
  • Less granular visibility into agent performance
  • May impact troubleshooting capabilities

Pipeline-Level Optimizations

Beyond disabling features, optimize how your pipeline processes data:

4. Design Efficient Pipelines

Follow best practices from Designing Efficient Pipelines:

  • Reuse extracted values: Extract once with regex, reuse multiple times
  • Minimize regex operations: Regex is the most computationally expensive CEL macro
  • Filter early: Drop unwanted data before expensive transformations
  • Avoid overlapping conditions: Prevent duplicate processing

5. Follow Processor Best Practices

Apply recommendations from Processor Best Practices:

  • Use mutually exclusive conditions in multi-processor nodes
  • Avoid overloading single processor nodes (limit to 2-3 extract/aggregate chains)
  • Use name == conditions to tightly scope aggregate metrics
  • Implement effective sampling to reduce volume

6. Optimize Data Routing

  • Sample aggressively: Use Sample Processor early in pipeline
  • Filter unused telemetry: Remove logs/metrics you don’t need
  • Aggregate before forwarding: Reduce destination ingestion costs
  • Use consistent hashing: Ensures efficient routing in gateway pipelines

Kubernetes Resource Configuration

Set Appropriate Resource Limits

Configure Kubernetes resource requests and limits based on your workload. See Helm Values for full details.

Conservative (minimal features, low volume):

helm upgrade edgedelta edgedelta/edgedelta \
  --set resources.requests.cpu=100m \
  --set resources.requests.memory=256Mi \
  --set resources.limits.cpu=500m \
  --set resources.limits.memory=512Mi \
  -n edgedelta

Standard (typical production):

helm upgrade edgedelta edgedelta/edgedelta \
  --set resources.requests.cpu=200m \
  --set resources.requests.memory=512Mi \
  --set resources.limits.cpu=1000m \
  --set resources.limits.memory=2Gi \
  -n edgedelta

High-Volume (eBPF enabled, high throughput):

helm upgrade edgedelta edgedelta/edgedelta \
  --set resources.requests.cpu=500m \
  --set resources.requests.memory=1Gi \
  --set resources.limits.cpu=2000m \
  --set resources.limits.memory=4Gi \
  -n edgedelta
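
If you prefer a values file over --set flags, the same resource settings can be kept there (shown for the standard profile; adjust to the tier you chose):

resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi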

Monitoring and Profiling

Identify Resource Bottlenecks

Use profiling with pprof to identify resource hotspots:

  1. Enable profiling in the Edge Delta UI for specific agents
  2. Look for spikes tied to:
    • Metric extraction and aggregation
    • eBPF-based sources (Service Map, K8s Trace)
    • High-cardinality metric generation
    • Frequent garbage collection cycles

Key Metrics to Monitor

Watch these self-telemetry metrics in the Metrics Explorer:

  • ed.agent.memory.usage: Memory consumption per agent
  • ed.agent.cpu.usage: CPU usage percentage
  • ed.agent.gc.duration: Garbage collection frequency and duration
  • ed.pipeline.*.throughput: Data volume per pipeline component

Signs of Resource Pressure

  • High GC frequency: Indicates memory pressure
  • CPU throttling: Agent hitting CPU limits (one way to confirm is shown below)
  • OOMKills: Memory limits too low or memory leak
  • Increasing memory over time: Potential memory leak or unbounded buffering
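
kubectl top shows usage but not throttling; one way to confirm throttling is to read the container's cgroup CPU statistics directly (a sketch that assumes cgroup v2 and the same kubectl exec access used elsewhere in this guide):

# Non-zero nr_throttled / throttled_usec means the agent is hitting its CPU limit
kubectl exec -n edgedelta <pod-name> -- cat /sys/fs/cgroup/cpu.stat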

Decision Framework

Use this framework to decide which optimizations to apply:

  • eBPF Sources (k8s_trace, k8s_traffic) - Resource impact: Very High. Use case: service mesh visibility, network monitoring. Disable when: using external APM, resource-constrained, or no service map is needed.
  • Live Capture - Resource impact: High (15-20% in high-volume environments). Use case: pipeline development, debugging. Disable when: running production with stable pipelines, high volume (>50GB/day), or compliance restrictions apply.
  • Self-Telemetry Stats - Resource impact: Medium. Use case: agent health monitoring, troubleshooting. Disable when: observability needs are minimal or agents are monitored externally.
  • Complex Processors - Resource impact: Medium. Use case: advanced transformations, enrichment. Disable when: logic can be simplified or moved to downstream processing.
  • High-Cardinality Metrics - Resource impact: Low-Medium. Use case: detailed analytics, fine-grained monitoring. Disable when: metrics can be aggregated and losing granularity is acceptable.

Example: Production Optimization

Kubernetes Deployment

Here’s a complete Helm command for a resource-optimized production deployment:

helm upgrade edgedelta edgedelta/edgedelta -i \
  --version v1.17.0 \
  --set secretApiKey.value=<your-api-key> \
  --set tracerProps.enabled=false \
  --set env[0].name=ED_DISABLE_LIVE_CAPTURE \
  --set env[0].value="1" \
  --set resources.requests.cpu=200m \
  --set resources.requests.memory=512Mi \
  --set resources.limits.cpu=1000m \
  --set resources.limits.memory=2Gi \
  -n edgedelta --create-namespace

This configuration:

  • Disables eBPF sources for major resource savings
  • Disables live capture for production stability
  • Configures conservative resource limits

Linux Deployment

For Linux environments (VM or bare metal), configure the agent with minimal resource consumption:

1. Set environment variables in your service configuration (/etc/systemd/system/edgedelta.service):

[Service]
Environment="ED_DISABLE_LIVE_CAPTURE=1"

2. Ensure your pipeline configuration avoids resource-intensive features:

  • Remove any k8s_trace_input or k8s_traffic_input sources (these only work in Kubernetes)
  • Optimize processor chains as described in Pipeline-Level Optimizations
  • Apply aggressive sampling if processing high data volumes

3. Monitor resource usage using system tools:

# Check CPU and memory usage
ps aux | grep edgedelta

# View agent logs
journalctl -u edgedelta -f
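
For a closer look at the agent's memory footprint, you can read the process status fields directly (the pgrep pattern is an assumption; adjust it to how the agent process is named on your host):

# VmRSS is current resident memory; VmHWM is the peak
grep -E "VmRSS|VmHWM" /proc/$(pgrep -o -f edgedelta)/status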

Expected Results

With these optimizations applied:

  • CPU: ~0.2-0.3 vCPU per agent (vs 0.5+ with all features)
  • Memory: ~512MB-1GB per agent (vs 1.5-2GB with all features)
  • Overall reduction: 40-60% compared to default configuration with all features enabled

Troubleshooting

Agents Still Using High Resources

  1. Profile with pprof: Identify actual bottlenecks
  2. Check pipeline complexity: Review processor configuration
  3. Examine data volume: May need additional sampling/filtering
  4. Review destination health: Backpressure can cause buffering
  5. Check for memory leaks: Increasing memory over time indicates issues

Verification Commands

# Check current resource usage
kubectl top pods -n edgedelta

# View agent configuration
kubectl exec -n edgedelta <pod-name> -- cat /edgedelta/config.yml

# Check for OOMKills
kubectl get pods -n edgedelta -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'

# View environment variables
kubectl describe pod -n edgedelta <pod-name> | grep -A 20 "Environment:"