Reducing Agent Resource Consumption

Learn how to optimize Edge Delta agent resource consumption through feature configuration, pipeline design, and deployment settings.

Overview

Edge Delta agents are designed to be lightweight and efficient, but resource consumption can vary significantly based on enabled features, pipeline complexity, and data volume. This guide helps you optimize agent resource usage while understanding the trade-offs involved.

Typical Resource Consumption

Under normal operation with standard telemetry pipelines:

  • CPU: 0.2-0.5 vCPU per agent (per node)
  • Memory: 500MB-1GB per agent
  • Pipeline Memory Multiplier: ~2.4x (e.g., 50GB/day data volume ≈ 120GB in pipeline memory)

These baseline metrics increase when additional features like eBPF-based sources or live capture are enabled, or when processing high-cardinality data.

When to Optimize

Consider optimizing agent resources when:

  • Agents are the largest resource consumers in your stack
  • You’re experiencing memory pressure or CPU throttling
  • You need to reduce costs in large-scale deployments
  • Regulatory or operational requirements limit resource allocation
  • Agents are causing OOMKills or performance degradation

High-Impact Optimizations

1. Disable eBPF-Based Sources

Resource Impact: ⭐⭐⭐⭐⭐ (Highest impact - significant reduction)

The Kubernetes Trace source (k8s_trace_input) and Kubernetes Service Map source (k8s_traffic_input) use eBPF to capture network-level telemetry. While powerful, these features consume substantial CPU and memory resources.

Configuration:

Disable eBPF globally via Helm:

helm upgrade edgedelta edgedelta/edgedelta \
  --set tracerProps.enabled=false \
  -n edgedelta
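
The same setting expressed in a Helm values file:

tracerProps:
  enabled: false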

Alternatively, remove the source nodes from your pipeline configuration if you only want to disable specific eBPF functionality while keeping other tracer features.
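
For example, a pipeline that keeps eBPF-based traces but drops the Service Map source would simply omit (or delete) the traffic node. The node names below are illustrative and follow the node layout used elsewhere in this guide:

nodes:
- name: k8s_trace_input          # keep eBPF-based traces
  type: k8s_trace_input
# Service Map source removed to reduce resource usage:
# - name: k8s_traffic_input
#   type: k8s_traffic_input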

Trade-offs:

  • ✅ Major reduction in CPU and memory usage
  • ✅ Reduced GC pressure and improved stability
  • ❌ Lose Service Map visualization
  • ❌ Cannot capture eBPF-based traces
  • ❌ No automatic network traffic monitoring

When to Disable:

  • You don’t need service-to-service traffic visibility
  • You’re using alternative APM/tracing solutions
  • Resource constraints outweigh observability benefits
  • Non-Kubernetes environments (eBPF sources only work in Kubernetes)

Verification:

# Check that the tracer is disabled in the Helm release values
helm get values edgedelta -n edgedelta | grep -A 1 tracerProps

# Verify no eBPF sources in pipeline
kubectl exec -n edgedelta <pod-name> -- grep -E "k8s_trace_input|k8s_traffic_input" /edgedelta/config.yml

2. Disable Live Capture in Production

Resource Impact: ⭐⭐⭐⭐ (High impact - 15-20% reduction in high-volume environments)

Live Capture enables real-time pipeline debugging and data preview in the Edge Delta UI. While invaluable during development, it consumes resources by caching data items in memory and performing JSON marshaling operations.

Resource Cost:

  • Memory: 15-20% overhead from in-memory caching of captured items
  • CPU: JSON marshaling cost for serialization
  • Volume Dependency: Impact scales with data volume processed by the agent

Configuration:

Set the environment variable via Helm:

helm upgrade edgedelta edgedelta/edgedelta \
  --set env[0].name=ED_DISABLE_LIVE_CAPTURE \
  --set env[0].value="1" \
  -n edgedelta

Note: If combining with other environment variables (like GOMEMLIMIT), use sequential indices: env[0], env[1], env[2], etc. See the production optimization example below.

Or using a values file:

env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"

Trade-offs:

  • ✅ 15-20% reduction in CPU and memory in high-volume scenarios
  • ✅ Reduced network egress to Edge Delta backend
  • ✅ Eliminates real-time data sampling concerns
  • ❌ Cannot use in-stream debugging features
  • ❌ No live data preview when building processors
  • ❌ Harder to troubleshoot pipeline behavior in production

When to Disable:

  • Production environments with stable, tested pipelines
  • High-volume environments (>50GB/day per agent)
  • Security/compliance requirements prohibit real-time sampling
  • Network policies restrict outbound data transmission
  • Resource constraints are critical

When to Keep Enabled:

  • Development and staging environments
  • Actively building and testing new pipelines
  • Troubleshooting data processing issues
  • Need AI-powered processor recommendations

Verification:

# Check environment variable is set
kubectl get pods -n edgedelta -o jsonpath='{.items[0].spec.containers[0].env[?(@.name=="ED_DISABLE_LIVE_CAPTURE")].value}'

# Should return: 1

3. Optimize Self-Telemetry Cardinality

Resource Impact: ⭐⭐⭐ (Medium impact)

The Self Telemetry source generates metrics about agent health and pipeline statistics. As of version 2.5.0, these metrics have higher cardinality, which can increase resource usage.

Configuration:

If you notice high self-telemetry volume, consider:

  • Filtering or sampling self-telemetry metrics before forwarding
  • Aggregating metrics at higher intervals
  • Disabling specific metric types if not needed

For example:

nodes:
- name: ed_self_telemetry_input
  type: ed_self_telemetry_input
  enable_health_metrics: true
  enable_agent_stats_metrics: false  # Disable if not needed

Trade-offs:

  • ✅ Reduced metric cardinality and memory usage
  • ❌ Less granular visibility into agent performance
  • ❌ May impact troubleshooting capabilities

Pipeline-Level Optimizations

Beyond disabling features, optimize how your pipeline processes data:

4. Design Efficient Pipelines

Follow best practices from Designing Efficient Pipelines:

  • Reuse extracted values: Extract once with regex, then reuse the result multiple times
  • Minimize regex operations: Regex is the most computationally expensive CEL macro
  • Filter early: Drop unwanted data before expensive transformations (see the sketch below)
  • Avoid overlapping conditions: Prevent duplicate processing
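
A minimal sketch of the filter-early ordering. The node names and types below are illustrative placeholders rather than exact Edge Delta node types; the point is simply that the drop/filter step sits before any regex extraction or aggregation:

nodes:
- name: k8s_logs              # source node (name and types illustrative)
  type: kubernetes_input
- name: drop_unwanted         # filtering sits immediately after the source,
  type: filter                # so every downstream node sees a reduced stream
- name: extract_metrics       # regex extraction and aggregation run last,
  type: log_to_metric         # against far fewer items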

5. Follow Processor Best Practices

Apply recommendations from Processor Best Practices:

  • Use mutually exclusive conditions in multi-processor nodes
  • Avoid overloading single processor nodes (limit to 2-3 extract/aggregate chains)
  • Use name == conditions to tightly scope aggregate metrics
  • Implement effective sampling to reduce volume

6. Optimize Data Routing

  • Sample aggressively: Use Sample Processor early in pipeline
  • Filter unused telemetry: Remove logs/metrics you don’t need
  • Aggregate before forwarding: Reduce destination ingestion costs
  • Use consistent hashing: For gateway pipelines, ensures efficient routing

Kubernetes Resource Configuration

Set Appropriate Resource Limits

Configure Kubernetes resource requests and limits based on your workload. See Helm Values for full details.

Conservative (minimal features, low volume):

helm upgrade edgedelta edgedelta/edgedelta \
  --set resources.requests.cpu=100m \
  --set resources.requests.memory=256Mi \
  --set resources.limits.cpu=500m \
  --set resources.limits.memory=512Mi \
  -n edgedelta

Standard (typical production):

helm upgrade edgedelta edgedelta/edgedelta \
  --set resources.requests.cpu=200m \
  --set resources.requests.memory=512Mi \
  --set resources.limits.cpu=1000m \
  --set resources.limits.memory=2Gi \
  -n edgedelta

High-Volume (eBPF enabled, high throughput):

helm upgrade edgedelta edgedelta/edgedelta \
  --set resources.requests.cpu=500m \
  --set resources.requests.memory=1Gi \
  --set resources.limits.cpu=2000m \
  --set resources.limits.memory=4Gi \
  -n edgedelta
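
If you manage the deployment through a values file rather than --set flags, the standard profile above translates to:

resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi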

Go Memory Limit

Set GOMEMLIMIT to help the Go runtime manage memory more efficiently:

helm upgrade edgedelta edgedelta/edgedelta \
  --set env[0].name=GOMEMLIMIT \
  --set env[0].value=1800MiB \
  -n edgedelta

Note: If you’ve already configured other environment variables (like ED_DISABLE_LIVE_CAPTURE), adjust the array index accordingly (e.g., use env[1] if env[0] is already in use). Set to ~90% of your memory limit.

This helps prevent OOM kills by allowing the garbage collector to run more aggressively as memory approaches the limit.
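
For example, with the standard 2Gi memory limit used elsewhere in this guide, just under 90% of 2048MiB rounds down to 1800MiB:

resources:
  limits:
    memory: 2Gi
env:
  - name: GOMEMLIMIT
    value: "1800MiB"  # just under 90% of the 2Gi (2048MiB) container limit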

Monitoring and Profiling

Identify Resource Bottlenecks

Use profiling with pprof to identify resource hotspots:

  1. Enable profiling in the Edge Delta UI for specific agents
  2. Look for spikes tied to:
    • Metric extraction and aggregation
    • eBPF-based sources (Service Map, K8s Trace)
    • High-cardinality metric generation
    • Frequent garbage collection cycles

Key Metrics to Monitor

Watch these self-telemetry metrics in the Metrics Explorer:

  • ed.agent.memory.usage: Memory consumption per agent
  • ed.agent.cpu.usage: CPU usage percentage
  • ed.agent.gc.duration: Garbage collection frequency and duration
  • ed.pipeline.*.throughput: Data volume per pipeline component

Signs of Resource Pressure

  • High GC frequency: Indicates memory pressure
  • CPU throttling: Agent hitting CPU limits
  • OOMKills: Memory limits too low or memory leak
  • Increasing memory over time: Potential memory leak or unbounded buffering

Decision Framework

Use this matrix to decide which optimizations to apply:

| Feature | Resource Impact | Use Case | Disable When… |
|---|---|---|---|
| eBPF Sources (k8s_trace, k8s_traffic) | ⭐⭐⭐⭐⭐ Very High | Service mesh visibility, network monitoring | Using external APM, resource-constrained, no service map needed |
| Live Capture | ⭐⭐⭐⭐ High (15-20% in high-volume) | Pipeline development, debugging | Production with stable pipelines, high-volume (>50GB/day), compliance restrictions |
| Self-Telemetry Stats | ⭐⭐⭐ Medium | Agent health monitoring, troubleshooting | Minimal observability needs, external agent monitoring |
| Complex Processors | ⭐⭐⭐ Medium | Advanced transformations, enrichment | Can simplify logic, use downstream processing |
| High-Cardinality Metrics | ⭐⭐ Low-Medium | Detailed analytics, fine-grained monitoring | Can aggregate, acceptable to lose granularity |

Example: Production Optimization

Here’s a complete Helm command for a resource-optimized production deployment:

helm upgrade edgedelta edgedelta/edgedelta -i \
  --version v1.17.0 \
  --set secretApiKey.value=<your-api-key> \
  --set tracerProps.enabled=false \
  --set env[0].name=ED_DISABLE_LIVE_CAPTURE \
  --set env[0].value="1" \
  --set env[1].name=GOMEMLIMIT \
  --set env[1].value=1800MiB \
  --set resources.requests.cpu=200m \
  --set resources.requests.memory=512Mi \
  --set resources.limits.cpu=1000m \
  --set resources.limits.memory=2Gi \
  -n edgedelta --create-namespace
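
The equivalent values file for this deployment:

secretApiKey:
  value: <your-api-key>
tracerProps:
  enabled: false
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
  - name: GOMEMLIMIT
    value: "1800MiB"
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi

Apply it with: helm upgrade edgedelta edgedelta/edgedelta -i --version v1.17.0 -f values.yaml -n edgedelta --create-namespace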

This configuration:

  • Disables eBPF sources for major resource savings
  • Disables live capture for production stability
  • Sets Go memory limit for better GC behavior
  • Configures conservative resource limits

Expected Results:

  • CPU: ~0.2-0.3 vCPU per agent (vs 0.5+ with all features)
  • Memory: ~512MB-1GB per agent (vs 1.5-2GB with all features)
  • 40-60% overall resource reduction compared to default configuration

Troubleshooting

Agents Still Using High Resources

  1. Profile with pprof: Identify actual bottlenecks
  2. Check pipeline complexity: Review processor configuration
  3. Examine data volume: May need additional sampling/filtering
  4. Review destination health: Backpressure can cause buffering
  5. Check for memory leaks: Increasing memory over time indicates issues

Verification Commands

# Check current resource usage
kubectl top pods -n edgedelta

# View agent configuration
kubectl exec -n edgedelta <pod-name> -- cat /edgedelta/config.yml

# Check for OOMKills
kubectl get pods -n edgedelta -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'

# View environment variables
kubectl describe pod -n edgedelta <pod-name> | grep -A 20 "Environment:"