Reducing Agent Resource Consumption
Overview
Edge Delta agents are designed to be lightweight and efficient, but resource consumption can vary significantly based on enabled features, pipeline complexity, and data volume. This guide helps you optimize agent resource usage while understanding the trade-offs involved.
Typical Resource Consumption
Under normal operation with standard telemetry pipelines:
- CPU: 0.2-0.5 vCPU per agent (per node)
- Memory: 500MB-1GB per agent
- Pipeline Memory Multiplier: ~2.4x (e.g., 50GB/day data volume ≈ 120GB in pipeline memory)
These baseline metrics increase when additional features like eBPF-based sources or live capture are enabled, or when processing high-cardinality data.
When to Optimize
Consider optimizing agent resources when:
- Agents are the largest resource consumers in your stack
- You’re experiencing memory pressure or CPU throttling
- You need to reduce costs in large-scale deployments
- Regulatory or operational requirements limit resource allocation
- Agents are causing OOMKills or performance degradation
High-Impact Optimizations
1. Disable eBPF-Based Sources
Resource Impact: ⭐⭐⭐⭐⭐ (Highest impact - significant reduction)
The Kubernetes Trace source (k8s_trace_input) and Kubernetes Service Map source (k8s_traffic_input) use eBPF to capture network-level telemetry. While powerful, these features consume substantial CPU and memory resources.
Configuration:
Disable eBPF globally via Helm:
helm upgrade edgedelta edgedelta/edgedelta \
--set tracerProps.enabled=false \
-n edgedelta
Alternatively, remove the source nodes from your pipeline configuration if you only want to disable specific eBPF functionality while keeping other tracer features.
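If you take the pipeline route, the nodes to remove are the eBPF-backed inputs named above. A minimal sketch of how they might appear in your pipeline configuration (node names are illustrative; only the type values come from this guide):
nodes:
  - name: k8s_trace             # illustrative name; the type identifies the eBPF source
    type: k8s_trace_input       # remove to disable eBPF trace capture
  - name: k8s_traffic
    type: k8s_traffic_input     # remove to disable the Service Map source
Removing only one of the two lets you keep, for example, the Service Map while dropping trace capture.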
Trade-offs:
- ✅ Major reduction in CPU and memory usage
- ✅ Reduced GC pressure and improved stability
- ❌ Lose Service Map visualization
- ❌ Cannot capture eBPF-based traces
- ❌ No automatic network traffic monitoring
When to Disable:
- You don’t need service-to-service traffic visibility
- You’re using alternative APM/tracing solutions
- Resource constraints outweigh observability benefits
- Non-Kubernetes environments (eBPF sources only work in Kubernetes)
Verification:
# Check if tracer is disabled
kubectl get daemonset -n edgedelta -o yaml | grep -A 5 "tracerProps"
# Verify no eBPF sources in pipeline
kubectl exec -n edgedelta <pod-name> -- grep -E "k8s_trace_input|k8s_traffic_input" /edgedelta/config.yml
2. Disable Live Capture in Production
Resource Impact: ⭐⭐⭐⭐ (High impact - 15-20% reduction in high-volume environments)
Live Capture enables real-time pipeline debugging and data preview in the Edge Delta UI. While invaluable during development, it consumes resources by caching data items in memory and performing JSON marshaling operations.
Resource Cost:
- Memory: 15-20% overhead from in-memory caching of captured items
- CPU: JSON marshaling cost for serialization
- Volume Dependency: Impact scales with data volume processed by the agent
Configuration:
Set the environment variable via Helm:
helm upgrade edgedelta edgedelta/edgedelta \
--set env[0].name=ED_DISABLE_LIVE_CAPTURE \
--set env[0].value="1" \
-n edgedelta
Note: If combining with other environment variables (like GOMEMLIMIT), use sequential indices: env[0], env[1], env[2], etc. See the production optimization example below.
Or using a values file:
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
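If you later combine this with other environment variables such as GOMEMLIMIT (see the Go Memory Limit section below), the values-file form avoids the index bookkeeping noted above; for example:
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
  - name: GOMEMLIMIT
    value: "1800MiB"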
Trade-offs:
- ✅ 15-20% reduction in CPU and memory in high-volume scenarios
- ✅ Reduced network egress to Edge Delta backend
- ✅ Eliminates real-time data sampling concerns
- ❌ Cannot use in-stream debugging features
- ❌ No live data preview when building processors
- ❌ Harder to troubleshoot pipeline behavior in production
When to Disable:
- Production environments with stable, tested pipelines
- High-volume environments (>50GB/day per agent)
- Security/compliance requirements prohibit real-time sampling
- Network policies restrict outbound data transmission
- Resource constraints are critical
When to Keep Enabled:
- Development and staging environments
- Actively building and testing new pipelines
- Troubleshooting data processing issues
- Need AI-powered processor recommendations
Verification:
# Check environment variable is set
kubectl get pods -n edgedelta -o jsonpath='{.items[0].spec.containers[0].env[?(@.name=="ED_DISABLE_LIVE_CAPTURE")].value}'
# Should return: 1
3. Optimize Self-Telemetry Cardinality
Resource Impact: ⭐⭐⭐ (Medium impact)
The Self Telemetry source generates metrics about agent health and pipeline statistics. In version 2.5.0, increased cardinality in self-telemetry metrics can impact resource usage.
Configuration:
If you notice high self-telemetry volume, consider:
- Filtering or sampling self-telemetry metrics before forwarding
- Aggregating metrics at higher intervals
- Disabling specific metric types if not needed
nodes:
  - name: ed_self_telemetry_input
    type: ed_self_telemetry_input
    enable_health_metrics: true
    enable_agent_stats_metrics: false # Disable if not needed
Trade-offs:
- ✅ Reduced metric cardinality and memory usage
- ❌ Less granular visibility into agent performance
- ❌ May impact troubleshooting capabilities
Pipeline-Level Optimizations
Beyond disabling features, optimize how your pipeline processes data:
4. Design Efficient Pipelines
Follow best practices from Designing Efficient Pipelines:
- Reuse extracted values: Extract once with regex, reuse multiple times
- Minimize regex operations: Regex is the most computationally expensive CEL macro
- Filter early: Drop unwanted data before expensive transformations
- Avoid overlapping conditions: Prevent duplicate processing
5. Follow Processor Best Practices
Apply recommendations from Processor Best Practices:
- Use mutually exclusive conditions in multi-processor nodes
- Avoid overloading single processor nodes (limit to 2-3 extract/aggregate chains)
- Use name == conditions to tightly scope aggregate metrics
- Implement effective sampling to reduce volume
6. Optimize Data Routing
- Sample aggressively: Use Sample Processor early in pipeline
- Filter unused telemetry: Remove logs/metrics you don’t need
- Aggregate before forwarding: Reduce destination ingestion costs
- Use consistent hashing: For gateway pipelines, ensures efficient routing
Kubernetes Resource Configuration
Set Appropriate Resource Limits
Configure Kubernetes resource requests and limits based on your workload. See Helm Values for full details.
Conservative (minimal features, low volume):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=100m \
--set resources.requests.memory=256Mi \
--set resources.limits.cpu=500m \
--set resources.limits.memory=512Mi \
-n edgedelta
Standard (typical production):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=200m \
--set resources.requests.memory=512Mi \
--set resources.limits.cpu=1000m \
--set resources.limits.memory=2Gi \
-n edgedelta
High-Volume (eBPF enabled, high throughput):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=500m \
--set resources.requests.memory=1Gi \
--set resources.limits.cpu=2000m \
--set resources.limits.memory=4Gi \
-n edgedelta
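These profiles can also be kept in a values file rather than passed as --set flags; for example, the standard production profile maps to:
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi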
Go Memory Limit
Set GOMEMLIMIT to help the Go runtime manage memory more efficiently:
helm upgrade edgedelta edgedelta/edgedelta \
--set env[0].name=GOMEMLIMIT \
--set env[0].value=1800MiB \
-n edgedelta
Note: If you’ve already configured other environment variables (such as ED_DISABLE_LIVE_CAPTURE), adjust the array index accordingly (e.g., use env[1] if env[0] is already in use). Set GOMEMLIMIT to roughly 90% of the container’s memory limit.
This helps prevent OOM kills by allowing the garbage collector to run more aggressively as memory approaches the limit.
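As a concrete pairing (a sketch; adjust both numbers to your own deployment), a 2Gi container memory limit corresponds to a GOMEMLIMIT of roughly 1800MiB:
resources:
  limits:
    memory: 2Gi
env:
  - name: GOMEMLIMIT
    value: "1800MiB"  # ~90% of the 2Gi (2048MiB) container limit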
Monitoring and Profiling
Identify Resource Bottlenecks
Use profiling with pprof to identify resource hotspots:
- Enable profiling in the Edge Delta UI for specific agents
- Look for spikes tied to:
- Metric extraction and aggregation
- eBPF-based sources (Service Map, K8s Trace)
- High-cardinality metric generation
- Frequent garbage collection cycles
Key Metrics to Monitor
Watch these self-telemetry metrics in the Metrics Explorer:
- ed.agent.memory.usage: Memory consumption per agent
- ed.agent.cpu.usage: CPU usage percentage
- ed.agent.gc.duration: Garbage collection frequency and duration
- ed.pipeline.*.throughput: Data volume per pipeline component
Signs of Resource Pressure
- High GC frequency: Indicates memory pressure
- CPU throttling: Agent hitting CPU limits
- OOMKills: Memory limits too low or memory leak
- Increasing memory over time: Potential memory leak or unbounded buffering
Decision Framework
Use this matrix to decide which optimizations to apply:
Feature | Resource Impact | Use Case | Disable When… |
---|---|---|---|
eBPF Sources (k8s_trace, k8s_traffic) | ⭐⭐⭐⭐⭐ Very High | Service mesh visibility, network monitoring | Using external APM, resource-constrained, no service map needed |
Live Capture | ⭐⭐⭐⭐ High (15-20% in high-volume) | Pipeline development, debugging | Production with stable pipelines, high-volume (>50GB/day), compliance restrictions |
Self-Telemetry Stats | ⭐⭐⭐ Medium | Agent health monitoring, troubleshooting | Minimal observability needs, external agent monitoring |
Complex Processors | ⭐⭐⭐ Medium | Advanced transformations, enrichment | Can simplify logic, use downstream processing |
High-Cardinality Metrics | ⭐⭐ Low-Medium | Detailed analytics, fine-grained monitoring | Can aggregate, acceptable to lose granularity |
Example: Production Optimization
Here’s a complete Helm command for a resource-optimized production deployment:
helm upgrade edgedelta edgedelta/edgedelta -i \
--version v1.17.0 \
--set secretApiKey.value=<your-api-key> \
--set tracerProps.enabled=false \
--set env[0].name=ED_DISABLE_LIVE_CAPTURE \
--set env[0].value="1" \
--set env[1].name=GOMEMLIMIT \
--set env[1].value=1800MiB \
--set resources.requests.cpu=200m \
--set resources.requests.memory=512Mi \
--set resources.limits.cpu=1000m \
--set resources.limits.memory=2Gi \
-n edgedelta --create-namespace
This configuration:
- Disables eBPF sources for major resource savings
- Disables live capture for production stability
- Sets Go memory limit for better GC behavior
- Configures conservative resource limits
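For repeatable rollouts, the same deployment can be captured in a values file and applied with helm upgrade -f values.yaml (a sketch mirroring the flags above; substitute your own API key):
secretApiKey:
  value: <your-api-key>
tracerProps:
  enabled: false
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
  - name: GOMEMLIMIT
    value: "1800MiB"
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi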
Expected Results:
- CPU: ~0.2-0.3 vCPU per agent (vs 0.5+ with all features)
- Memory: ~512MB-1GB per agent (vs 1.5-2GB with all features)
- 40-60% overall resource reduction compared to default configuration
Troubleshooting
Agents Still Using High Resources
- Profile with pprof: Identify actual bottlenecks
- Check pipeline complexity: Review processor configuration
- Examine data volume: May need additional sampling/filtering
- Review destination health: Backpressure can cause buffering
- Check for memory leaks: Increasing memory over time indicates issues
Verification Commands
# Check current resource usage
kubectl top pods -n edgedelta
# View agent configuration
kubectl exec -n edgedelta <pod-name> -- cat /edgedelta/config.yml
# Check for OOMKills
kubectl get pods -n edgedelta -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
# View environment variables
kubectl describe pod -n edgedelta <pod-name> | grep -A 20 "Environment:"