Reducing Agent Resource Consumption
Overview
Edge Delta agents are designed to be lightweight and efficient, but resource consumption can vary significantly based on enabled features, pipeline complexity, and data volume. This guide helps you optimize agent resource usage while understanding the trade-offs involved.
Typical Resource Consumption
Under normal operation with standard telemetry pipelines:
- CPU: 0.2-0.5 vCPU per agent (per node)
- Memory: 500MB-1GB per agent
- Pipeline Memory Multiplier: ~2.4x (e.g., 50GB/day data volume ≈ 120GB in pipeline memory)
These baseline metrics increase when additional features like eBPF-based sources or live capture are enabled, or when processing high-cardinality data.
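To see how your deployment compares to these baselines, check per-pod usage with kubectl (this requires metrics-server or another Metrics API provider in the cluster):
# Compare actual agent usage against the baselines above
kubectl top pods -n edgedelta --containers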
When to Optimize
Consider optimizing agent resources when:
- Agents are the largest resource consumers in your stack
- You’re experiencing memory pressure or CPU throttling
- You need to reduce costs in large-scale deployments
- Regulatory or operational requirements limit resource allocation
- Agents are causing OOMKills or performance degradation
High-Impact Optimizations
1. Disable eBPF-Based Sources (Kubernetes Only)
Resource Impact: Highest - disabling these sources typically yields the largest single reduction in agent CPU and memory usage
Note: eBPF-based sources are only available in Kubernetes deployments. If you’re running Edge Delta on Linux virtual machines or bare metal, these sources are not applicable to your environment.
The Kubernetes Trace source (k8s_trace_input) and Kubernetes Service Map source (k8s_traffic_input) use eBPF to capture network-level telemetry. While powerful, these features consume substantial CPU and memory resources.
Configuration (Kubernetes only):
Disable eBPF globally via Helm:
helm upgrade edgedelta edgedelta/edgedelta \
--set tracerProps.enabled=false \
-n edgedelta
Alternatively, remove the source nodes from your pipeline configuration if you only want to disable specific eBPF functionality while keeping other tracer features.
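As a sketch, the eBPF sources appear in the pipeline configuration as nodes of the types below (the node names here are placeholders; yours will differ). Deleting these node entries, along with any links that reference them, removes only the eBPF functionality:
nodes:
  - name: k8s_traces        # placeholder name - remove this node to drop eBPF tracing
    type: k8s_trace_input
  - name: k8s_service_map   # placeholder name - remove this node to drop the Service Map source
    type: k8s_traffic_input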
Trade-offs:
- Major reduction in CPU and memory usage
- Reduced GC pressure and improved stability
- Lose Service Map visualization
- Cannot capture eBPF-based traces
- No automatic network traffic monitoring
When to Disable:
- You don’t need service-to-service traffic visibility
- You’re using alternative APM/tracing solutions
- Resource constraints outweigh observability benefits
- N/A for Linux environments - eBPF sources only work in Kubernetes
Verification:
# Check if tracer is disabled
kubectl get daemonset -n edgedelta -o yaml | grep -A 5 "tracerProps"
# Verify no eBPF sources in pipeline
kubectl exec -n edgedelta <pod-name> -- grep -E "k8s_trace_input|k8s_traffic_input" /edgedelta/config.yml
2. Disable Live Capture in Production
Resource Impact: High impact - 15-20% reduction in high-volume environments
Live Capture enables real-time pipeline debugging and data preview in the Edge Delta UI. While invaluable during development, it consumes resources by caching data items in memory and performing JSON marshaling operations.
Resource Cost:
- Memory: 15-20% overhead from in-memory caching of captured items
- CPU: JSON marshaling cost for serialization
- Volume Dependency: Impact scales with data volume processed by the agent
Configuration:
For Kubernetes deployments, set the environment variable via Helm:
helm upgrade edgedelta edgedelta/edgedelta \
--set env[0].name=ED_DISABLE_LIVE_CAPTURE \
--set env[0].value="1" \
-n edgedelta
Or using a values file:
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
For Linux deployments, set the environment variable in your service configuration:
# For systemd services, add to /etc/systemd/system/edgedelta.service
Environment="ED_DISABLE_LIVE_CAPTURE=1"
# Or export before running the agent
export ED_DISABLE_LIVE_CAPTURE=1
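If you prefer not to edit the unit file directly, a systemd drop-in override achieves the same result (assumes the service is named edgedelta, as in the path above):
# Create a drop-in override instead of editing the unit file
sudo systemctl edit edgedelta
# Add the following in the editor, then save:
#   [Service]
#   Environment="ED_DISABLE_LIVE_CAPTURE=1"
# Restart the agent so the new environment takes effect
sudo systemctl restart edgedelta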
Trade-offs:
- 15-20% reduction in CPU and memory in high-volume scenarios
- Reduced network egress to Edge Delta backend
- Eliminates real-time data sampling concerns
- Cannot use in-stream debugging features
- No live data preview when building processors
- Harder to troubleshoot pipeline behavior in production
When to Disable:
- Production environments with stable, tested pipelines
- High-volume environments (>50GB/day per agent)
- Security/compliance requirements prohibit real-time sampling
- Network policies restrict outbound data transmission
- Resource constraints are critical
When to Keep Enabled:
- Development and staging environments
- Actively building and testing new pipelines
- Troubleshooting data processing issues
- Need AI-powered processor recommendations
Verification:
# Check environment variable is set
kubectl get pods -n edgedelta -o jsonpath='{.items[0].spec.containers[0].env[?(@.name=="ED_DISABLE_LIVE_CAPTURE")].value}'
# Should return: 1
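For Linux deployments, you can confirm the variable is present in the running agent's environment (the process name match below follows the ps example used later in this guide):
# Inspect the running process environment
sudo cat /proc/$(pgrep -f edgedelta | head -1)/environ | tr '\0' '\n' | grep ED_DISABLE_LIVE_CAPTURE
# Should return: ED_DISABLE_LIVE_CAPTURE=1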
3. Optimize Self-Telemetry Cardinality
Resource Impact: Medium impact
The Self Telemetry source generates metrics about agent health and pipeline statistics. In agent version 2.5.0, self-telemetry metrics have increased cardinality, which can raise resource usage.
Configuration:
If you notice high self-telemetry volume, consider:
- Filtering or sampling self-telemetry metrics before forwarding
- Aggregating metrics at higher intervals
- Disabling specific metric types if not needed
nodes:
  - name: ed_self_telemetry_input
    type: ed_self_telemetry_input
    enable_health_metrics: true
    enable_agent_stats_metrics: false  # Disable if not needed
Trade-offs:
- Reduced metric cardinality and memory usage
- Less granular visibility into agent performance
- May impact troubleshooting capabilities
Pipeline-Level Optimizations
Beyond disabling features, optimize how your pipeline processes data:
4. Design Efficient Pipelines
Follow best practices from Designing Efficient Pipelines:
- Reuse extracted values: Extract once with regex, reuse multiple times
- Minimize regex operations: Most computationally expensive CEL macro
- Filter early: Drop unwanted data before expensive transformations
- Avoid overlapping conditions: Prevent duplicate processing
5. Follow Processor Best Practices
Apply recommendations from Processor Best Practices:
- Use mutually exclusive conditions in multi-processor nodes
- Avoid overloading single processor nodes (limit to 2-3 extract/aggregate chains)
- Use name == conditions to tightly scope aggregate metrics
- Implement effective sampling to reduce volume
6. Optimize Data Routing
- Sample aggressively: Use Sample Processor early in pipeline
- Filter unused telemetry: Remove logs/metrics you don’t need
- Aggregate before forwarding: Reduce destination ingestion costs
- Use consistent hashing: Ensures efficient routing in gateway pipelines
Kubernetes Resource Configuration
Set Appropriate Resource Limits
Configure Kubernetes resource requests and limits based on your workload. See Helm Values for full details.
Conservative (minimal features, low volume):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=100m \
--set resources.requests.memory=256Mi \
--set resources.limits.cpu=500m \
--set resources.limits.memory=512Mi \
-n edgedelta
Standard (typical production):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=200m \
--set resources.requests.memory=512Mi \
--set resources.limits.cpu=1000m \
--set resources.limits.memory=2Gi \
-n edgedelta
High-Volume (eBPF enabled, high throughput):
helm upgrade edgedelta edgedelta/edgedelta \
--set resources.requests.cpu=500m \
--set resources.requests.memory=1Gi \
--set resources.limits.cpu=2000m \
--set resources.limits.memory=4Gi \
-n edgedelta
Monitoring and Profiling
Identify Resource Bottlenecks
Use profiling with pprof to identify resource hotspots:
- Enable profiling in the Edge Delta UI for specific agents
- Look for spikes tied to:
- Metric extraction and aggregation
- eBPF-based sources (Service Map, K8s Trace)
- High-cardinality metric generation
- Frequent garbage collection cycles
Key Metrics to Monitor
Watch these self-telemetry metrics in the Metrics Explorer:
- ed.agent.memory.usage: Memory consumption per agent
- ed.agent.cpu.usage: CPU usage percentage
- ed.agent.gc.duration: Garbage collection frequency and duration
- ed.pipeline.*.throughput: Data volume per pipeline component
Signs of Resource Pressure
- High GC frequency: Indicates memory pressure
- CPU throttling: Agent hitting CPU limits
- OOMKills: Memory limits too low or memory leak
- Increasing memory over time: Potential memory leak or unbounded buffering
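If you suspect CPU throttling, a quick check is to read the agent container's cgroup CPU statistics (the path below assumes cgroup v2; on cgroup v1 hosts the equivalent file is /sys/fs/cgroup/cpu,cpuacct/cpu.stat):
# nr_throttled and throttled_usec increasing between samples indicate CPU throttling
kubectl exec -n edgedelta <pod-name> -- cat /sys/fs/cgroup/cpu.stat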
Decision Framework
Use this matrix to decide which optimizations to apply:
| Feature | Resource Impact | Use Case | Disable When… |
|---|---|---|---|
| eBPF Sources (k8s_trace, k8s_traffic) | Very High | Service mesh visibility, network monitoring | Using external APM, resource-constrained, no service map needed |
| Live Capture | High (15-20% in high-volume) | Pipeline development, debugging | Production with stable pipelines, high-volume (>50GB/day), compliance restrictions |
| Self-Telemetry Stats | Medium | Agent health monitoring, troubleshooting | Minimal observability needs, external agent monitoring |
| Complex Processors | Medium | Advanced transformations, enrichment | Can simplify logic, use downstream processing |
| High-Cardinality Metrics | Low-Medium | Detailed analytics, fine-grained monitoring | Can aggregate, acceptable to lose granularity |
Example: Production Optimization
Kubernetes Deployment
Here’s a complete Helm command for a resource-optimized production deployment:
helm upgrade edgedelta edgedelta/edgedelta -i \
--version v1.17.0 \
--set secretApiKey.value=<your-api-key> \
--set tracerProps.enabled=false \
--set env[0].name=ED_DISABLE_LIVE_CAPTURE \
--set env[0].value="1" \
--set resources.requests.cpu=200m \
--set resources.requests.memory=512Mi \
--set resources.limits.cpu=1000m \
--set resources.limits.memory=2Gi \
-n edgedelta --create-namespace
This configuration:
- Disables eBPF sources for major resource savings
- Disables live capture for production stability
- Configures conservative resource limits
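If you manage the deployment declaratively, the same settings can live in a values file (the keys mirror the --set flags above) and be applied with helm upgrade -i edgedelta edgedelta/edgedelta -f values.yaml -n edgedelta:
secretApiKey:
  value: <your-api-key>
tracerProps:
  enabled: false
env:
  - name: ED_DISABLE_LIVE_CAPTURE
    value: "1"
resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi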
Linux Deployment
For Linux environments (VM or bare metal), configure the agent with minimal resource consumption:
1. Set environment variables in your service configuration (/etc/systemd/system/edgedelta.service):
[Service]
Environment="ED_DISABLE_LIVE_CAPTURE=1"
2. Ensure your pipeline configuration avoids resource-intensive features:
- Remove any k8s_trace_input or k8s_traffic_input sources (these only work in Kubernetes)
- Optimize processor chains as described in Pipeline-Level Optimizations
- Apply aggressive sampling if processing high data volumes
3. Monitor resource usage using system tools:
# Check CPU and memory usage
ps aux | grep edgedelta
# View agent logs
journalctl -u edgedelta -f
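To track the agent's memory over time on Linux (useful for spotting slow growth), pidstat works well; it assumes the sysstat package is installed:
# Report memory usage for the agent process every 60 seconds
pidstat -r -p $(pgrep -f edgedelta | head -1) 60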
Expected Results
With these optimizations applied:
- CPU: ~0.2-0.3 vCPU per agent (vs 0.5+ with all features)
- Memory: ~512MB-1GB per agent (vs 1.5-2GB with all features)
- Overall reduction: 40-60% compared to default configuration with all features enabled
Troubleshooting
Agents Still Using High Resources
- Profile with pprof: Identify actual bottlenecks
- Check pipeline complexity: Review processor configuration
- Examine data volume: May need additional sampling/filtering
- Review destination health: Backpressure can cause buffering
- Check for memory leaks: Increasing memory over time indicates issues
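To confirm whether memory is genuinely growing over time (rather than spiking and recovering), sample usage at a fixed interval and compare readings:
# Sample agent memory every 5 minutes; steadily increasing values suggest a leak or unbounded buffering
watch -n 300 kubectl top pods -n edgedelta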
Verification Commands
# Check current resource usage
kubectl top pods -n edgedelta
# View agent configuration
kubectl exec -n edgedelta <pod-name> -- cat /edgedelta/config.yml
# Check for OOMKills
kubectl get pods -n edgedelta -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
# View environment variables
kubectl describe pod -n edgedelta <pod-name> | grep -A 20 "Environment:"