eBPF vs OpenTelemetry Tracing

Compare eBPF and OpenTelemetry tracing approaches in Edge Delta.

Overview

Edge Delta supports both eBPF and OpenTelemetry (OTEL) tracing to provide comprehensive visibility into your systems. Each approach has distinct capabilities and use cases. This guide helps you understand the differences and choose the right approach for your needs.

Technology Comparison

eBPF Tracing

eBPF (Extended Berkeley Packet Filter) is a Linux kernel technology that captures system-level events without modifying application code. Edge Delta’s Kubernetes agent includes built-in eBPF capabilities for supported environments.

eBPF excels at capturing network packet flows between services and monitoring file system access patterns. It provides visibility into process execution, system calls, and kernel function calls, making it ideal for understanding service-to-service communication at the IP level. Since eBPF operates at the kernel level, it requires no application instrumentation and has minimal performance impact.

OpenTelemetry Tracing

OpenTelemetry provides application-level visibility through code instrumentation. It requires installing SDKs or agents in your applications to generate detailed spans for specific operations.

OTEL captures comprehensive application behavior including HTTP request flows with full headers and payloads, database query details with latency measurements, and function execution times. The technology excels at creating business logic spans with custom attributes that provide context specific to your application. Through distributed trace context with unique trace IDs, OpenTelemetry enables cross-service correlation using parent-child relationships between spans, making it invaluable for debugging complex distributed transactions.
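To illustrate how distributed trace context works, the sketch below shows the W3C Trace Context `traceparent` header format that OTEL SDKs implement under the hood (this is a generic stdlib-only illustration, not Edge Delta-specific code): the trace ID stays constant across services, while each hop mints a new span ID, which is what makes cross-service correlation possible.

```python
import os

def new_id(n_bytes):
    # Random lowercase-hex identifier, as used by W3C Trace Context.
    return os.urandom(n_bytes).hex()

def make_traceparent():
    # traceparent layout: version - trace-id (16 bytes) - parent-id (8 bytes) - flags
    return f"00-{new_id(16)}-{new_id(8)}-01"

def next_hop(traceparent):
    # A downstream service keeps the trace ID but mints its own span ID;
    # this shared trace ID is what ties spans from many services into one trace.
    _, trace_id, _, flags = traceparent.split("-")
    return f"00-{trace_id}-{new_id(8)}-{flags}"

root = make_traceparent()
child = next_hop(root)
# Same trace ID on both hops; different span IDs.
```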

Capability Matrix

Capability                     eBPF                       OpenTelemetry
Setup Requirements             No code changes            SDK/agent instrumentation
Linux Kernel Support           Required (recent kernel)   Not required
Service-level monitoring       ✓                          ✓
Network traffic visibility     ✓                          Limited
Database query details         Limited                    ✓
HTTP request details           Basic                      ✓ (full context)
Business logic spans           –                          ✓
Method/function tracing        Limited                    ✓
Distributed trace IDs          –                          ✓
External system correlation    Basic                      ✓
File system operations         ✓                          –
Process execution              ✓                          –
Kernel events                  ✓                          –
Language runtime visibility    Limited                    ✓ (JVM, .NET)

When to Use Each Approach

Use eBPF Tracing When:

eBPF tracing is optimal when you cannot modify application code due to legacy systems or change control restrictions. It provides immediate visibility without deployment changes, making it ideal for investigating security events or system resource issues. Choose eBPF when you need system-level visibility into Linux kernel operations or want to monitor network communication patterns between services. This approach works best when your primary focus is understanding infrastructure and networking behavior rather than application logic.

Use OpenTelemetry When:

OpenTelemetry is the right choice when you need detailed application performance monitoring (APM) capabilities. It enables tracing specific business transactions end-to-end and correlating events across distributed services. Choose OTEL when debugging application logic or performance issues that require understanding the internal workings of your code. The ability to add custom span attributes and business context makes it invaluable for tracking user journeys through multiple services and understanding application-specific behavior.

Use Both Together When:

Combining eBPF and OpenTelemetry provides complete visibility from kernel to application layer. This dual approach is valuable when correlating infrastructure issues with application behavior or validating that application traces align with actual network traffic. Production systems benefit from this comprehensive observability strategy, as eBPF provides the infrastructure context while OTEL delivers the application details necessary for full-stack troubleshooting.

Configuration in Edge Delta

Configuring eBPF Tracing

eBPF tracing is available through the Kubernetes Trace source node on supported Linux systems.

  1. Deploy Edge Delta agent on Kubernetes with a recent Linux kernel
  2. Add the Kubernetes Trace source to your pipeline:
nodes:
  - name: k8s_trace_input
    type: k8s_trace_input
    include:
      - k8s.namespace.name=production
    exclude:
      - k8s.namespace.name=kube-system
  3. View service-level traces in the Edge Delta Trace Explorer
  4. Use the Service Map to visualize service dependencies

Configuring OpenTelemetry Tracing

Edge Delta can receive OTEL traces through the OTLP source node from any OpenTelemetry-instrumented application.

Option 1: Edge Delta as OTEL Collector

  1. Install Edge Delta agent (on-premise or cloud)
  2. Configure OTLP input node:
nodes:
  - name: otlp_input
    type: otlp_input
    port: 4317
    protocol: grpc
  3. Configure your application’s OTEL SDK to send to Edge Delta:
# Java example
java -javaagent:/path/to/opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=http://edge-delta-agent:4317 \
     -Dotel.exporter.otlp.protocol=grpc \
     -Dotel.service.name=my-service \
     -jar myapp.jar
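Equivalently, most OTEL SDKs and auto-instrumentation agents can be pointed at the agent without code changes via the standard OpenTelemetry environment variables (a generic shell sketch; the endpoint and service name are placeholders mirroring the Java example above):

```shell
# Standard OTEL SDK environment variables defined by the OpenTelemetry
# specification; picked up automatically by most language SDKs.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://edge-delta-agent:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_SERVICE_NAME="my-service"
```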

Option 2: Forward from OTEL Collector

  1. Deploy standard OTEL Collector in your environment
  2. Configure it to forward to Edge Delta Cloud Pipeline:
exporters:
  otlp/edge_delta:
    endpoint: 'your-pipeline-id-grpc-us-west2-cf.aws.edgedelta.com:443'
    tls:
      insecure: false

Trace Data Differences

eBPF Trace Example

eBPF traces reveal system-level interactions including source and destination IPs, port numbers, and protocol information. These traces provide latency measurements and HTTP status codes when available, painting a clear picture of service-to-service communication patterns at the network level. The data focuses on what happened between services rather than within them.

OTEL Trace Example

OpenTelemetry traces provide rich application-level details through span hierarchies that show parent-child relationships between operations. Each span can include custom attributes and tags, detailed timing for operations, and complete error messages with stack traces. The traces capture business context and user information along with specific database queries and their parameters, enabling deep application debugging and performance analysis.

Performance Considerations

eBPF Performance Impact

eBPF operates with minimal overhead, typically consuming less than 1% CPU with no application memory overhead. Since it runs at the kernel level, filtering happens before data reaches user space, significantly reducing data volume. The technology has no impact on application garbage collection and requires no application restarts for deployment or updates, making it ideal for production environments where stability is critical.

OpenTelemetry Performance Impact

OpenTelemetry introduces variable overhead depending on instrumentation level, typically ranging from 1-5% CPU usage. The SDK requires memory for span buffering and can create garbage collection pressure in managed runtimes like Java or .NET. Network overhead from trace export adds to the overall impact. Agent updates may require application restarts, which should be factored into deployment planning for production systems.

Data Volume and Sampling

Both approaches can generate significant data volumes. Edge Delta provides sampling controls:

eBPF Sampling

Configure namespace-based filtering to reduce volume:

nodes:
  - name: k8s_trace_input
    type: k8s_trace_input
    include:
      - k8s.namespace.name=critical-services

OTEL Sampling

Use Edge Delta’s pipeline processors for intelligent sampling:

  • Head-based sampling at collection
  • Tail-based sampling after processing
  • Dynamic sampling based on error rates
  • Adaptive sampling during high load
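To see why head-based sampling keeps traces intact, note that the keep/drop decision can be derived deterministically from the trace ID, so every service handling a trace makes the same call (a generic sketch of the technique, not Edge Delta's implementation):

```python
import hashlib

def keep_trace(trace_id, sample_rate):
    # Hash the trace ID into [0, 1); the same ID always yields the same
    # decision, so all spans of a trace are sampled (or dropped) together.
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

# Deterministic: repeated calls for one trace ID always agree.
decision = keep_trace("4bf92f3577b34da6", 0.25)
```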

Integration with Edge Delta Features

Both tracing approaches integrate with Edge Delta’s Telemetry Pipeline capabilities to enhance observability and control costs.

Pattern Detection

Use the Log to Pattern processor to identify anomalies in trace patterns. Configure the processor to analyze trace attributes like latency, error rates, or service communication patterns. When unusual patterns emerge, the processor groups similar traces and highlights deviations from normal behavior, enabling rapid anomaly detection across your distributed system.

Metric Extraction

Convert trace data into metrics using the Extract Metric processor. Extract key performance indicators from traces such as request rates, error counts, or latency percentiles. For example, generate a metric from trace duration attributes to monitor service response times:

- type: extract_metric
  extract_metric_rules:
  - name: trace_duration_ms
    description: Service trace duration in milliseconds
    unit: "ms"
    gauge:
      value: attributes["duration_ms"]

Correlation

Link traces with logs and metrics using the Correlate Logs and Traces capabilities. Add trace IDs to log entries using field enrichment, then use common attributes like service name or request ID to connect telemetry types. This correlation enables you to jump from a slow trace to related error logs or resource metrics, providing full context for troubleshooting.
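The "add trace IDs to log entries" step can be sketched with the standard library alone: a logging filter stamps the active trace ID onto every record so log lines can later be joined with trace data (names like `current_trace_id` are illustrative, not an Edge Delta API):

```python
import io
import logging

# In a real service this would come from the active span's context.
current_trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

class TraceIdFilter(logging.Filter):
    def filter(self, record):
        # Attach the trace ID so every log line carries correlation context.
        record.trace_id = current_trace_id
        return True

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.addFilter(TraceIdFilter())
logger.setLevel(logging.INFO)

logger.info("payment authorized")
line = buf.getvalue().strip()
```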

Routing

Configure destination nodes to send different trace types to appropriate backends. Use conditional routing based on trace attributes to direct eBPF traces to security tools while sending OTEL traces to APM platforms. Route high-value traces to long-term storage while sampling or dropping routine traffic:

- name: route_traces
  type: route
  routes:
  - condition: attributes["trace.type"] == "ebpf"
    destination: security_siem
  - condition: attributes["error"] == true
    destination: apm_platform

Enrichment

Add context using the Lookup processor with lookup tables containing service metadata, team ownership, or environment information. Match trace attributes against your lookup tables to enrich traces with business context, cost center codes, or compliance tags. This enrichment helps with troubleshooting, reporting, and cost allocation.
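Conceptually, lookup enrichment is a key-based join between a trace attribute and a metadata table. A plain-Python sketch of the idea (the service names and metadata fields are hypothetical):

```python
# Hypothetical lookup table: service name -> ownership metadata.
SERVICE_METADATA = {
    "checkout": {"team": "payments", "cost_center": "CC-1042", "tier": "critical"},
    "search":   {"team": "discovery", "cost_center": "CC-2210", "tier": "standard"},
}

def enrich(trace_attrs):
    # Join on the service name; unknown services pass through unchanged.
    extra = SERVICE_METADATA.get(trace_attrs.get("service.name"), {})
    return {**trace_attrs, **extra}

span = enrich({"service.name": "checkout", "duration_ms": 42})
```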

Aggregation

Reduce data volume and costs using the Rollup Metric processor to summarize trace metrics over time windows. Instead of storing every trace, aggregate key metrics like request counts, error rates, and latency percentiles per minute or hour. This approach maintains visibility while dramatically reducing storage and ingestion costs for high-volume tracing data.
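The rollup idea can be sketched as a per-window reduction: instead of storing one row per trace, keep one summary record per service per time bucket (generic Python with hypothetical input records, not the Rollup Metric processor's actual internals):

```python
from collections import defaultdict

# Hypothetical per-trace records: (service, minute bucket, duration_ms, is_error)
traces = [
    ("checkout", 0, 120, False),
    ("checkout", 0, 340, True),
    ("checkout", 0, 95,  False),
    ("search",   0, 40,  False),
]

rollup = defaultdict(lambda: {"count": 0, "errors": 0, "durations": []})
for service, minute, duration_ms, is_error in traces:
    agg = rollup[(service, minute)]
    agg["count"] += 1
    agg["errors"] += int(is_error)
    agg["durations"].append(duration_ms)

# One summary row per (service, minute) instead of one row per trace.
summary = {
    key: {
        "count": agg["count"],
        "error_rate": agg["errors"] / agg["count"],
        "max_ms": max(agg["durations"]),
    }
    for key, agg in rollup.items()
}
```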

Troubleshooting Common Issues

eBPF Tracing Issues

  • No traces appearing: Verify Linux kernel version supports eBPF (4.14+)
  • Missing services: Check namespace inclusion/exclusion rules
  • Limited visibility into JVM: eBPF has limited insight into managed runtimes

OTEL Tracing Issues

  • No traces received: Verify OTLP endpoint configuration and network connectivity
  • Missing spans: Check instrumentation coverage and sampling configuration
  • High overhead: Reduce instrumentation scope or adjust sampling rates

Best Practices

Start with eBPF for immediate visibility without code changes, especially in production environments where you need quick insights. Once you’ve identified critical business services requiring detailed APM, add OTEL instrumentation to those specific components. Using both technologies together provides comprehensive production observability.

Configure sampling appropriately to manage costs and performance impact. Edge Delta’s pipeline capabilities help normalize and route both trace types to appropriate destinations. Continuously monitor the overhead of your tracing strategy and adjust collection parameters as your system evolves and scales.