Troubleshooting Traces
Overview
This guide provides comprehensive troubleshooting for Edge Delta trace collection and processing:
- Trace Search Issues - Child spans not appearing in search results
- Unknown Trace Types - Making unclassified traces searchable
- Gateway Aggregation - Using gateway pipelines for complete trace views
- Tail Sampling - Implementing trace-level sampling decisions
- eBPF vs OTEL Traces - Understanding trace source differences
For general information about using the Trace Explorer, see Edge Delta Trace Explorer.
Quick Diagnostic Checklist
Before diving into specific issues, verify:
- Agent version 1.24.0 or higher for trace support
- OTLP Source or Kubernetes Trace Source configured
- Edge Delta Traces destination enabled
- “Include Child Spans” checkbox enabled in Trace Explorer
- Traces visible in Trace Explorer details view
Trace Search Issues
Child Spans Not Appearing in Search Results
Symptoms
- Child spans are visible when viewing trace details
- Cannot find child spans using trace search queries
- Facets created from child span attributes return no results
- Error-free configuration but no search matches
Root Cause
The Trace Explorer automatically filters out traces with trace.type="Unknown" from search results. This filter is applied implicitly (visible in browser network requests as trace.type != "Unknown"). Even when the “Include Child Spans” checkbox is enabled, child spans belonging to Unknown-type traces will not appear in search results or facets.
This is a product design decision to focus search results on well-categorized traces.
Diagnosis
1. Check Trace Type in Details View
Open a trace in the Trace Explorer and examine the trace type field in the details view. If the trace type shows “Unknown”, any child spans within that trace will be filtered from search.
2. Verify Filter Behavior
Open your browser’s developer tools (Network tab) and observe the query sent to the backend. You’ll see trace.type != "Unknown" added automatically to your search query.
3. Test with Known Trace Type
Create a search for child spans in traces with known types (HTTP, gRPC, etc.) to confirm search functionality works for properly typed traces.
Solutions
You have two options to make Unknown traces searchable: modify your application code or use pipeline transformations.
Option 1: Set Trace Type Attributes in Application Code
Edge Delta infers trace type by examining specific OpenTelemetry attributes. Add one or more of the recognized attributes when creating spans in your application code.
For HTTP Traces (most common):
// JavaScript/Node.js
const span = tracer.startSpan('convertCurlyTags', {
  attributes: {
    'http.flavor': '1.1',            // HTTP protocol version
    'http.method': 'POST',           // HTTP method (optional)
    'http.host': 'api.example.com',  // Target host (optional)
    // You don't need all attributes, just one or more
  }
});
# Python
with tracer.start_as_current_span('process_data') as span:
    span.set_attribute('http.flavor', '1.1')
    span.set_attribute('http.method', 'GET')
    # Perform work
// Java
Span span = tracer.spanBuilder("processRequest")
    .setAttribute("http.flavor", "1.1")
    .setAttribute("http.method", "POST")
    .startSpan();
For gRPC Traces:
attributes: {
  'rpc.system': 'grpc'
}
For Database Traces:
attributes: {
  'db.system': 'postgresql'  // or 'mysql', 'redis', 'mongodb', etc.
}
For Messaging Traces:
attributes: {
  'messaging.system': 'kafka'  // or 'rabbitmq', 'sqs', 'pubsub', etc.
}
Option 2: Use Pipeline Transformations
If you cannot modify application code, use a multiprocessor in your pipeline to set trace type attributes based on other span properties.
Example: Set HTTP attributes for specific scopes
processors:
  - name: classify-unknown-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["http.flavor"], "1.1")
            where attributes["ed.trace.type"] == "Unknown"
            and attributes["otel.scope.name"] == "components-common-datautils"
Example: Set trace type for specific services
processors:
  - name: classify-service-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["http.flavor"], "1.1")
            where attributes["ed.trace.type"] == "Unknown"
            and resource["service.name"] == "my-internal-service"
Example: Set database type for DB operations
processors:
  - name: classify-db-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["db.system"], "postgresql")
            where attributes["ed.trace.type"] == "Unknown"
            and attributes["ed.span.resource"] matches ".*query.*"
Recognized Trace Type Attributes
Edge Delta recognizes the following attributes for automatic trace type inference:
| Trace Type | Required Attributes (any one or more) |
|---|---|
| HTTP | http.protocol, http.flavor, http.method, http.host, http.url, http.status_code, http.response.status_code |
| gRPC | rpc.system |
| Database | db.system |
| Messaging | messaging.system |
Traces without any of these attributes will be classified as “Unknown” and filtered from search results.
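For reference, the sketch below is a minimal, self-contained Python example that emits a span Edge Delta should classify as HTTP. It assumes the standard OpenTelemetry SDK and the OTLP gRPC exporter package; the endpoint, service name, and span name are illustrative and need to be adapted to your deployment.
# Python (requires opentelemetry-sdk and opentelemetry-exporter-otlp-proto-grpc)
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Point the exporter at the Edge Delta OTLP source; the endpoint is illustrative
# and must match the listen address of your otlp_input node.
provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="edgedelta-agent:4318", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo-instrumentation")
with tracer.start_as_current_span("process_request") as span:
    # Any one recognized attribute is enough; http.method classifies this trace as HTTP.
    span.set_attribute("http.method", "GET")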
Verification
After implementing the solution:
- Check Live Tail to verify the trace type is set correctly on new spans
- Wait for data to flow through the pipeline (may take a few minutes)
- Create a facet from the child span attribute you want to search (see Add a Facet)
- Search using the newly created facet
Expected result: Child spans now appear in search results and facets work correctly.
Facets Not Populating from Child Spans
Symptoms
- Facet created from child span attribute
- Facet shows no values in dropdown
- Manual query using facet returns no results
Diagnosis
This is typically caused by the Unknown trace type issue described above. Facets only populate from searchable traces.
Solution
Follow the steps in “Child Spans Not Appearing in Search Results” to ensure your traces have recognized types.
Gateway Pipeline Trace Aggregation
When to Use Gateway Pipelines for Traces
Symptoms
- Incomplete traces in Trace Explorer
- Spans from the same trace appear in different agent logs
- Cannot apply trace-level sampling decisions
- Need service-level view across multiple clusters
Use Cases
Tail Sampling: In distributed systems where traces are collected from multiple node pipelines, spans belonging to the same parent trace can originate from different sources. Tail sampling requires seeing the entire trace before making a sampling decision. Deploy the Tail Sample Processor on a gateway pipeline to aggregate all spans and apply sampling logic to complete traces.
Cross-Cluster Aggregation: When the same service runs across multiple Kubernetes clusters or regions, gateway pipelines aggregate spans using consistent routing to ensure all spans for a given trace ID reach the same gateway instance.
Deduplication: Multiple sources may emit duplicate spans. Gateway pipelines deduplicate at the trace level.
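To illustrate why consistent routing matters, the following is a simplified conceptual sketch of routing by trace ID. It is not Edge Delta's implementation (production routers typically use a hash ring rather than a plain modulo so scaling events reshuffle fewer traces), but it shows the core idea: a stable hash of the trace ID always selects the same gateway instance for every span of that trace, no matter which node emitted it.
# Conceptual sketch of trace-ID-based routing (illustrative, not Edge Delta code)
import hashlib

GATEWAY_PODS = ["gateway-pod-x", "gateway-pod-y", "gateway-pod-z"]

def route(trace_id: str) -> str:
    # A stable hash of the trace ID maps every span of the same trace
    # to the same gateway pod, regardless of the source node cluster.
    digest = hashlib.sha256(trace_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(GATEWAY_PODS)
    return GATEWAY_PODS[index]

# Spans for abc123 from any node cluster land on the same pod:
assert route("abc123") == route("abc123")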
Configuration
Step 1: Configure Node Pipeline to Send Traces to Gateway
nodes:
  # Collect traces at the node level
  - name: otlp_source
    type: otlp_input
    listen: 0.0.0.0:4318
    protocol: grpc
  # Send to gateway for aggregation
  - name: send_to_gateway
    type: ed_gateway_output
    port: 443
    protocol: grpc
    endpoint_resolution_type: k8s
    k8s_service_name: trace-gateway-svc
    target_allocation_type: consistent
Important: Use target_allocation_type: consistent to ensure all spans with the same trace ID route to the same gateway instance. This is critical for tail sampling and trace completion.
Step 2: Configure Gateway Pipeline to Receive and Process Traces
nodes:
  # Receive traces from node pipelines
  - name: gateway_input
    type: ed_pipeline_source
    listen: 0.0.0.0:443
    protocol: grpc
  # Apply tail sampling to complete traces
  - name: tail_sampling
    type: tail_sample
    decision_interval: 30s
    sampling_policies:
      - name: sample_errors
        policy_type: status_code
        status_codes:
          - ERROR
      - name: sample_slow_traces
        policy_type: latency
        lower_threshold: 1s
      - name: sample_10_percent
        policy_type: probabilistic
        percentage: 10
  # Send sampled traces to backend
  - name: traces_destination
    type: ed_traces_output
    api_key: ${ED_API_KEY}
Step 3: Scale Gateway for High Volume
# In Helm values for gateway deployment
replicaCount: 3
resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"
Consistent Routing Behavior
When multiple node clusters send traces to the same gateway cluster using consistent allocation:
- All sources sending spans for trace.id:abc123 route to the same gateway pod
- All sources sending spans for trace.id:xyz789 route to the same gateway pod (likely a different pod than abc123)
- This ensures accurate cross-cluster trace aggregation and tail sampling
Example:
Node Cluster A: trace.id:abc123 → Gateway Pod X
Node Cluster B: trace.id:abc123 → Gateway Pod X (same pod)
Node Cluster C: trace.id:abc123 → Gateway Pod X (same pod)
Node Cluster A: trace.id:xyz789 → Gateway Pod Y
Node Cluster B: trace.id:xyz789 → Gateway Pod Y (same pod)
Verification
1. Check Gateway Receives Spans
Query gateway logs to verify span ingestion:
kubectl logs -n edgedelta deployment/trace-gateway | grep "received spans"
2. Verify Tail Sampling Decisions
kubectl logs -n edgedelta deployment/trace-gateway | grep "tail_sample"
3. Confirm Complete Traces in Explorer
Search for traces in the Trace Explorer and verify:
- All expected spans appear in the trace details
- Sampling decisions are applied correctly
- Trace timing and relationships are accurate
eBPF vs OpenTelemetry Traces
Differences and Troubleshooting
eBPF Traces
Characteristics:
- Captured at the kernel level
- No application code changes required
- Limited to Kubernetes environments with Edge Delta
- Provides network-level visibility (packet paths, syscalls)
- Requires tracerProps.enabled=true in Helm deployment
Common Issues:
Symptom: eBPF traces not appearing
Diagnosis:
# Check if tracer is enabled
kubectl get daemonset -n edgedelta -o yaml | grep -A 5 "tracerProps"
# Verify kernel privileges
kubectl get pods -n edgedelta -o jsonpath='{.items[*].spec.containers[*].securityContext}'
Solution: Ensure Helm values include:
tracerProps:
enabled: true
securityContext:
privileged: true
OpenTelemetry Traces
Characteristics:
- Application-level instrumentation
- Rich business context and custom attributes
- Works across any environment
- Requires OTLP Source node
Common Issues:
Symptom: OTEL traces not being received
Diagnosis:
# Check OTLP input is listening
kubectl get pods -n edgedelta -o wide
kubectl logs -n edgedelta <pod-name> | grep "otlp_input"
Solution: Verify application is configured to send to correct endpoint:
// Ensure app points to Edge Delta OTLP endpoint
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const exporter = new OTLPTraceExporter({
  url: 'http://edgedelta-agent:4318',
});
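If the application is instrumented in Python instead of Node.js, an equivalent sketch looks like the following; the endpoint is illustrative and must match the listen address of your OTLP Source node.
# Python equivalent (opentelemetry-exporter-otlp-proto-grpc)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="edgedelta-agent:4318",  # match the port of the otlp_input node
    insecure=True,                    # plain gRPC; remove if TLS is configured
)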
Performance Issues
High Trace Volume Causing Backpressure
Symptoms
- Agent memory usage increasing
- Dropped spans warnings in logs
- Increased latency in application
Solutions
1. Implement Sampling at Node Level
processors:
- name: head_sampling
type: sample
sampling_rate: 0.1 # Sample 10% at collection
2. Use Tail Sampling at Gateway
Move sampling decisions to gateway for smarter filtering:
# Gateway pipeline
processors:
  - name: intelligent_sampling
    type: tail_sample
    sampling_policies:
      - name: always_sample_errors
        policy_type: status_code
        status_codes: [ERROR]
      - name: sample_slow
        policy_type: latency
        lower_threshold: 500ms
      - name: sample_rest
        policy_type: probabilistic
        percentage: 5
3. Increase Gateway Resources
resources:
  requests:
    memory: "4Gi"
    cpu: "2000m"
  limits:
    memory: "8Gi"
    cpu: "4000m"
4. Adjust Buffer Settings
nodes:
  - name: gateway_output
    type: ed_gateway_output
    buffer_max_bytesize: 100MB
    buffer_ttl: 15m
Advanced Debugging
Enable Trace Debug Logging
1. Edge Delta Agent Debug Mode
agent:
  log_level: debug
2. Monitor Trace-Specific Logs
# Kubernetes
kubectl logs -n edgedelta daemonset/edgedelta | grep -i "trace\|span"
# Linux
tail -f /var/log/edgedelta/edgedelta.log | grep -i "trace\|span"
3. Filter for Specific Trace ID
kubectl logs -n edgedelta daemonset/edgedelta | grep "abc123"
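If you are not sure which trace ID to grep for, one option is to log it from the application side. A minimal Python sketch using the OpenTelemetry API (the tracer and span names are illustrative):
# Python: print the current trace ID in the 32-character hex form used in logs
from opentelemetry import trace

tracer = trace.get_tracer("debug")
with tracer.start_as_current_span("debug_operation") as span:
    trace_id = format(span.get_span_context().trace_id, "032x")
    print(f"search agent logs for trace ID: {trace_id}")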
Validate Trace Data Format
Check Raw Span Data:
Use Live Tail to inspect span attributes:
- Navigate to Telemetry → Live Tail
- Filter for data type: Trace
- Examine span attributes and resource attributes
- Verify expected fields are present
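If expected fields are missing, it can help to inspect what the application emits before the spans reach the agent at all. The following Python sketch attaches the SDK's console exporter so each finished span, including its attributes and resource attributes, is printed to stdout; the tracer and span names are illustrative.
# Python: dump spans to stdout for local inspection of attributes
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("debug")
with tracer.start_as_current_span("inspect_attributes") as span:
    span.set_attribute("http.method", "GET")
# Each finished span is printed as JSON, so you can confirm that the
# recognized trace type attributes are actually present on the span.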
Test Trace Flow End-to-End
1. Generate Test Traces
# Use telemetry generator for testing
nodes:
- name: test_traces
type: telemetry_gen
telemetry_types:
- trace
rate: 10
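Alternatively, if you prefer to drive test traffic from outside the agent, a short Python loop using the OpenTelemetry SDK works as well; the endpoint and names below are illustrative.
# Python: send a small burst of test traces to the Edge Delta OTLP endpoint
import time
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="edgedelta-agent:4318", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("trace-test")
for i in range(10):
    with tracer.start_as_current_span("test_trace") as span:
        span.set_attribute("http.method", "GET")  # recognized attribute, so the trace is searchable
        span.set_attribute("test.iteration", i)
    time.sleep(0.1)

provider.shutdown()  # flush remaining spans before the script exits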
2. Verify at Each Stage
- Check node pipeline logs for ingestion
- Check gateway pipeline logs for aggregation
- Check Trace Explorer for final visibility
Best Practices
Trace Collection
- Always set recognized trace type attributes in application code
- Use consistent trace ID formats across services
- Include service.name in resource attributes
- Set appropriate span kinds (INTERNAL, SERVER, CLIENT, etc.); see the sketch below
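A minimal Python sketch of the last two practices, setting service.name as a resource attribute and an explicit span kind (the service and span names are illustrative):
# Python: service.name as a resource attribute and an explicit span kind
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

provider = TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout")
# SERVER marks the span as handling an inbound request; use CLIENT for outbound calls.
with tracer.start_as_current_span("handle_checkout", kind=trace.SpanKind.SERVER) as span:
    span.set_attribute("http.method", "POST")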
Gateway Aggregation
- Use consistent routing for trace aggregation
- Size gateway based on total trace volume, not source count
- Apply tail sampling only at gateway, not at nodes
- Monitor gateway memory for trace buffering
Search and Analysis
- Create facets from frequently searched attributes
- Avoid creating facets from high-cardinality fields
- Use saved queries for common trace searches
- Leverage correlation with logs using pod ID and timestamp
Performance
- Implement head sampling for extremely high volumes
- Use tail sampling for intelligent filtering
- Monitor buffer sizes and adjust as needed
- Scale gateway horizontally for increased throughput
Common Error Messages
| Error Message | Likely Cause | Solution |
|---|---|---|
| “Trace type Unknown” | Missing recognized attributes | Add http.flavor, rpc.system, db.system, or messaging.system |
| “No child spans found” | Trace type is Unknown | Set trace type attributes per solutions above |
| “Spans from different sources” | No gateway aggregation | Configure gateway pipeline with consistent routing |
| “Tail sampling incomplete trace” | Spans arriving at different gateways | Verify target_allocation_type: consistent |
| “eBPF traces not visible” | Tracer not enabled or missing privileges | Set tracerProps.enabled=true and privileged: true |
| “OTLP connection refused” | Wrong endpoint or port | Verify OTLP input configuration and service exposure |
Getting Help
If issues persist after following this guide:
Collect Diagnostic Information:
- Edge Delta configuration (sanitized)
- Sample trace ID that demonstrates the issue
- Agent and gateway logs
- Network topology (node → gateway → backend)
Check Agent Logs:
kubectl logs -n edgedelta daemonset/edgedelta --tail=1000 > agent-logs.txt
kubectl logs -n edgedelta deployment/gateway --tail=1000 > gateway-logs.txt
Gather Trace Details:
- Screenshot from Trace Explorer showing the issue
- Sample span attributes from Live Tail
- Search query that’s not working as expected
Contact Support with:
- Diagnostic information from above
- Expected vs actual behavior
- Steps to reproduce
- Workarounds attempted