Troubleshooting Traces

Comprehensive troubleshooting guide for Edge Delta trace collection, processing, and search issues including Unknown trace types and gateway aggregation.

Overview

This guide covers troubleshooting for Edge Delta trace collection and processing:

  • Trace Search Issues - Child spans not appearing in search results
  • Unknown Trace Types - Making unclassified traces searchable
  • Gateway Aggregation - Using gateway pipelines for complete trace views
  • Tail Sampling - Implementing trace-level sampling decisions
  • eBPF vs OTEL Traces - Understanding trace source differences

For general information about using the Trace Explorer, see Edge Delta Trace Explorer.

Quick Diagnostic Checklist

Before diving into specific issues, verify:

  • Agent version 1.24.0 or higher for trace support
  • OTLP Source or Kubernetes Trace Source configured
  • Edge Delta Traces destination enabled
  • “Include Child Spans” checkbox enabled in Trace Explorer
  • Traces visible in Trace Explorer details view

Trace Search Issues

Child Spans Not Appearing in Search Results

Symptoms

  • Child spans are visible when viewing trace details
  • Cannot find child spans using trace search queries
  • Facets created from child span attributes return no results
  • Error-free configuration but no search matches

Root Cause

The Trace Explorer automatically filters out traces with trace.type="Unknown" from search results. This filter is applied implicitly (visible in browser network requests as trace.type != Unknown). Even when the “Include Child Spans” checkbox is enabled, child spans belonging to Unknown-type traces will not appear in search results or facets.

This is a product design decision to focus search results on well-categorized traces.

Diagnosis

1. Check Trace Type in Details View

Open a trace in the Trace Explorer and examine the trace type field in the details view. If the trace type shows “Unknown”, any child spans within that trace will be filtered from search.

2. Verify Filter Behavior

Open your browser’s developer tools (Network tab) and observe the query sent to the backend. You’ll see trace.type != "Unknown" added automatically to your search query.

3. Test with Known Trace Type

Create a search for child spans in traces with known types (HTTP, gRPC, etc.) to confirm search functionality works for properly typed traces.

Solutions

You have two options to make Unknown traces searchable: modify your application code or use pipeline transformations.

Option 1: Set Trace Type Attributes in Application Code

Edge Delta infers trace type by examining specific OpenTelemetry attributes. Add one or more of the recognized attributes when creating spans in your application code.

For HTTP Traces (most common):

// JavaScript/Node.js
const span = tracer.startSpan('convertCurlyTags', {
  attributes: {
    'http.flavor': '1.1',        // HTTP protocol version
    'http.method': 'POST',       // HTTP method (optional)
    'http.host': 'api.example.com',  // Target host (optional)
    // Only one recognized attribute is required; the rest are optional
  }
});

# Python
with tracer.start_as_current_span('process_data') as span:
    span.set_attribute('http.flavor', '1.1')
    span.set_attribute('http.method', 'GET')
    # Perform work

// Java
Span span = tracer.spanBuilder("processRequest")
    .setAttribute("http.flavor", "1.1")
    .setAttribute("http.method", "POST")
    .startSpan();

For gRPC Traces:

attributes: {
  'rpc.system': 'grpc'
}

For Database Traces:

attributes: {
  'db.system': 'postgresql'  // or 'mysql', 'redis', 'mongodb', etc.
}

For Messaging Traces:

attributes: {
  'messaging.system': 'kafka'  // or 'rabbitmq', 'sqs', 'pubsub', etc.
}
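
The gRPC, database, and messaging snippets above are attribute fragments rather than full programs. A minimal Python sketch (span names are placeholders, and a tracer provider and exporter are assumed to be configured elsewhere) that sets any one of them looks like this:

# Placeholder span names; only the attribute keys matter for type inference.
# Assumes a TracerProvider and OTLP exporter are already configured.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span('charge_card') as span:       # gRPC call
    span.set_attribute('rpc.system', 'grpc')

with tracer.start_as_current_span('load_profile') as span:      # database query
    span.set_attribute('db.system', 'postgresql')

with tracer.start_as_current_span('publish_event') as span:     # message publish
    span.set_attribute('messaging.system', 'kafka')
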
Option 2: Use Pipeline Transformations

If you cannot modify application code, use a multiprocessor in your pipeline to set trace type attributes based on other span properties.

Example: Set HTTP attributes for specific scopes

processors:
  - name: classify-unknown-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["http.flavor"], "1.1")
            where attributes["ed.trace.type"] == "Unknown"
            and attributes["otel.scope.name"] == "components-common-datautils"            

Example: Set trace type for specific services

processors:
  - name: classify-service-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["http.flavor"], "1.1")
            where attributes["ed.trace.type"] == "Unknown"
            and resource["service.name"] == "my-internal-service"            

Example: Set database type for DB operations

processors:
  - name: classify-db-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["db.system"], "postgresql")
            where attributes["ed.trace.type"] == "Unknown"
            and attributes["ed.span.resource"] matches ".*query.*"

Recognized Trace Type Attributes

Edge Delta recognizes the following attributes for automatic trace type inference (any one of the listed attributes is sufficient):

  • HTTP: http.protocol, http.flavor, http.method, http.host, http.url, http.status_code, http.response.status_code
  • gRPC: rpc.system
  • Database: db.system
  • Messaging: messaging.system

Traces without any of these attributes will be classified as “Unknown” and filtered from search results.
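
To make the fallback concrete, the inference can be pictured roughly like the sketch below. This is an illustration of the list above, not Edge Delta's actual implementation; the real precedence rules are internal and may differ.

# Rough illustration of the attribute list above; actual precedence may differ.
HTTP_ATTRS = {
    'http.protocol', 'http.flavor', 'http.method', 'http.host',
    'http.url', 'http.status_code', 'http.response.status_code',
}

def infer_trace_type(attributes: dict) -> str:
    if HTTP_ATTRS & attributes.keys():
        return 'HTTP'
    if 'rpc.system' in attributes:
        return 'gRPC'
    if 'db.system' in attributes:
        return 'Database'
    if 'messaging.system' in attributes:
        return 'Messaging'
    return 'Unknown'  # filtered out of Trace Explorer search results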

Verification

After implementing the solution:

  1. Check Live Tail to verify the trace type is set correctly on new spans
  2. Wait for data to flow through the pipeline (may take a few minutes)
  3. Create a facet from the child span attribute you want to search (see Add a Facet)
  4. Search using the newly created facet

Expected result: Child spans now appear in search results and facets work correctly.

Facets Not Populating from Child Spans

Symptoms

  • Facet created from child span attribute
  • Facet shows no values in dropdown
  • Manual query using facet returns no results

Diagnosis

This is typically caused by the Unknown trace type issue described above. Facets only populate from searchable traces.

Solution

Follow the steps in “Child Spans Not Appearing in Search Results” to ensure your traces have recognized types.

Gateway Pipeline Trace Aggregation

When to Use Gateway Pipelines for Traces

Symptoms

  • Incomplete traces in Trace Explorer
  • Spans from the same trace appear in different agent logs
  • Cannot apply trace-level sampling decisions
  • Need service-level view across multiple clusters

Use Cases

Tail Sampling: In distributed systems where traces are collected from multiple node pipelines, spans belonging to the same parent trace can originate from different sources. Tail sampling requires seeing the entire trace before making a sampling decision. Deploy the Tail Sample Processor on a gateway pipeline to aggregate all spans and apply sampling logic to complete traces.

Cross-Cluster Aggregation: When the same service runs across multiple Kubernetes clusters or regions, gateway pipelines aggregate spans using consistent routing to ensure all spans for a given trace ID reach the same gateway instance.

Deduplication: Multiple sources may emit duplicate spans. Gateway pipelines deduplicate at the trace level.

Configuration

Step 1: Configure Node Pipeline to Send Traces to Gateway

nodes:
  # Collect traces at the node level
  - name: otlp_source
    type: otlp_input
    listen: 0.0.0.0:4318
    protocol: grpc

  # Send to gateway for aggregation
  - name: send_to_gateway
    type: ed_gateway_output
    port: 443
    protocol: grpc
    endpoint_resolution_type: k8s
    k8s_service_name: trace-gateway-svc
    target_allocation_type: consistent

Important: Use target_allocation_type: consistent to ensure all spans with the same Trace ID route to the same gateway instance. This is critical for tail sampling and trace completion.

Step 2: Configure Gateway Pipeline to Receive and Process Traces

nodes:
  # Receive traces from node pipelines
  - name: gateway_input
    type: ed_pipeline_source
    listen: 0.0.0.0:443
    protocol: grpc

  # Apply tail sampling to complete traces
  - name: tail_sampling
    type: tail_sample
    decision_interval: 30s
    sampling_policies:
      - name: sample_errors
        policy_type: status_code
        status_codes:
          - ERROR
      - name: sample_slow_traces
        policy_type: latency
        lower_threshold: 1s
      - name: sample_10_percent
        policy_type: probabilistic
        percentage: 10

  # Send sampled traces to backend
  - name: traces_destination
    type: ed_traces_output
    api_key: ${ED_API_KEY}

Step 3: Scale Gateway for High Volume

# In Helm values for gateway deployment
replicaCount: 3

resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

Consistent Routing Behavior

When multiple node clusters send traces to the same gateway cluster using consistent allocation:

  • All spans for trace.id:abc123, regardless of source, route to the same gateway pod
  • All spans for trace.id:xyz789 route to the same gateway pod (typically a different pod than the one handling abc123)
  • This ensures accurate cross-cluster trace aggregation and tail sampling

Example:

Node Cluster A: trace.id:abc123 → Gateway Pod X
Node Cluster B: trace.id:abc123 → Gateway Pod X (same pod)
Node Cluster C: trace.id:abc123 → Gateway Pod X (same pod)

Node Cluster A: trace.id:xyz789 → Gateway Pod Y
Node Cluster B: trace.id:xyz789 → Gateway Pod Y (same pod)
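
Conceptually, consistent allocation behaves like hashing the trace ID and always mapping the same hash to the same gateway instance. The sketch below illustrates the idea only; it is not Edge Delta's routing code, and real consistent hashing also minimizes remapping when the pod count changes.

# Conceptual sketch: the same trace.id always lands on the same pod,
# regardless of which node cluster emitted the span. Pod names are hypothetical.
import hashlib

GATEWAY_PODS = ['gateway-pod-x', 'gateway-pod-y', 'gateway-pod-z']

def pick_gateway(trace_id: str) -> str:
    digest = hashlib.sha256(trace_id.encode()).hexdigest()
    return GATEWAY_PODS[int(digest, 16) % len(GATEWAY_PODS)]

assert pick_gateway('abc123') == pick_gateway('abc123')  # every source agrees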

Verification

1. Check Gateway Receives Spans

Query gateway logs to verify span ingestion:

kubectl logs -n edgedelta deployment/trace-gateway | grep "received spans"

2. Verify Tail Sampling Decisions

kubectl logs -n edgedelta deployment/trace-gateway | grep "tail_sample"

3. Confirm Complete Traces in Explorer

Search for traces in the Trace Explorer and verify:

  • All expected spans appear in the trace details
  • Sampling decisions are applied correctly
  • Trace timing and relationships are accurate

eBPF vs OpenTelemetry Traces

Differences and Troubleshooting

eBPF Traces

Characteristics:

  • Captured at the kernel level
  • No application code changes required
  • Limited to Kubernetes environments with Edge Delta
  • Provides network-level visibility (packet paths, syscalls)
  • Requires tracerProps.enabled=true in Helm deployment

Common Issues:

Symptom: eBPF traces not appearing

Diagnosis:

# Check if tracer is enabled
kubectl get daemonset -n edgedelta -o yaml | grep -A 5 "tracerProps"

# Verify kernel privileges
kubectl get pods -n edgedelta -o jsonpath='{.items[*].spec.containers[*].securityContext}'

Solution: Ensure Helm values include:

tracerProps:
  enabled: true

securityContext:
  privileged: true

OpenTelemetry Traces

Characteristics:

  • Application-level instrumentation
  • Rich business context and custom attributes
  • Works across any environment
  • Requires OTLP Source node

Common Issues:

Symptom: OTEL traces not being received

Diagnosis:

# Check OTLP input is listening
kubectl get pods -n edgedelta -o wide
kubectl logs -n edgedelta <pod-name> | grep "otlp_input"

Solution: Verify application is configured to send to correct endpoint:

// Ensure app points to Edge Delta OTLP endpoint
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const exporter = new OTLPTraceExporter({
  url: 'http://edgedelta-agent:4318',
});
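
If the service is instrumented in Python instead, a comparable setup looks like the sketch below. The endpoint shown is an assumption; point it at whatever address and port your OTLP source actually listens on (as configured in the node pipeline earlier in this guide).

# Python counterpart of the Node.js exporter setup above; endpoint is illustrative.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(endpoint='edgedelta-agent:4318', insecure=True)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)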

Performance Issues

High Trace Volume Causing Backpressure

Symptoms

  • Agent memory usage increasing
  • Dropped spans warnings in logs
  • Increased latency in application

Solutions

1. Implement Sampling at Node Level

processors:
  - name: head_sampling
    type: sample
    sampling_rate: 0.1  # Sample 10% at collection
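
One caveat with head sampling: if each span is sampled independently at random, spans from the same trace can be kept or dropped inconsistently, leaving partial traces. A common way to avoid this is to key the decision on the trace ID so every span of a trace gets the same decision. The sketch below illustrates that idea only; it is not the internals of Edge Delta's sample processor.

# Illustrative only: deterministic 10% head sampling keyed on the trace ID.
import hashlib

def keep_span(trace_id: str, sample_rate: float = 0.1) -> bool:
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < sample_rate * 10_000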

2. Use Tail Sampling at Gateway

Move sampling decisions to gateway for smarter filtering:

# Gateway pipeline
processors:
  - name: intelligent_sampling
    type: tail_sample
    sampling_policies:
      - name: always_sample_errors
        policy_type: status_code
        status_codes: [ERROR]
      - name: sample_slow
        policy_type: latency
        lower_threshold: 500ms
      - name: sample_rest
        policy_type: probabilistic
        percentage: 5

3. Increase Gateway Resources

resources:
  requests:
    memory: "4Gi"
    cpu: "2000m"
  limits:
    memory: "8Gi"
    cpu: "4000m"

4. Adjust Buffer Settings

nodes:
  - name: gateway_output
    type: ed_gateway_output
    buffer_max_bytesize: 100MB
    buffer_ttl: 15m

Advanced Debugging

Enable Trace Debug Logging

1. Edge Delta Agent Debug Mode

agent:
  log_level: debug

2. Monitor Trace-Specific Logs

# Kubernetes
kubectl logs -n edgedelta daemonset/edgedelta | grep -i "trace\|span"

# Linux
tail -f /var/log/edgedelta/edgedelta.log | grep -i "trace\|span"

3. Filter for Specific Trace ID

kubectl logs -n edgedelta daemonset/edgedelta | grep "abc123"

Validate Trace Data Format

Check Raw Span Data:

Use Live Tail to inspect span attributes:

  1. Navigate to Telemetry > Live Tail
  2. Filter for data type: Trace
  3. Examine span attributes and resource attributes
  4. Verify expected fields are present

Test Trace Flow End-to-End

1. Generate Test Traces

# Use telemetry generator for testing
nodes:
  - name: test_traces
    type: telemetry_gen
    telemetry_types:
      - trace
    rate: 10

2. Verify at Each Stage

  • Check node pipeline logs for ingestion
  • Check gateway pipeline logs for aggregation
  • Check Trace Explorer for final visibility

Best Practices

Trace Collection

  1. Always set recognized trace type attributes in application code
  2. Use consistent trace ID formats across services
  3. Include service.name in resource attributes
  4. Set appropriate span kinds (INTERNAL, SERVER, CLIENT, etc.), as in the sketch below
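
For items 3 and 4, a minimal Python sketch (the service name is a placeholder) that sets the service.name resource attribute and an explicit span kind:

# Placeholder service name; sets service.name and an explicit SpanKind.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.trace import SpanKind

resource = Resource.create({'service.name': 'checkout-service'})
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span('handle_checkout', kind=SpanKind.SERVER):
    pass  # handle the request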

Gateway Aggregation

  1. Use consistent routing for trace aggregation
  2. Size gateway based on total trace volume, not source count
  3. Apply tail sampling only at gateway, not at nodes
  4. Monitor gateway memory for trace buffering

Search and Analysis

  1. Create facets from frequently searched attributes
  2. Avoid creating facets from high-cardinality fields
  3. Use saved queries for common trace searches
  4. Leverage correlation with logs using pod ID and timestamp

Performance

  1. Implement head sampling for extremely high volumes
  2. Use tail sampling for intelligent filtering
  3. Monitor buffer sizes and adjust as needed
  4. Scale gateway horizontally for increased throughput

Common Error Messages

Each entry lists the error message, its likely cause, and the fix:

  • “Trace type Unknown”: missing recognized attributes. Add http.flavor, rpc.system, db.system, or messaging.system.
  • “No child spans found”: the trace type is Unknown. Set trace type attributes per the solutions above.
  • “Spans from different sources”: no gateway aggregation. Configure a gateway pipeline with consistent routing.
  • “Tail sampling incomplete trace”: spans arriving at different gateway instances. Verify target_allocation_type: consistent.
  • “eBPF traces not visible”: tracer not enabled or missing privileges. Set tracerProps.enabled=true and privileged: true.
  • “OTLP connection refused”: wrong endpoint or port. Verify the OTLP input configuration and service exposure.

Getting Help

If issues persist after following this guide:

  1. Collect Diagnostic Information:

    • Edge Delta configuration (sanitized)
    • Sample trace ID that demonstrates the issue
    • Agent and gateway logs
    • Network topology (node → gateway → backend)
  2. Check Agent Logs:

    kubectl logs -n edgedelta daemonset/edgedelta --tail=1000 > agent-logs.txt
    kubectl logs -n edgedelta deployment/gateway --tail=1000 > gateway-logs.txt
    
  3. Gather Trace Details:

    • Screenshot from Trace Explorer showing the issue
    • Sample span attributes from Live Tail
    • Search query that’s not working as expected
  4. Contact Support with:

    • Diagnostic information from above
    • Expected vs actual behavior
    • Steps to reproduce
    • Workarounds attempted