Troubleshooting Traces

Comprehensive troubleshooting guide for Edge Delta trace collection, processing, and search issues including Unknown trace types and gateway aggregation.

Overview

This guide covers troubleshooting for Edge Delta trace collection and processing:

  • Trace Search Issues - Child spans not appearing in search results
  • Unknown Trace Types - Making unclassified traces searchable
  • Gateway Aggregation - Using gateway pipelines for complete trace views
  • Tail Sampling - Implementing trace-level sampling decisions
  • eBPF vs OTEL Traces - Understanding trace source differences

For general information about using the Trace Explorer, see Edge Delta Trace Explorer.

Quick Diagnostic Checklist

Before diving into specific issues, verify:

  • Agent version 1.24.0 or higher for trace support
  • OTLP Source or Kubernetes Trace Source configured
  • Edge Delta Traces destination enabled
  • “Include Child Spans” checkbox enabled in Trace Explorer
  • Traces visible in Trace Explorer details view

Trace Search Issues

Child Spans Not Appearing in Search Results

Symptoms

  • Child spans are visible when viewing trace details
  • Cannot find child spans using trace search queries
  • Facets created from child span attributes return no results
  • Error-free configuration but no search matches

Root Cause

The Trace Explorer automatically filters out traces with trace.type="Unknown" from search results. This filter is applied implicitly (visible in browser network requests as trace.type != Unknown). Even when the “Include Child Spans” checkbox is enabled, child spans belonging to Unknown-type traces will not appear in search results or facets.

This is a product design decision to focus search results on well-categorized traces.

Diagnosis

1. Check Trace Type in Details View

Open a trace in the Trace Explorer and examine the trace type field in the details view. If the trace type shows “Unknown”, any child spans within that trace will be filtered from search.

2. Verify Filter Behavior

Open your browser’s developer tools (Network tab) and observe the query sent to the backend. You’ll see trace.type != "Unknown" added automatically to your search query.

3. Test with Known Trace Type

Create a search for child spans in traces with known types (HTTP, gRPC, etc.) to confirm search functionality works for properly typed traces.

Solutions

You have two options to make Unknown traces searchable: modify your application code or use pipeline transformations.

Option 1: Set Trace Type Attributes in Application Code

Edge Delta infers trace type by examining specific OpenTelemetry attributes. Add one or more of the recognized attributes when creating spans in your application code.

For HTTP Traces (most common):

// JavaScript/Node.js
const span = tracer.startSpan('convertCurlyTags', {
  attributes: {
    'http.flavor': '1.1',        // HTTP protocol version
    'http.method': 'POST',       // HTTP method (optional)
    'http.host': 'api.example.com',  // Target host (optional)
    // Only one recognized attribute is required; the rest are optional
  }
});

# Python
with tracer.start_as_current_span('process_data') as span:
    span.set_attribute('http.flavor', '1.1')
    span.set_attribute('http.method', 'GET')
    # Perform work

// Java
Span span = tracer.spanBuilder("processRequest")
    .setAttribute("http.flavor", "1.1")
    .setAttribute("http.method", "POST")
    .startSpan();

For gRPC Traces:

attributes: {
  'rpc.system': 'grpc'
}

For Database Traces:

attributes: {
  'db.system': 'postgresql'  // or 'mysql', 'redis', 'mongodb', etc.
}

For Messaging Traces:

attributes: {
  'messaging.system': 'kafka'  // or 'rabbitmq', 'sqs', 'pubsub', etc.
}
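
The gRPC, database, and messaging snippets above are attribute fragments rather than full programs. A minimal Python sketch (span names are placeholders, and a tracer provider and exporter are assumed to be configured elsewhere) that sets any one of them looks like this:

# Placeholder span names; only the attribute keys matter for type inference.
# Assumes a TracerProvider and OTLP exporter are already configured.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span('charge_card') as span:       # gRPC call
    span.set_attribute('rpc.system', 'grpc')

with tracer.start_as_current_span('load_profile') as span:      # database query
    span.set_attribute('db.system', 'postgresql')

with tracer.start_as_current_span('publish_event') as span:     # message publish
    span.set_attribute('messaging.system', 'kafka')
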
Option 2: Use Pipeline Transformations

If you cannot modify application code, use a multiprocessor in your pipeline to set trace type attributes based on other span properties.

Example: Set HTTP attributes for specific scopes

processors:
  - name: classify-unknown-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["http.flavor"], "1.1")
            where attributes["ed.trace.type"] == "Unknown"
            and attributes["otel.scope.name"] == "components-common-datautils"            

Example: Set trace type for specific services

processors:
  - name: classify-service-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["http.flavor"], "1.1")
            where attributes["ed.trace.type"] == "Unknown"
            and resource["service.name"] == "my-internal-service"            

Example: Set database type for DB operations

processors:
  - name: classify-db-traces
    type: multiprocessor
    config:
      queries:
        - query: |
            set(attributes["db.system"], "postgresql")
            where attributes["ed.trace.type"] == "Unknown"
            and attributes["ed.span.resource"] matches ".*query.*"

Recognized Trace Type Attributes

Edge Delta recognizes the following attributes for automatic trace type inference (any one of the listed attributes is sufficient):

  • HTTP: http.protocol, http.flavor, http.method, http.host, http.url, http.status_code, http.response.status_code
  • gRPC: rpc.system
  • Database: db.system
  • Messaging: messaging.system

Traces without any of these attributes will be classified as “Unknown” and filtered from search results.
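
To make the fallback concrete, the inference can be pictured roughly like the sketch below. This is an illustration of the list above, not Edge Delta's actual implementation; the real precedence rules are internal and may differ.

# Rough illustration of the attribute list above; actual precedence may differ.
HTTP_ATTRS = {
    'http.protocol', 'http.flavor', 'http.method', 'http.host',
    'http.url', 'http.status_code', 'http.response.status_code',
}

def infer_trace_type(attributes: dict) -> str:
    if HTTP_ATTRS & attributes.keys():
        return 'HTTP'
    if 'rpc.system' in attributes:
        return 'gRPC'
    if 'db.system' in attributes:
        return 'Database'
    if 'messaging.system' in attributes:
        return 'Messaging'
    return 'Unknown'  # filtered out of Trace Explorer search results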

Verification

After implementing the solution:

  1. Check Live Tail to verify the trace type is set correctly on new spans
  2. Wait for data to flow through the pipeline (may take a few minutes)
  3. Create a facet from the child span attribute you want to search (see Add a Facet)
  4. Search using the newly created facet

Expected result: Child spans now appear in search results and facets work correctly.

Facets Not Populating from Child Spans

Symptoms

  • Facet created from child span attribute
  • Facet shows no values in dropdown
  • Manual query using facet returns no results

Diagnosis

This is typically caused by the Unknown trace type issue described above. Facets only populate from searchable traces.

Solution

Follow the steps in “Child Spans Not Appearing in Search Results” to ensure your traces have recognized types.

Gateway Pipeline Trace Aggregation

When to Use Gateway Pipelines for Traces

Symptoms

  • Incomplete traces in Trace Explorer
  • Spans from the same trace appear in different agent logs
  • Cannot apply trace-level sampling decisions
  • Need service-level view across multiple clusters

Use Cases

Tail Sampling: In distributed systems where traces are collected from multiple node pipelines, spans belonging to the same parent trace can originate from different sources. Tail sampling requires seeing the entire trace before making a sampling decision. Deploy the Tail Sample Processor on a gateway pipeline to aggregate all spans and apply sampling logic to complete traces.

Cross-Cluster Aggregation: When the same service runs across multiple Kubernetes clusters or regions, gateway pipelines aggregate spans using consistent routing to ensure all spans for a given trace ID reach the same gateway instance.

Deduplication: Multiple sources may emit duplicate spans. Gateway pipelines deduplicate at the trace level.

Configuration

Step 1: Configure Node Pipeline to Send Traces to Gateway

nodes:
  # Collect traces at the node level
  - name: otlp_source
    type: otlp_input
    listen: 0.0.0.0:4318
    protocol: grpc

  # Send to gateway for aggregation
  - name: send_to_gateway
    type: ed_gateway_output
    port: 443
    protocol: grpc
    endpoint_resolution_type: k8s
    k8s_service_name: trace-gateway-svc
    target_allocation_type: consistent

Important: Use target_allocation_type: consistent to ensure all spans with the same Trace ID route to the same gateway instance. This is critical for tail sampling and trace completion.

Step 2: Configure Gateway Pipeline to Receive and Process Traces

nodes:
  # Receive traces from node pipelines
  - name: gateway_input
    type: ed_pipeline_source
    listen: 0.0.0.0:443
    protocol: grpc

  # Apply tail sampling to complete traces
  - name: tail_sampling
    type: tail_sample
    decision_interval: 30s
    sampling_policies:
      - name: sample_errors
        policy_type: status_code
        status_codes:
          - ERROR
      - name: sample_slow_traces
        policy_type: latency
        lower_threshold: 1s
      - name: sample_10_percent
        policy_type: probabilistic
        percentage: 10

  # Send sampled traces to backend
  - name: traces_destination
    type: ed_traces_output
    api_key: ${ED_API_KEY}

Step 3: Scale Gateway for High Volume

# In Helm values for gateway deployment
replicaCount: 3

resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

Consistent Routing Behavior

When multiple node clusters send traces to the same gateway cluster using consistent allocation:

  • All spans for trace.id:abc123, regardless of source, route to the same gateway pod
  • All spans for trace.id:xyz789 route to the same gateway pod (typically a different pod than the one handling abc123)
  • This ensures accurate cross-cluster trace aggregation and tail sampling

Example:

Node Cluster A: trace.id:abc123 → Gateway Pod X
Node Cluster B: trace.id:abc123 → Gateway Pod X (same pod)
Node Cluster C: trace.id:abc123 → Gateway Pod X (same pod)

Node Cluster A: trace.id:xyz789 → Gateway Pod Y
Node Cluster B: trace.id:xyz789 → Gateway Pod Y (same pod)
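
Conceptually, consistent allocation behaves like hashing the trace ID and always mapping the same hash to the same gateway instance. The sketch below illustrates the idea only; it is not Edge Delta's routing code, and real consistent hashing also minimizes remapping when the pod count changes.

# Conceptual sketch: the same trace.id always lands on the same pod,
# regardless of which node cluster emitted the span. Pod names are hypothetical.
import hashlib

GATEWAY_PODS = ['gateway-pod-x', 'gateway-pod-y', 'gateway-pod-z']

def pick_gateway(trace_id: str) -> str:
    digest = hashlib.sha256(trace_id.encode()).hexdigest()
    return GATEWAY_PODS[int(digest, 16) % len(GATEWAY_PODS)]

assert pick_gateway('abc123') == pick_gateway('abc123')  # every source agrees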

Verification

1. Check Gateway Receives Spans

Query gateway logs to verify span ingestion:

kubectl logs -n edgedelta deployment/trace-gateway | grep "received spans"

2. Verify Tail Sampling Decisions

kubectl logs -n edgedelta deployment/trace-gateway | grep "tail_sample"

3. Confirm Complete Traces in Explorer

Search for traces in the Trace Explorer and verify:

  • All expected spans appear in the trace details
  • Sampling decisions are applied correctly
  • Trace timing and relationships are accurate

eBPF vs OpenTelemetry Traces

Differences and Troubleshooting

eBPF Traces

Characteristics:

  • Captured at the kernel level
  • No application code changes required
  • Limited to Kubernetes environments with Edge Delta
  • Provides network-level visibility (packet paths, syscalls)
  • Requires tracerProps.enabled=true in Helm deployment

Common Issues:

Symptom: eBPF traces not appearing

Diagnosis:

# Check if tracer is enabled
kubectl get daemonset -n edgedelta -o yaml | grep -A 5 "tracerProps"

# Verify kernel privileges
kubectl get pods -n edgedelta -o jsonpath='{.items[*].spec.containers[*].securityContext}'

Solution: Ensure Helm values include:

tracerProps:
  enabled: true

securityContext:
  privileged: true

OpenTelemetry Traces

Characteristics:

  • Application-level instrumentation
  • Rich business context and custom attributes
  • Works across any environment
  • Requires OTLP Source node

Common Issues:

Symptom: OTEL traces not being received

Diagnosis:

# Check OTLP input is listening
kubectl get pods -n edgedelta -o wide
kubectl logs -n edgedelta <pod-name> | grep "otlp_input"

Solution: Verify application is configured to send to correct endpoint:

// Ensure app points to Edge Delta OTLP endpoint
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const exporter = new OTLPTraceExporter({
  url: 'http://edgedelta-agent:4318',
});
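
If the service is instrumented in Python instead, a comparable setup looks like the sketch below. The endpoint shown is an assumption; point it at whatever address and port your OTLP source actually listens on (as configured in the node pipeline earlier in this guide).

# Python counterpart of the Node.js exporter setup above; endpoint is illustrative.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(endpoint='edgedelta-agent:4318', insecure=True)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)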

Performance Issues

High Trace Volume Causing Backpressure

Symptoms

  • Agent memory usage increasing
  • Dropped spans warnings in logs
  • Increased latency in application

Solutions

1. Implement Sampling at Node Level

processors:
  - name: head_sampling
    type: sample
    sampling_rate: 0.1  # Sample 10% at collection
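
One caveat with head sampling: if each span is sampled independently at random, spans from the same trace can be kept or dropped inconsistently, leaving partial traces. A common way to avoid this is to key the decision on the trace ID so every span of a trace gets the same decision. The sketch below illustrates that idea only; it is not the internals of Edge Delta's sample processor.

# Illustrative only: deterministic 10% head sampling keyed on the trace ID.
import hashlib

def keep_span(trace_id: str, sample_rate: float = 0.1) -> bool:
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < sample_rate * 10_000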

2. Use Tail Sampling at Gateway

Move sampling decisions to gateway for smarter filtering:

# Gateway pipeline
processors:
  - name: intelligent_sampling
    type: tail_sample
    sampling_policies:
      - name: always_sample_errors
        policy_type: status_code
        status_codes: [ERROR]
      - name: sample_slow
        policy_type: latency
        lower_threshold: 500ms
      - name: sample_rest
        policy_type: probabilistic
        percentage: 5

3. Increase Gateway Resources

resources:
  requests:
    memory: "4Gi"
    cpu: "2000m"
  limits:
    memory: "8Gi"
    cpu: "4000m"

4. Adjust Buffer Settings

nodes:
  - name: gateway_output
    type: ed_gateway_output
    buffer_max_bytesize: 100MB
    buffer_ttl: 15m

Advanced Debugging

Enable Trace Debug Logging

1. Edge Delta Agent Debug Mode

agent:
  log_level: debug

2. Monitor Trace-Specific Logs

# Kubernetes
kubectl logs -n edgedelta daemonset/edgedelta | grep -i "trace\|span"

# Linux
tail -f /var/log/edgedelta/edgedelta.log | grep -i "trace\|span"

3. Filter for Specific Trace ID

kubectl logs -n edgedelta daemonset/edgedelta | grep "abc123"

Validate Trace Data Format

Check Raw Span Data:

Use Live Tail to inspect span attributes:

  1. Navigate to Telemetry > Live Tail
  2. Filter for data type: Trace
  3. Examine span attributes and resource attributes
  4. Verify expected fields are present

Test Trace Flow End-to-End

1. Generate Test Traces

# Use telemetry generator for testing
nodes:
  - name: test_traces
    type: telemetry_gen
    telemetry_types:
      - trace
    rate: 10

2. Verify at Each Stage

  • Check node pipeline logs for ingestion
  • Check gateway pipeline logs for aggregation
  • Check Trace Explorer for final visibility

Best Practices

Trace Collection

  1. Always set recognized trace type attributes in application code
  2. Use consistent trace ID formats across services
  3. Include service.name in resource attributes
  4. Set appropriate span kinds (INTERNAL, SERVER, CLIENT, etc.), as in the sketch below
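
For items 3 and 4, a minimal Python sketch (the service name is a placeholder) that sets the service.name resource attribute and an explicit span kind:

# Placeholder service name; sets service.name and an explicit SpanKind.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.trace import SpanKind

resource = Resource.create({'service.name': 'checkout-service'})
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span('handle_checkout', kind=SpanKind.SERVER):
    pass  # handle the request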

Gateway Aggregation

  1. Use consistent routing for trace aggregation
  2. Size gateway based on total trace volume, not source count
  3. Apply tail sampling only at gateway, not at nodes
  4. Monitor gateway memory for trace buffering

Search and Analysis

  1. Create facets from frequently searched attributes
  2. Avoid creating facets from high-cardinality fields
  3. Use saved queries for common trace searches
  4. Leverage correlation with logs using pod ID and timestamp

Performance

  1. Implement head sampling for extremely high volumes
  2. Use tail sampling for intelligent filtering
  3. Monitor buffer sizes and adjust as needed
  4. Scale gateway horizontally for increased throughput

Common Error Messages

Each entry lists the error message, its likely cause, and the fix:

  • “Trace type Unknown”: missing recognized attributes. Add http.flavor, rpc.system, db.system, or messaging.system.
  • “No child spans found”: the trace type is Unknown. Set trace type attributes per the solutions above.
  • “Spans from different sources”: no gateway aggregation. Configure a gateway pipeline with consistent routing.
  • “Tail sampling incomplete trace”: spans arriving at different gateway instances. Verify target_allocation_type: consistent.
  • “eBPF traces not visible”: tracer not enabled or missing privileges. Set tracerProps.enabled=true and privileged: true.
  • “OTLP connection refused”: wrong endpoint or port. Verify the OTLP input configuration and service exposure.

Getting Help

If issues persist after following this guide:

  1. Collect Diagnostic Information:

    • Edge Delta configuration (sanitized)
    • Sample trace ID that demonstrates the issue
    • Agent and gateway logs
    • Network topology (node → gateway → backend)
  2. Check Agent Logs:

    kubectl logs -n edgedelta daemonset/edgedelta --tail=1000 > agent-logs.txt
    kubectl logs -n edgedelta deployment/gateway --tail=1000 > gateway-logs.txt
    
  3. Gather Trace Details:

    • Screenshot from Trace Explorer showing the issue
    • Sample span attributes from Live Tail
    • Search query that’s not working as expected
  4. Contact Support with:

    • Diagnostic information from above
    • Expected vs actual behavior
    • Steps to reproduce
    • Workarounds attempted