Edge Delta Trace Explorer

Traces in the Edge Delta web application.

Overview

Tracing is vital in distributed systems as it enables in-depth analysis and troubleshooting by providing detailed views of requests across various services. It aids in debugging network or service-related issues, identifies performance bottlenecks, and offers critical insights into errors, status codes, and request durations. Tracing also facilitates performance monitoring and resource management by tracking essential service-level metrics like latency and error rates. Additionally, it provides a holistic view of system interactions, crucial for understanding dependencies and maintaining reliable, high-performing applications.

The Trace Explorer provides a detailed and organized view of traces.

Traces

Edge Delta supports OpenTelemetry (OTEL) traces and out-of-the-box eBPF traces:

  • OTEL traces capture high-level application data, such as the logical flow of requests through services, database queries, HTTP requests, and custom business logic spans.
  • Linux’s Extended Berkeley Packet Filter (eBPF) technology collects telemetry data directly from the operating system, without requiring changes to the application code. Operating at the kernel level, eBPF traces capture system-level information such as network packet paths, file system access, process execution, and kernel function calls. This low-level system data helps in understanding the underlying infrastructure and performance characteristics of applications.
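Whatever their origin, spans share the same basic shape: a trace ID, a span ID, and a parent span ID that links each child span to its caller. As a rough illustration of how spans assemble into the tree the Trace Explorer displays (the dictionary field names here are illustrative, not an Edge Delta schema):

```python
from collections import defaultdict

def build_trace_tree(spans):
    """Group spans into root spans and a parent -> children mapping."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent = span.get("parent_span_id")
        if parent is None:
            roots.append(span)          # no parent: entry point of the trace
        else:
            children[parent].append(span)
    return roots, children
```

Walking from each root through the `children` map yields the familiar waterfall view of a request crossing services.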

Prerequisites

To ingest OTEL traces, you must deploy a pipeline with the OTLP Source node and configure it to listen on a port that is receiving traces. For eBPF traces, you must deploy a pipeline with the Kubernetes Trace Source node.

To view traces in Edge Delta’s Trace Explorer, trace pipelines must be connected to the Edge Delta Traces destination node. You can also route traces to third-party destinations that support the trace data type.
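When traces do not appear, a common first check is whether the OTLP listener port is reachable from the host emitting traces. A minimal stdlib sketch (the host and port are whatever your pipeline is configured with; 4317 is the conventional OTLP gRPC port, but your deployment may differ):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```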

Trace Explorer

Click Traces to open the Trace Explorer.

You can filter by multiple default dimensions, including Service.

Filter by Duration

You can also filter traces and their child spans by duration to focus on, for example, traces with abnormally high latency. This is useful when investigating 5xx status codes.
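The Trace Explorer applies these filters in the UI; conceptually, the combination of a duration threshold and a 5xx status filter amounts to something like the following sketch (the `duration_ms` and `http_status` field names are hypothetical stand-ins for the span attributes):

```python
def slow_or_failing(spans, min_duration_ms=500.0):
    """Keep spans that exceed a latency threshold or returned a 5xx status."""
    return [
        s for s in spans
        if s["duration_ms"] >= min_duration_ms
        or 500 <= s.get("http_status", 0) < 600
    ]
```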

Save Query

When you have configured a detailed set of filters for a particular use case, you can save it for later use. Click Save Query to add it to the Saved tab.

Trace Details

Click a trace in the table to view its details. This can be useful for debugging network or service-related issues within a distributed system.

This trace provides a view of a failure where a connection termination leads to a service availability error.

You can select each span to see its details:

The router frontend egress span encountered an error with the reason connection termination and an HTTP status code of 0, indicating that the outbound request failed due to connection issues.

The ingress span also indicates an error, with HTTP status code 503 (Service Unavailable). This span has the response flag UC, suggesting an upstream connection termination.

The total duration of the request in the loadgenerator component is significantly longer than the durations of the ingress and router frontend egress spans in the frontendproxy component. This suggests that the request was either queued or retried due to the connection issues.
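The reasoning above can be made concrete: the parent span's time that is not covered by its child spans is a useful signal. Assuming non-overlapping child spans, a large remainder points to time spent queued or retrying in the caller:

```python
def unaccounted_ms(parent_duration_ms, child_durations_ms):
    """Parent span time not explained by its (non-overlapping) child spans.

    A large remainder suggests the request was queued or retried
    before the downstream work actually ran.
    """
    return max(0.0, parent_duration_ms - sum(child_durations_ms))
```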

Next steps could be to:

  1. Use the Metrics Explorer:
  • Look into network performance metrics such as latency, packet loss, and throughput around the time of the issue.
  • Check service-level metrics for the frontend proxy, including request rates, error rates, and response times.
  • Monitor CPU, memory, and other resource utilization metrics for the frontend proxy and back-end services.
  2. Check Events:
  • Review any deployment or configuration change events around the time of the incident.
  • Check for scaling events that might have occurred around the time of the failure.
  3. Check Logs:
  • Investigate frontend proxy logs to look for error messages or warnings around the time of the connection termination.
  • Investigate logs from the back-end services that were involved in the trace.
  • Check any network-related logs (firewalls, load balancers, etc.) to determine if there were any network anomalies or disruptions.

View Logs

You can view logs associated with a particular span. Click Logs:

Click a log to view it, and click View in context to view it in the Log Explorer in a new window.

You can select an attribute in the Details pane to add it to the search string, or exclude it from results. In addition, you can copy the attribute to the clipboard to create a custom facet from it.
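Span-to-log correlation like this relies on each log record carrying the trace and span IDs it was emitted under. As a rough sketch of the lookup (the `trace_id` and `span_id` record keys are illustrative):

```python
def logs_for_span(logs, trace_id, span_id):
    """Return log records stamped with the given trace and span IDs."""
    return [
        rec for rec in logs
        if rec.get("trace_id") == trace_id and rec.get("span_id") == span_id
    ]
```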