Interpret Traces in Edge Delta
Overview
Tracing is vital in distributed systems because it shows how each request travels across services. It helps you debug network and service issues, identify performance bottlenecks, and inspect errors, status codes, and request durations. Tracing also supports performance monitoring and resource management by tracking service-level metrics such as latency and error rates, and it gives a holistic view of system interactions, which is crucial for understanding dependencies and maintaining reliable, high-performing applications.
The Trace Explorer provides a detailed and organized view of traces.
Note: Requires agent version 1.24.0 or higher.
Traces
Edge Delta supports OpenTelemetry (OTEL) traces and out-of-the-box eBPF traces:
- OTEL traces capture high-level application data, such as the logical flow of requests through services, database queries, HTTP requests, and custom business logic spans.
- eBPF traces use Linux’s extended Berkeley Packet Filter (eBPF) technology to collect telemetry data directly from the operating system, without requiring changes to the application code. Operating at the kernel level, eBPF traces capture system-level information such as network packet paths, file system access, process execution, and kernel function calls. This low-level data helps in understanding the underlying infrastructure and performance characteristics of applications.
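To illustrate how spans describe the logical flow of a request, the following minimal sketch groups spans by their parent and renders them as an indented tree. It assumes spans are available as plain dictionaries; the field names (`span_id`, `parent_span_id`, `name`) follow common OpenTelemetry terminology, and the sample span names and IDs are hypothetical rather than taken from Edge Delta.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical spans belonging to one trace; IDs and names are illustrative.
trace_spans: List[Dict] = [
    {"span_id": "a", "parent_span_id": None, "name": "GET /checkout"},
    {"span_id": "b", "parent_span_id": "a", "name": "cart-service lookup"},
    {"span_id": "c", "parent_span_id": "a", "name": "payment-service charge"},
    {"span_id": "d", "parent_span_id": "c", "name": "SELECT orders"},
]


def render_tree(all_spans: List[Dict]) -> List[str]:
    """Return one line per span, indented by its depth in the trace."""
    # Index spans by parent so each level can be walked in order.
    children = defaultdict(list)
    for span in all_spans:
        children[span["parent_span_id"]].append(span)

    lines: List[str] = []

    def walk(parent_id, depth: int) -> None:
        for span in children[parent_id]:
            lines.append("  " * depth + span["name"])
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # Root spans have no parent.
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(trace_spans)))
```

The root span represents the incoming request, and each nested span represents work done on its behalf, such as a downstream service call or a database query.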
Prerequisites
To ingest OTEL traces, you must deploy a pipeline with the OTLP Source node and configure it to listen on a port that is receiving traces. For eBPF traces, you must deploy a pipeline with the Kubernetes Trace Source node.
To view traces in Edge Delta’s Trace Explorer, trace pipelines must be connected to the Edge Delta Traces destination node. You can also route traces to third-party destinations that support the trace data type.
Trace Explorer
Click Traces to open the Trace Explorer.
You can filter traces by several default dimensions, including Service.
Trace Details
Click a trace in the table to view its details. This can be useful for debugging network or service-related issues within a distributed system.
The example trace shows a failure in which a connection termination leads to a service availability error.
You can select each span to see its details.
Diagnose Traces
Use the Trace Explorer to uncover issues. In this example, the router frontend egress span encountered an error with the reason connection termination and an HTTP status code of 0, indicating that the outbound request failed due to connection issues.
The ingress span also indicates an error, with an HTTP status code of 503 (Service Unavailable). This span has the response flag UC, suggesting that there was an upstream connection termination.
The total duration of the request in the loadgenerator component is significantly longer than the durations of the ingress and router frontend egress spans in the frontendproxy component. This suggests that the request was either queued or retried due to the connection issues.
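The reasoning in this example can be expressed as a small set of checks over span data. The following is a minimal sketch, assuming the spans have been exported as plain dictionaries. The field names (`name`, `component`, `http_status`, `response_flags`, `duration_ms`) and all sample values are hypothetical and do not reflect the Edge Delta trace schema.

```python
from typing import Dict, List

# Hypothetical span records modeled on the example; values are illustrative.
spans: List[Dict] = [
    {"name": "loadgenerator request", "component": "loadgenerator",
     "http_status": 503, "response_flags": "", "duration_ms": 1200.0},
    {"name": "ingress", "component": "frontendproxy",
     "http_status": 503, "response_flags": "UC", "duration_ms": 15.0},
    {"name": "router frontend egress", "component": "frontendproxy",
     "http_status": 0, "response_flags": "", "duration_ms": 12.0},
]


def diagnose(span: Dict) -> List[str]:
    """Return findings for one span, following the checks described above."""
    findings: List[str] = []
    if span["http_status"] == 0:
        findings.append("outbound request failed (status 0)")
    if span["http_status"] == 503:
        findings.append("service unavailable (503)")
    if "UC" in span["response_flags"].split(","):
        findings.append("upstream connection termination (UC)")
    return findings


def duration_gap_ms(all_spans: List[Dict], outer: str, inner: str) -> float:
    """Difference between the longest span in two components."""
    longest_outer = max(s["duration_ms"] for s in all_spans if s["component"] == outer)
    longest_inner = max(s["duration_ms"] for s in all_spans if s["component"] == inner)
    return longest_outer - longest_inner


if __name__ == "__main__":
    for span in spans:
        print(span["name"], "->", diagnose(span))
    print("gap (ms):", duration_gap_ms(spans, "loadgenerator", "frontendproxy"))
```

A large gap between the caller's duration and the proxy's span durations is only a hint that the request was queued or retried; confirming it requires the metrics, events, and logs described in the next steps.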
Next steps could be to:
- Use the Metrics Explorer:
  - Look into network performance metrics such as latency, packet loss, and throughput around the time of the issue.
  - Check service-level metrics for the frontend proxy, including request rates, error rates, and response times.
  - Monitor CPU, memory, and other resource utilization metrics for the frontend proxy and backend services.
- Check Events:
  - Review any deployment or configuration change events around the time of the incident.
  - Check for scaling events that might have occurred around the time of the failure.
- Check Logs:
  - Investigate frontend proxy logs to look for error messages or warnings around the time of the connection termination.
  - Investigate logs from the backend services that were involved in the trace.
  - Check any network-related logs (firewalls, load balancers, etc.) to determine if there were any network anomalies or disruptions.
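Several of these steps involve narrowing metrics, events, or logs to the period around the incident. As a minimal sketch of that correlation, assuming the records have been exported with a `timestamp` field, the helper below keeps only the records within a few minutes of the failing span. The record structure, the sample messages, and the five-minute default are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def within_window(records: List[Dict], incident: datetime, minutes: int = 5) -> List[Dict]:
    """Keep records whose timestamp is within +/- `minutes` of the incident."""
    window = timedelta(minutes=minutes)
    return [r for r in records if abs(r["timestamp"] - incident) <= window]


# Hypothetical incident time and exported records.
incident_time = datetime(2024, 1, 1, 12, 0, 0)
records: List[Dict] = [
    {"timestamp": datetime(2024, 1, 1, 11, 58, 0), "message": "deployment rolled out"},
    {"timestamp": datetime(2024, 1, 1, 12, 4, 0), "message": "upstream connection reset"},
    {"timestamp": datetime(2024, 1, 1, 12, 30, 0), "message": "scheduled scale-down"},
]

if __name__ == "__main__":
    for record in within_window(records, incident_time):
        print(record["timestamp"].isoformat(), record["message"])
```

Widening or narrowing the window is a judgment call: a short window reduces noise, while a longer one can surface slower causes such as a gradual scale-down.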