Level 3 Metrics Maturity with Edge Delta
Overview
At this level, your observability extends into the application layer. Metrics can originate from three main sources:
- Direct instrumentation, such as with OpenTelemetry (OTEL)
- Metrics generated from logs or other data items
- Traffic-level metrics collected from communication events between services
This level is inherently more customized to the applications and services you operate, but several common use cases apply broadly.
Track Application Latency and Error Rates
Are applications responding within acceptable latency thresholds, and are error rates under control?
You can instrument your services to emit metrics such as request duration (http.server.duration), error counts (http.server.errors), and status codes by path or route. These metrics allow you to track how long requests take to complete, whether any failures occur, and which endpoints or services are affected.
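As an illustration, a minimal instrumentation sketch using the OpenTelemetry Python SDK might look like the following. The service name, routes, and attribute keys are assumptions for the example; exporter and provider setup is omitted.

```python
# Minimal sketch: recording request duration and error counts with the
# OpenTelemetry Python SDK. Exporter/provider setup is assumed to be
# configured elsewhere (e.g., via environment variables).
from opentelemetry import metrics

meter = metrics.get_meter("checkout-service")  # hypothetical service name

# Histogram for request latency; the metric name matches the one above.
request_duration = meter.create_histogram(
    "http.server.duration", unit="ms", description="Server request duration"
)
# Counter for failed requests.
request_errors = meter.create_counter(
    "http.server.errors", description="Count of failed requests"
)

def handle_request(route: str, status_code: int, elapsed_ms: float) -> None:
    """Record latency and error metrics for a single request."""
    attributes = {"http.route": route, "http.status_code": status_code}
    request_duration.record(elapsed_ms, attributes)
    if status_code >= 500:
        request_errors.add(1, attributes)
```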
Edge Delta can also generate latency and error rate metrics from log data such as HTTP status codes, time-to-first-byte values, or backend error messages. These log-based metrics are particularly valuable when direct instrumentation is not possible.
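The sketch below shows the general idea behind deriving such metrics from access logs. The log format, regular expression, and field names are hypothetical; in practice this extraction would typically be configured in your pipeline rather than written in application code.

```python
# Illustrative only: deriving latency and error-rate signals from access
# logs when direct instrumentation is not available. The log format and
# field names here are hypothetical.
import re
from collections import defaultdict

LOG_PATTERN = re.compile(
    r'"(?P<method>\w+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) .* (?P<duration_ms>\d+)ms$'
)

def metrics_from_logs(lines):
    """Aggregate per-path request counts, error counts, and total latency."""
    stats = defaultdict(lambda: {"count": 0, "errors": 0, "duration_ms": 0})
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        entry = stats[match["path"]]
        entry["count"] += 1
        entry["duration_ms"] += int(match["duration_ms"])
        if int(match["status"]) >= 500:
            entry["errors"] += 1
    return stats
```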
By continuously monitoring latency and errors at the application level, you can identify regressions tied to new deployments, infrastructure changes, or traffic spikes — even before users report issues.
Monitor Application Traffic Volume and Network Performance
Is traffic volume normal, and are inter-service communications responsive?
The k8s.traffic.communication.count metric tracks the number of communication events within the Kubernetes environment. Monitoring this metric over time helps you detect spikes or drops in traffic that could indicate scaling events, outages, or changes in client behavior.
For latency analysis, the k8s.traffic.communication.latency.avg and k8s.traffic.communication.latency.p95 metrics provide insight into the responsiveness of service-to-service interactions. An increase in average latency may reflect network congestion or overutilized services, while a rise in the 95th percentile latency can highlight tail-end delays that affect a subset of users or workloads.
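To make the distinction concrete, here is a small worked sketch (with invented latency values) showing how a few slow calls can leave the average looking acceptable while the 95th percentile clearly exposes the tail.

```python
# Invented latency samples: mostly fast requests plus a few slow outliers.
import math

def p95(samples):
    """Nearest-rank 95th percentile of a list of latency samples."""
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]

latencies_ms = [15] * 18 + [800, 900]
print(sum(latencies_ms) / len(latencies_ms))  # 98.5 ms average, looks tolerable
print(p95(latencies_ms))                      # 800 ms at p95, exposes the tail delay
```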
To understand bandwidth and data flow patterns, the k8s.traffic.communication.read_bytes.sum and k8s.traffic.communication.write_bytes.sum metrics track the volume of data received and sent across the cluster. Unusual changes in read or write traffic may indicate misconfigured services, data leaks, or abnormal workloads.
By combining these traffic metrics with application-layer telemetry and log-based indicators, you can build a comprehensive view of how traffic patterns impact application behavior and user experience — enabling early detection of regressions, outages, or architectural inefficiencies.
Detect Anomalies in Application Behavior
Are there abnormal usage patterns or unexpected spikes in key application metrics?
Log-based metrics allow you to convert key log fields (e.g., user IDs, endpoints, service names) into time-series data. You can apply thresholds to these derived metrics to flag unexpected patterns such as traffic surges, increased latency for specific users, or error rate increases for specific routes.
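As a sketch of the underlying idea, the snippet below flags routes whose log-derived error rate exceeds a static threshold in the current window. The event shape, field names, and threshold values are hypothetical; in practice this logic is usually expressed as pipeline configuration and alert rules rather than application code.

```python
# Sketch of threshold-based anomaly flagging on log-derived time series.
# Field names and thresholds are hypothetical.
from collections import Counter

def flag_anomalies(events, error_rate_threshold=0.05, min_requests=100):
    """Flag routes whose error rate in the current window exceeds a threshold."""
    totals, errors = Counter(), Counter()
    for event in events:  # each event: {"route": str, "status": int}
        totals[event["route"]] += 1
        if event["status"] >= 500:
            errors[event["route"]] += 1
    return [
        route
        for route, count in totals.items()
        if count >= min_requests and errors[route] / count > error_rate_threshold
    ]
```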
For OTEL-instrumented applications, span-level metrics such as span.duration, span.status_code, and span.count can be aggregated to detect slowdowns, retries, or downstream service failures. These metrics allow for fine-grained anomaly detection based on service dependencies or transactional boundaries.
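A rough sketch of that kind of aggregation is shown below, using plain dictionaries to stand in for span records. The field names are illustrative, not a specific span schema.

```python
# Sketch: aggregating span-level data (duration, status, count) per service
# to surface slowdowns or downstream failures.
from collections import defaultdict

def aggregate_spans(spans):
    """Return per-service span count, error count, and max duration (ms)."""
    summary = defaultdict(lambda: {"span.count": 0, "errors": 0, "max_duration_ms": 0.0})
    for span in spans:  # each span: {"service": str, "duration_ms": float, "status_code": str}
        entry = summary[span["service"]]
        entry["span.count"] += 1
        entry["max_duration_ms"] = max(entry["max_duration_ms"], span["duration_ms"])
        if span["status_code"] == "ERROR":
            entry["errors"] += 1
    return summary
```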
By detecting and responding to anomalous patterns early, you can reduce the time to resolve performance issues and prevent small regressions from cascading into outages.
Correlate Infrastructure and Application Metrics
Can application issues be tied back to node, pod, or container-level events?
At this maturity level, you can correlate application performance metrics with infrastructure signals such as CPU, memory, disk I/O, and container restarts. For example, a spike in http.server.duration or http.server.errors can be analyzed alongside k8s.container.memory.usage_bytes.value or k8s.container.cpu.usage_seconds.rate to determine if the issue is rooted in resource exhaustion.
This correlation is especially powerful when logs and metrics are aligned by labels such as pod_name, namespace, or container_name. Edge Delta’s unified observability pipeline ensures that application metrics, logs, and infrastructure signals can be analyzed together — supporting faster root cause analysis and improving cross-team collaboration.
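The snippet below sketches what label-based alignment looks like in principle: application samples and infrastructure samples that share pod_name and namespace are paired so they can be inspected side by side. The data shapes and values are illustrative only.

```python
# Sketch: aligning application and infrastructure metric points by shared
# labels (pod_name, namespace) so a latency spike can be inspected next to
# container resource usage.
def correlate_by_labels(app_points, infra_points):
    """Pair application and infrastructure samples that share pod and namespace."""
    infra_index = {(p["pod_name"], p["namespace"]): p for p in infra_points}
    pairs = []
    for point in app_points:
        key = (point["pod_name"], point["namespace"])
        if key in infra_index:
            pairs.append((point, infra_index[key]))
    return pairs

# Example: a slow request sample next to its pod's memory usage.
app = [{"pod_name": "checkout-7d9f", "namespace": "prod", "http.server.duration": 1840}]
infra = [{"pod_name": "checkout-7d9f", "namespace": "prod", "memory_usage_bytes": 1_932_735_283}]
print(correlate_by_labels(app, infra))
```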
Monitor Business and Domain-Specific KPIs
Are domain-specific metrics (e.g., transactions processed, signups, API usage) healthy and aligned with expectations?
You can define domain-specific metrics — such as transactions.count, signup.latency, or payment.failure_rate — using OTEL instrumentation or through log parsing and pattern extraction. These business KPIs are often the most critical indicators of service health and user experience.
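As a sketch, a business KPI such as transactions.count could be emitted directly from application code with the OpenTelemetry Python SDK. The service name and attributes below are assumptions for the example; exporter and provider setup is omitted.

```python
# Minimal sketch of a domain-specific KPI emitted via the OpenTelemetry
# Python SDK. The attributes and service name are hypothetical.
from opentelemetry import metrics

meter = metrics.get_meter("payments-service")  # hypothetical service name

transactions_counter = meter.create_counter(
    "transactions.count", description="Number of processed transactions"
)

def record_transaction(currency: str, succeeded: bool) -> None:
    """Increment the business KPI counter with outcome attributes."""
    outcome = "success" if succeeded else "failure"
    transactions_counter.add(1, {"currency": currency, "outcome": outcome})
```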
Edge Delta’s support for metric generation from logs allows you to extract these KPIs even from legacy applications or systems that are not OTEL-instrumented. By observing these business-centric signals in real time, you can identify problems faster, reduce the impact on end users, and improve adherence to service-level objectives (SLOs).
Analyze Distributed Traces to Understand Request Flow and Latency Sources
In addition to metrics and logs, distributed tracing provides the next layer of observability for pinpointing latency and understanding execution flow.
Where in the system are delays occurring, and how do requests flow across services and infrastructure?
Edge Delta supports both OpenTelemetry (OTEL) traces and out-of-the-box eBPF-based tracing, enabling a complete view of application and system behavior.
OTEL traces provide high-level, semantically rich observability into application workflows. Spans capture events like HTTP requests, database queries, queue operations, and custom business logic — including metadata such as service name, route, status code, and duration. By analyzing trace trees, you can pinpoint slow operations, identify dependency bottlenecks, and detect performance regressions across distributed services.
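A minimal sketch of such a trace tree, using the OpenTelemetry Python SDK, is shown below: an HTTP handler span with a nested database span. The tracer setup, service name, and attributes are assumptions for the example.

```python
# Sketch: nested OTEL spans that form a small trace tree, capturing an HTTP
# handler and a downstream database call. Provider/exporter setup is assumed
# to be configured elsewhere; names are illustrative.
from opentelemetry import trace

tracer = trace.get_tracer("orders-service")  # hypothetical service name

def get_order(order_id: str) -> dict:
    # Parent span for the inbound HTTP request.
    with tracer.start_as_current_span("GET /orders/{id}") as span:
        span.set_attribute("http.route", "/orders/{id}")
        # Child span for the downstream database call; its duration appears
        # as a nested node in the trace tree.
        with tracer.start_as_current_span("db.query") as db_span:
            db_span.set_attribute("db.statement", "SELECT * FROM orders WHERE id = ?")
            row = {"order_id": order_id}  # stand-in for the real query result
        return row
```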
In parallel, eBPF-based tracing enables deep visibility into system-level behavior. Without modifying application code, Edge Delta captures kernel-level events like network packet traversal, file system access, process lifecycle events, and system call activity. These traces help uncover low-level sources of latency, such as DNS resolution delays, socket issues, or I/O contention — which may not be visible in application-layer traces.
By combining OTEL and eBPF traces, you can correlate high-level request flow with underlying infrastructure behavior. For example, if an HTTP request span shows a latency spike, eBPF data can reveal whether it was caused by a blocked syscall, disk I/O delay, or network retransmission.
This multi-layered tracing approach enables faster root cause analysis, supports both modern and legacy workloads, and provides comprehensive visibility — from user-facing APIs down to kernel-level events.