# Database Performance Investigation
When cascading errors affect multiple service tiers, teammates correlate telemetry across the stack to identify whether the root cause lies in the frontend, backend services, or database layer.
## Environment Setup
| Component | Purpose |
|---|---|
| Edge Delta MCP Connector | Query logs from NGINX, backend services, and database |
| GitHub Connector | Review recent commits for query or connection handling changes |
| Kubernetes Source | Ingest container logs from NGINX, backend, and database pods |
| Context Filter Processor | Preserve contextual logs around error events |
| Pattern Anomaly Monitor | Detect error spikes and trigger investigations |
| Monitor Notifications | Route alerts to an AI Team channel |
| AI Team Channel | Receive monitor notifications and route to OnCall AI |
For Kubernetes deployments, configure a Kubernetes Source to capture container logs from the NGINX, backend, and database pods. For VM-based deployments, use a File Connector to ingest log files from each tier. Add a Context Filter processor to the pipeline to preserve INFO, WARN, and DEBUG logs surrounding error events, giving the SRE teammate the context needed to trace cascading failures. A Pattern Anomaly Monitor watches for error spikes across services and routes notifications to an AI Team channel. The Edge Delta MCP and GitHub connectors provision an AI Team ingestion pipeline, enabling teammates to query telemetry and correlate it with code changes.
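To illustrate what the Context Filter processor preserves, the following is a minimal Python sketch of context-aware filtering: it buffers recent non-error lines and flushes them, plus a short trailing window, whenever an error appears. The window sizes, severity labels, and class name are illustrative assumptions; the actual processor is configured in the Edge Delta pipeline rather than written by hand.

```python
# Minimal sketch of context-preserving filtering around error events.
# Window sizes and severity labels are illustrative assumptions; the real
# Context Filter processor is configured in the pipeline, not hand-coded.
from collections import deque

class ContextFilter:
    def __init__(self, before=20, after=20):
        self.buffer = deque(maxlen=before)  # recent INFO/WARN/DEBUG lines
        self.after = after                  # trailing lines to keep per error
        self.trailing = 0                   # trailing lines still owed

    def process(self, line):
        """Return the lines to forward for this input line (possibly none)."""
        if any(level in line for level in ("ERROR", "FATAL")):
            context = list(self.buffer) + [line]
            self.buffer.clear()
            self.trailing = self.after
            return context                  # error plus its leading context
        if self.trailing > 0:
            self.trailing -= 1
            return [line]                   # trailing context after an error
        self.buffer.append(line)            # hold as potential leading context
        return []
```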
## Data Flow
```mermaid
flowchart LR
  A[NGINX] --> B[Telemetry Pipeline]
  C[Backend Services] --> B
  D[PostgreSQL] --> B
  B --> E[Edge Delta Backend]
  E --> F[Pattern Anomaly Monitor]
  F -->|Error Spike| G[OnCall AI]
  G --> H[SRE Teammate]
  G --> I[Code Analyzer]
  H -->|Queries| J[Edge Delta MCP]
  J --> E
  I -->|Queries| K[GitHub]
```

Logs from all tiers flow through the Telemetry Pipeline to the Edge Delta backend. When the Pattern Anomaly Monitor detects error spikes, it notifies OnCall AI through the configured channel. SRE queries across service boundaries using the Edge Delta MCP connector to trace errors from the frontend down to the database.
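As a rough illustration of the spike-detection step, the sketch below compares the error rate of the most recent minutes against a trailing baseline. The window size, ratio, and minimum-count threshold are assumptions for illustration, not the Pattern Anomaly Monitor's actual algorithm or configuration.

```python
# Illustrative spike check: compare recent per-minute error counts against a
# trailing baseline. Thresholds and window sizes are assumptions, not the
# Pattern Anomaly Monitor's real configuration.
def is_error_spike(error_counts, recent_windows=5, min_ratio=3.0, min_errors=20):
    """error_counts: per-minute error counts, oldest first."""
    if len(error_counts) <= recent_windows:
        return False
    baseline = error_counts[:-recent_windows]
    recent = error_counts[-recent_windows:]
    if sum(recent) < min_errors:
        return False  # ignore low-volume noise
    baseline_rate = sum(baseline) / len(baseline)
    recent_rate = sum(recent) / len(recent)
    return recent_rate >= max(baseline_rate, 1e-9) * min_ratio

# A quiet baseline followed by a burst of errors trips the check.
counts = [2, 3, 1, 2, 2, 3, 2, 1, 2, 3, 15, 22, 30, 28, 35]
print(is_error_spike(counts))  # True
```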
## Investigation Workflow
- OnCall AI receives alerts indicating elevated error rates in the frontend-proxy service and initiates an investigation thread
- SRE queries logs from NGINX, identifying 502 and 504 responses correlated with specific upstream backends
- SRE examines backend service logs, finding connection pool exhaustion and timeout errors when communicating with PostgreSQL (illustrated by the first sketch after this list)
- SRE analyzes PostgreSQL logs, identifying a long-running query consuming excessive connections and blocking other operations (see the pg_stat_activity sketch after this list)
- Code Analyzer reviews recent commits for changes to database queries or connection handling
- OnCall AI synthesizes findings into a root cause summary with specific remediation steps, such as query optimization, connection pool tuning, or adding query timeouts
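To make the NGINX and backend steps concrete, here is a minimal sketch that counts 502/504 responses per upstream backend and tallies backend connection-pool errors. The log format (combined format with the upstream address appended), the match strings, and the inline sample lines are illustrative assumptions; in practice the SRE teammate runs equivalent queries through the Edge Delta MCP connector.

```python
# Sketch: correlate NGINX 502/504 responses per upstream with backend
# connection-pool errors. Log format and match strings are assumptions.
import re
from collections import Counter

# Assumes the combined access log format with $upstream_addr appended.
NGINX_5XX = re.compile(r'" (?P<status>50[24]) .* (?P<upstream>\d{1,3}(?:\.\d{1,3}){3}:\d+)\s*$')
POOL_ERROR = re.compile(r"connection pool exhausted|timeout acquiring connection", re.I)

def nginx_5xx_by_upstream(lines):
    """Count 502/504 responses per upstream backend address."""
    counts = Counter()
    for line in lines:
        m = NGINX_5XX.search(line)
        if m:
            counts[m.group("upstream")] += 1
    return counts

def backend_pool_errors(lines):
    """Count backend log lines indicating pool exhaustion or acquire timeouts."""
    return sum(1 for line in lines if POOL_ERROR.search(line))

# Inline samples stand in for logs queried through the MCP connector.
nginx_lines = [
    '10.0.0.5 - - [12/May/2025:10:01:07 +0000] "GET /api/orders HTTP/1.1" 502 0 "-" "curl/8.0" 10.0.3.12:8080',
    '10.0.0.6 - - [12/May/2025:10:01:09 +0000] "GET /api/orders HTTP/1.1" 504 0 "-" "curl/8.0" 10.0.3.12:8080',
    '10.0.0.7 - - [12/May/2025:10:01:10 +0000] "GET /healthz HTTP/1.1" 200 15 "-" "kube-probe" 10.0.3.9:8080',
]
backend_lines = [
    "2025-05-12T10:01:07Z ERROR orders-svc connection pool exhausted (size=20, waiters=45)",
    "2025-05-12T10:01:09Z ERROR orders-svc timeout acquiring connection to postgres after 5000ms",
]
print(nginx_5xx_by_upstream(nginx_lines).most_common())
print("backend pool errors:", backend_pool_errors(backend_lines))
```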
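The database finding can be confirmed directly against PostgreSQL. The pg_stat_activity view and its columns are standard PostgreSQL, but the psycopg2 usage, connection string, and five-minute threshold below are illustrative assumptions; the SRE teammate surfaces the equivalent evidence from database logs via the MCP connector.

```python
# Sketch: list long-running, non-idle queries from pg_stat_activity.
# Connection settings and the duration threshold are illustrative assumptions.
import psycopg2  # assumes psycopg2 is installed and a database is reachable

LONG_RUNNING_SQL = """
SELECT pid,
       now() - query_start AS duration,
       state,
       wait_event_type,
       left(query, 120) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '5 minutes'
ORDER BY duration DESC;
"""

def find_long_running(dsn="dbname=app user=readonly host=db.internal"):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(LONG_RUNNING_SQL)
        return cur.fetchall()

if __name__ == "__main__":
    for pid, duration, state, wait_event_type, query in find_long_running():
        print(f"pid={pid} duration={duration} state={state} wait={wait_event_type}")
        print(f"  {query}")
```

Queries that hold connections for minutes while other sessions wait are exactly the pattern that exhausts backend connection pools and surfaces upstream as 502 and 504 responses.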
This multi-tier correlation pattern surfaces the root cause with supporting evidence from each layer. SRE autonomously queries logs across service boundaries, correlates error patterns, and identifies the specific database issue causing cascading failures.