Cross-Platform Telemetry Investigation
When telemetry spans multiple platforms (logs stored in Elasticsearch, metrics and traces in Edge Delta), teammates coordinate data retrieval across sources to build a complete picture. This architecture optimizes costs by splitting datasets across systems while maintaining unified investigation capabilities.
Environment Setup
| Component | Purpose |
|---|---|
| Edge Delta MCP Connector | Query metrics and traces from Edge Delta backend |
| Elastic MCP Connector | Execute ES\|QL queries against Elasticsearch logs |
| AI Team Channel | Receive investigation requests and route to OnCall AI |
This workflow typically starts from user requests submitted to an AI Team channel (such as an #incident-response channel). The Edge Delta MCP and Elastic MCP connectors provision an AI Team ingestion pipeline, enabling teammates to query both platforms during investigations. The Elastic connector requires an API key with read access to the relevant indices.
Data Flow
flowchart LR
H[User Request] --> G[AI Team Channel]
G --> F[OnCall AI]
F --> E[SRE Teammate]
E -->|Queries| I[Edge Delta MCP]
E -->|Queries| J[Elastic MCP]
I --> D[Edge Delta Backend]
J --> C[Elasticsearch]
D --- B[Edge Delta Pipeline]
C --- A[Applications]
B --- A

This pattern supports hybrid observability architectures where different telemetry types live in different platforms. SRE queries metrics and traces from Edge Delta while pulling logs from Elasticsearch via ES|QL. OnCall AI coordinates the investigation and synthesizes findings from both sources.
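The split described above can be pictured as a small routing table from telemetry type to connector. The following is only an illustrative sketch: the connector labels (`edge-delta-mcp`, `elastic-mcp`) and the `plan_queries` helper are hypothetical names chosen for this example, not identifiers from either product.

```python
from enum import Enum


class Signal(Enum):
    LOGS = "logs"
    METRICS = "metrics"
    TRACES = "traces"


# Hypothetical routing table mirroring the data flow above:
# logs live in Elasticsearch, metrics and traces in Edge Delta.
SOURCE_FOR_SIGNAL = {
    Signal.LOGS: "elastic-mcp",
    Signal.METRICS: "edge-delta-mcp",
    Signal.TRACES: "edge-delta-mcp",
}


def plan_queries(signals):
    """Group the requested telemetry types by the connector that serves them."""
    plan = {}
    for signal in signals:
        connector = SOURCE_FOR_SIGNAL[signal]
        plan.setdefault(connector, []).append(signal.value)
    return plan
```

Under these assumptions, an investigation that needs all three telemetry types results in one group of queries per connector, which is the point at which the two result sets have to be correlated.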
Investigation Workflow
- OnCall AI receives an investigation request (such as error spikes or latency issues) and initiates an investigation thread
- SRE queries Edge Delta for metrics and traces, identifying anomalies such as 504 responses in the frontend-proxy service
- SRE uses the Elastic connector to construct ES|QL queries, retrieving relevant logs from Elasticsearch for the affected timeframe
- SRE correlates patterns across both data sources: specific endpoints returning errors, connections to problematic upstream instances, timeout indicators in logs
- OnCall AI synthesizes findings into remediation steps: pod health checks, service configuration validation, upstream dependency verification
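To make the log-retrieval step more concrete, here is a minimal sketch of how an ES|QL query for the affected timeframe might be assembled once an anomaly timestamp is known from the metrics. The index pattern `logs-*`, the `message` field, the five-minute padding, and both helper functions are assumptions made for illustration; the actual queries a teammate constructs will depend on your index layout and field mappings.

```python
from datetime import datetime, timedelta, timezone


def anomaly_window(anomaly_time, padding_minutes=5):
    """Return a (start, end) window padded around the anomaly timestamp."""
    pad = timedelta(minutes=padding_minutes)
    return anomaly_time - pad, anomaly_time + pad


def build_esql_query(index_pattern, start, end, keyword, limit=100):
    """Build an ES|QL query string for logs in [start, end] matching a keyword.

    Assumes the index has an `@timestamp` date field and a `message` field.
    The keyword is interpolated as-is, so it must come from a trusted source.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # timestamps are expected to be in UTC
    start_iso = start.strftime(fmt)
    end_iso = end.strftime(fmt)
    return (
        f"FROM {index_pattern}\n"
        f'| WHERE @timestamp >= "{start_iso}" AND @timestamp <= "{end_iso}"\n'
        f'| WHERE message LIKE "*{keyword}*"\n'
        "| SORT @timestamp DESC\n"
        f"| LIMIT {limit}"
    )


if __name__ == "__main__":
    # Example: a spike of 504 responses observed at 12:00 UTC.
    spike = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    start, end = anomaly_window(spike)
    print(build_esql_query("logs-*", start, end, "timeout"))
```

Padding the window on both sides is a design choice that helps catch log lines emitted slightly before or after the metric anomaly, such as upstream timeouts that precede the 504 responses.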
This pattern eliminates manual tool switching during investigations. SRE autonomously pulls telemetry from each platform, correlates findings across sources, and generates actionable remediation plans.