Cross-Platform Telemetry Investigation

Query logs, metrics, and traces across Elasticsearch and Edge Delta to correlate findings from multiple observability platforms.

When telemetry spans multiple platforms (logs stored in Elasticsearch, metrics and traces in Edge Delta), teammates coordinate data retrieval across sources to build a complete picture. This architecture optimizes costs by splitting datasets across systems while maintaining unified investigation capabilities.

Environment Setup

ComponentPurpose
Edge Delta MCP ConnectorQuery metrics and traces from Edge Delta backend
Elastic MCP ConnectorExecute ES|QL queries against Elasticsearch logs
AI Team ChannelReceive investigation requests and route to OnCall AI

This workflow typically starts from user requests submitted to an AI Team channel (such as an #incident-response channel). The Edge Delta MCP and Elastic MCP connectors provision an AI Team ingestion pipeline, enabling teammates to query both platforms during investigations. The Elastic connector requires an API key with read access to the relevant indices.

Data Flow

flowchart LR
    H[User Request] --> G[AI Team Channel]
    G --> F[OnCall AI]
    F --> E[SRE Teammate]
    E -->|Queries| I[Edge Delta MCP]
    E -->|Queries| J[Elastic MCP]
    I --> D[Edge Delta Backend]
    J --> C[Elasticsearch]
    D --- B[Edge Delta Pipeline]
    C --- A[Applications]
    B --- A

This pattern supports hybrid observability architectures where different telemetry types live in different platforms. SRE queries metrics and traces from Edge Delta while pulling logs from Elasticsearch via ES|QL. OnCall AI coordinates the investigation and synthesizes findings from both sources.

Investigation Workflow

  1. OnCall AI receives an investigation request (such as error spikes or latency issues) and initiates an investigation thread
  2. SRE queries Edge Delta for metrics and traces, identifying anomalies such as 504 responses in the frontend-proxy service
  3. SRE uses the Elastic connector to construct ES|QL queries, retrieving relevant logs from Elasticsearch for the affected timeframe
  4. SRE correlates patterns across both data sources: specific endpoints returning errors, connections to problematic upstream instances, timeout indicators in logs
  5. OnCall AI synthesizes findings into remediation steps: pod health checks, service configuration validation, upstream dependency verification

This pattern eliminates manual tool switching during investigations. SRE autonomously pulls telemetry from each platform, correlates findings across sources, and generates actionable remediation plans.

Learn More