Security-Related Service Degradation

When service degradation has potential security implications, teammates coordinate across observability and security domains to determine whether the issue stems from an attack, a misconfiguration, or an application bug.

Environment Setup

| Component | Purpose |
|---|---|
| Edge Delta MCP Connector | Query logs, metrics, and security data from Edge Delta backend |
| AWS Connector | Validate IAM configurations and review CloudTrail access patterns |
| GitHub Connector | Examine recent deployments for correlated changes |
| Elastic MCP Connector | Query security logs in Elasticsearch (optional) |
| Atlassian Connector | Access security runbooks in Confluence (optional) |
| PagerDuty Connector | Receive incident alerts via webhook (optional) |
| Sentry Connector | Receive application error events via webhook (optional) |
| Monitor Notifications | Route Edge Delta monitor alerts to AI Team channel (optional) |
| AI Team Channel | Receive notifications and route to OnCall AI |

This workflow can trigger from multiple alert sources. Configure one or more of the following: a PagerDuty connector with webhooks enabled to receive incidents, a Sentry connector with webhooks for application errors, or an Edge Delta monitor with notifications routed to an AI Team channel. The Edge Delta MCP, AWS, and GitHub connectors provision an AI Team ingestion pipeline, enabling teammates to query telemetry and infrastructure during investigations. Add the Elastic MCP connector if security logs are stored in Elasticsearch, and the Atlassian connector if security runbooks are in Confluence.
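As a rough sketch of what the ingestion step involves, the snippet below accepts PagerDuty- or Sentry-style webhook payloads and normalizes them into a common event shape before handing them to the channel. The route path, payload field names, and forward_to_channel helper are illustrative assumptions, not part of any connector's actual API; in practice the connectors and the AI Team pipeline handle this step for you.

```python
# Minimal sketch of an alert-ingestion webhook, assuming a Flask app.
# The route path, payload field names, and forward_to_channel() are
# hypothetical; the real connectors do this inside the AI Team pipeline.
from flask import Flask, request, jsonify

app = Flask(__name__)

def forward_to_channel(event: dict) -> None:
    # Placeholder for posting the normalized event to the AI Team channel.
    print(f"routing to AI Team channel: {event}")

@app.route("/webhooks/alerts", methods=["POST"])
def receive_alert():
    payload = request.get_json(force=True) or {}

    # Normalize the few fields the downstream investigation needs,
    # regardless of whether the alert came from PagerDuty, Sentry,
    # or an Edge Delta monitor notification.
    if "incident" in payload:        # PagerDuty-style payload (assumed shape)
        event = {"source": "pagerduty",
                 "title": payload["incident"].get("title", "unknown incident"),
                 "urgency": payload["incident"].get("urgency", "high")}
    elif "event" in payload:         # Sentry-style payload (assumed shape)
        event = {"source": "sentry",
                 "title": payload["event"].get("title", "application error"),
                 "urgency": "high"}
    else:                            # Edge Delta monitor or other source
        event = {"source": payload.get("source", "edgedelta-monitor"),
                 "title": payload.get("title", "service degradation"),
                 "urgency": payload.get("severity", "unknown")}

    forward_to_channel(event)
    return jsonify({"status": "accepted"}), 202

if __name__ == "__main__":
    app.run(port=8080)
```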

Data Flow

```mermaid
flowchart LR
    A[PagerDuty Incident] --> D[AI Team Channel]
    B[Sentry Error Event] --> D
    C[Edge Delta Monitor] --> D
    D --> E[OnCall AI]
    E --> F[SRE Teammate]
    E --> G[Security Engineer]
    E --> H[Code Analyzer]
    F -->|Queries| I[Edge Delta MCP]
    F -->|Runbooks| M[Atlassian]
    G -->|Queries| J[AWS Connector]
    G -->|Queries| L[Elastic MCP]
    H -->|Queries| K[GitHub]
```

Alerts can originate from any event connector with webhooks enabled (such as PagerDuty incidents or Sentry error events) or from Edge Delta monitors. Each source delivers events to an AI Team channel, either through connector webhooks or through monitor notifications. OnCall AI evaluates the context and engages the appropriate specialists: the Security Engineer focuses on access patterns and IAM validation, the SRE analyzes performance telemetry, and the Code Analyzer reviews recent deployments for correlated changes.

Investigation Workflow

  1. OnCall AI receives the notification and initiates an investigation thread.
  2. SRE analyzes recent logs and metrics for anomalies such as error patterns, latency spikes, or failed requests, and retrieves relevant security runbooks from Confluence if available.
  3. Security Engineer checks for suspicious activity in AWS CloudTrail, validates IAM configurations, and queries security logs via Elastic MCP if available (see the CloudTrail/IAM sketch after this list).
  4. Code Analyzer examines recent deployments for changes that correlate with the degradation (see the GitHub sketch further below).
  5. OnCall AI synthesizes the findings into a summary with root-cause hypotheses and recommended next steps.
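For step 3, the sketch below illustrates the kind of checks the Security Engineer performs, written here directly against boto3 rather than through the AWS connector. The lookup window, the event name filtered on, and the role name are assumptions for illustration, not values the workflow prescribes.

```python
# Hedged sketch of the step-3 checks: recent CloudTrail activity plus a quick
# IAM review, using boto3 directly. The two-hour window, the ConsoleLogin
# filter, and the role name are illustrative assumptions.
from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail")
iam = boto3.client("iam")

window_start = datetime.now(timezone.utc) - timedelta(hours=2)

# 1. Look for notable events around the degradation window.
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}],
    StartTime=window_start,
    EndTime=datetime.now(timezone.utc),
    MaxResults=50,
)
for event in events["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Username", "unknown"))

# 2. Review the policies attached to a role suspected of misconfiguration
#    ("app-service-role" is a hypothetical name).
attached = iam.list_attached_role_policies(RoleName="app-service-role")
for policy in attached["AttachedPolicies"]:
    print(policy["PolicyName"], policy["PolicyArn"])
```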

Each teammate queries real-time telemetry data through their assigned connectors, ensuring analysis reflects current system state rather than stale snapshots. OnCall AI coordinates the handoffs and compiles the final assessment.
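Step 4's deployment correlation can be pictured as a query like the one below against the GitHub REST API. The repository name, token variable, and two-hour window are placeholders for illustration; the Code Analyzer performs the equivalent through the GitHub connector rather than raw HTTP.

```python
# Hedged sketch of step 4: list commits and deployments made shortly before
# the degradation began, via the GitHub REST API. Repo, token, and the
# two-hour window are illustrative placeholders.
import os
from datetime import datetime, timedelta, timezone
import requests

OWNER, REPO = "example-org", "example-service"   # hypothetical repository
token = os.environ.get("GITHUB_TOKEN", "")
headers = {"Authorization": f"Bearer {token}",
           "Accept": "application/vnd.github+json"}

since = (datetime.now(timezone.utc) - timedelta(hours=2)).strftime("%Y-%m-%dT%H:%M:%SZ")

# Commits pushed in the two hours before the alert fired.
commits = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/commits",
    headers=headers,
    params={"since": since},
    timeout=10,
).json()
for commit in commits:
    print(commit["sha"][:7], commit["commit"]["message"].splitlines()[0])

# Recent deployment records for the same repository.
deployments = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/deployments",
    headers=headers,
    params={"per_page": 5},
    timeout=10,
).json()
for dep in deployments:
    print(dep["id"], dep["environment"], dep["created_at"])
```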

Learn More