Security-Related Service Degradation
When service degradation has potential security implications, teammates coordinate across the observability and security domains to determine whether the issue stems from an attack, a misconfiguration, or an application bug.
Environment Setup
| Component | Purpose |
|---|---|
| Edge Delta MCP Connector | Query logs, metrics, and security data from Edge Delta backend |
| AWS Connector | Validate IAM configurations and review CloudTrail access patterns |
| GitHub Connector | Examine recent deployments for correlated changes |
| Elastic MCP Connector | Query security logs in Elasticsearch (optional) |
| Atlassian Connector | Access security runbooks in Confluence (optional) |
| PagerDuty Connector | Receive incident alerts via webhook (optional) |
| Sentry Connector | Receive application error events via webhook (optional) |
| Monitor Notifications | Route Edge Delta monitor alerts to AI Team channel (optional) |
| AI Team Channel | Receive notifications and route to OnCall AI |
This workflow can trigger from multiple alert sources. Configure one or more of the following: a PagerDuty connector with webhooks enabled to receive incidents, a Sentry connector with webhooks for application errors, or an Edge Delta monitor with notifications routed to an AI Team channel. The Edge Delta MCP, AWS, and GitHub connectors provision an AI Team ingestion pipeline, enabling teammates to query telemetry and infrastructure during investigations. Add the Elastic MCP connector if security logs are stored in Elasticsearch, and the Atlassian connector if security runbooks are in Confluence.
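To make the alert fan-in concrete, here is a minimal sketch of a webhook receiver that normalizes PagerDuty and Sentry payloads into a single channel notification. The endpoint paths and the `notify_ai_team_channel` helper are hypothetical, and the payload fields follow the public PagerDuty v3 and Sentry webhook shapes; verify both against your connector configuration.

```python
# Minimal sketch: normalize inbound alert webhooks into one channel notification.
# Endpoint paths and notify_ai_team_channel are hypothetical; payload field
# names follow the public PagerDuty v3 and Sentry webhook shapes.
from flask import Flask, request

app = Flask(__name__)

def notify_ai_team_channel(source: str, title: str, detail: str) -> None:
    """Hypothetical helper standing in for AI Team channel delivery."""
    print(f"[{source}] {title}: {detail}")

@app.post("/webhooks/pagerduty")
def pagerduty_webhook():
    event = request.get_json()["event"]  # PagerDuty v3 webhook envelope
    notify_ai_team_channel(
        source="pagerduty",
        title=event["data"].get("title", "incident"),
        detail=event["event_type"],  # e.g. "incident.triggered"
    )
    return "", 202

@app.post("/webhooks/sentry")
def sentry_webhook():
    payload = request.get_json()
    issue = payload.get("data", {}).get("issue", {})
    notify_ai_team_channel(
        source="sentry",
        title=issue.get("title", "error event"),
        detail=payload.get("action", "created"),
    )
    return "", 202
```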
Data Flow
```mermaid
flowchart LR
    A[PagerDuty Incident] --> D[AI Team Channel]
    B[Sentry Error Event] --> D
    C[Edge Delta Monitor] --> D
    D --> E[OnCall AI]
    E --> F[SRE Teammate]
    E --> G[Security Engineer]
    E --> H[Code Analyzer]
    F -->|Queries| I[Edge Delta MCP]
    F -->|Runbooks| M[Atlassian]
    G -->|Queries| J[AWS Connector]
    G -->|Queries| L[Elastic MCP]
    H -->|Queries| K[GitHub]
```
Alerts can originate from any event connector with webhooks enabled (such as PagerDuty incidents or Sentry error events) or from Edge Delta monitors. Each source sends notifications to an AI Team channel via webhooks or monitor notifications. OnCall AI evaluates the context and engages the appropriate specialists: Security Engineer focuses on access patterns and IAM validation, SRE analyzes performance telemetry, and Code Analyzer reviews recent deployments for correlated changes.
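The routing step can be pictured as a small triage function. This is purely illustrative: the keyword lists and teammate names below are assumptions, not part of any OnCall AI API.

```python
# Hypothetical routing logic mirroring the flow above: inspect alert context
# and decide which specialist teammates to engage. Keywords and teammate
# names are illustrative only.
SECURITY_SIGNALS = {"unauthorized", "403", "iam", "token", "auth"}
DEPLOY_SIGNALS = {"deploy", "release", "rollout", "version"}

def route_alert(alert_text: str) -> list[str]:
    text = alert_text.lower()
    teammates = ["sre"]  # SRE always checks telemetry for anomalies
    if any(s in text for s in SECURITY_SIGNALS):
        teammates.append("security-engineer")
    if any(s in text for s in DEPLOY_SIGNALS):
        teammates.append("code-analyzer")
    return teammates

print(route_alert("Spike in 403 responses after latest rollout"))
# -> ['sre', 'security-engineer', 'code-analyzer']
```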
Investigation Workflow
- OnCall AI receives the notification and initiates an investigation thread
- SRE analyzes recent logs and metrics for anomalies such as error patterns, latency spikes, and failed requests, and retrieves relevant security runbooks from Confluence if available (see the MCP query sketch after this list)
- Security Engineer checks for suspicious activity in AWS CloudTrail, validates IAM configurations, and queries security logs via Elastic MCP if available (see the boto3 sketch after this list)
- Code Analyzer examines recent deployments for changes that correlate with the degradation (see the GitHub deployment sketch after this list)
- OnCall AI synthesizes findings into a summary with root-cause hypotheses and recommended next steps
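For the SRE step, a telemetry query through an MCP server might look like the following sketch, built on the public `mcp` Python SDK. The server URL and the `search_logs` tool name and arguments are assumptions; list the server's tools first to confirm what the Edge Delta MCP connector actually exposes.

```python
# Sketch of an SRE-style log query through an MCP server using the public
# `mcp` Python SDK. The URL and tool name/arguments are assumptions.
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

EDGE_DELTA_MCP_URL = "https://mcp.example.com/edgedelta"  # hypothetical endpoint

async def query_recent_errors() -> None:
    async with streamablehttp_client(EDGE_DELTA_MCP_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # discover the real tool names
            print([t.name for t in tools.tools])
            result = await session.call_tool(
                "search_logs",  # hypothetical tool name
                {"query": "status:5xx OR level:error", "lookback": "15m"},
            )
            print(result.content)

asyncio.run(query_recent_errors())
```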
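For the Security Engineer step, the CloudTrail and IAM checks map onto standard boto3 calls. A sketch, with an illustrative role name and time window:

```python
# Sketch of the Security Engineer's checks with boto3: recent CloudTrail
# login activity and the policies attached to a suspect role. The role
# name and time window are illustrative.
from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail")
iam = boto3.client("iam")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Console logins in the last hour; swap EventName to target other API calls.
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}],
    StartTime=start,
    EndTime=end,
    MaxResults=50,
)
for e in events["Events"]:
    print(e["EventTime"], e.get("Username", "unknown"), e["EventName"])

# Review what a suspect role is actually granted (role name is hypothetical).
policies = iam.list_attached_role_policies(RoleName="payments-service-role")
for p in policies["AttachedPolicies"]:
    print(p["PolicyName"], p["PolicyArn"])
```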
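For the Code Analyzer step, recent deployments can be pulled from the GitHub REST API and compared against the degradation window. The owner, repo, and incident timestamp below are placeholders:

```python
# Sketch of the Code Analyzer step: list recent deployments via the GitHub
# REST API and flag any that landed near the degradation window.
from datetime import datetime, timedelta, timezone
import os
import requests

OWNER, REPO = "example-org", "payments-service"  # hypothetical repo
INCIDENT_START = datetime(2024, 1, 1, tzinfo=timezone.utc)  # placeholder

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/deployments",
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    params={"per_page": 20},
    timeout=10,
)
resp.raise_for_status()

for dep in resp.json():
    created = datetime.fromisoformat(dep["created_at"].replace("Z", "+00:00"))
    if abs(created - INCIDENT_START) < timedelta(hours=2):
        print(f"Correlated deploy: {dep['sha'][:7]} to {dep['environment']} at {created}")
```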
Each teammate queries real-time telemetry data through their assigned connectors, ensuring analysis reflects current system state rather than stale snapshots. OnCall AI coordinates the handoffs and compiles the final assessment.