PagerDuty Incident Response Automation

Reduce alert fatigue by automating PagerDuty incident triage and response with AI teammates and confidence-based action routing.


You can reduce alert fatigue by using AI teammates to triage PagerDuty incidents, investigate likely causes, and coordinate response actions while keeping humans in control for higher-risk changes.

Data flow

flowchart LR
    A[PagerDuty Incident Triggered] -->|Webhook| B[AI Team Channel]
    B --> C[OnCall AI]
    C --> D[SRE Teammate]
    C --> E[Code Analyzer]
    D -->|Queries| F[Edge Delta MCP]
    E -->|Queries| G[GitHub]
    D -->|Incident Updates| H[PagerDuty Connector]
    C -->|Approval Requests| I[Human On-Call]
    H --> J[PagerDuty Incident Timeline]

PagerDuty sends incident events through a webhook to the configured AI Team channel. OnCall AI creates an investigation thread and delegates analysis tasks to SRE and Code Analyzer. SRE gathers telemetry evidence from Edge Delta through MCP tools, while Code Analyzer checks GitHub for recent code or deployment changes. OnCall AI then coordinates responder actions, and the findings are written back to the PagerDuty incident timeline through the PagerDuty connector.
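
To make the first hop concrete, the sketch below shows one way the fields needed for an investigation thread could be pulled out of a webhook body. It assumes the general shape of a PagerDuty Generic Webhooks (v3) payload (an `event` object with `event_type` and a `data` object), which you should verify against the PagerDuty documentation. The `IncidentEvent` class, the helper names, and the sample values are hypothetical and are not part of any Edge Delta or PagerDuty API.

```python
import json
from dataclasses import dataclass


@dataclass
class IncidentEvent:
    """Hypothetical container for the fields a triage thread needs."""
    event_type: str
    incident_id: str
    title: str
    service: str
    urgency: str


def parse_incident_event(body: str) -> IncidentEvent:
    """Extract triage fields from a webhook body (assumed v3-style shape)."""
    event = json.loads(body)["event"]
    data = event.get("data", {})
    return IncidentEvent(
        event_type=event.get("event_type", "unknown"),
        incident_id=data.get("id", ""),
        title=data.get("title", ""),
        # Service is assumed to be a reference object with a "summary" field.
        service=(data.get("service") or {}).get("summary", "unknown"),
        urgency=data.get("urgency", "unknown"),
    )


def thread_title(evt: IncidentEvent) -> str:
    """Build the opening line of an investigation thread."""
    return f"[{evt.urgency}] {evt.service}: {evt.title} ({evt.incident_id})"


# Made-up sample payload for illustration only.
SAMPLE_BODY = json.dumps({
    "event": {
        "event_type": "incident.triggered",
        "data": {
            "id": "PABC123",
            "title": "Elevated 5xx rate",
            "urgency": "high",
            "service": {"summary": "checkout-api"},
        },
    }
})

if __name__ == "__main__":
    print(thread_title(parse_incident_event(SAMPLE_BODY)))
```

In practice the connector handles this parsing for you. The sketch is only meant to show which incident fields typically drive the rest of the flow.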

Environment setup

Component	Purpose
PagerDuty Connector	Receive incident events via webhook and manage incident status, urgency, assignees, and notes
PagerDuty Integration Guide	Configure Generic Webhooks (v3), authorization headers, and event subscriptions
Edge Delta MCP Connector	Query logs, metrics, traces, and service context for incident investigation
GitHub Connector	Correlate incidents with recent deployments, pull requests, and configuration changes (optional)
AI Team Channel	Receive PagerDuty webhook events and route to OnCall AI for orchestration

Configure the PagerDuty connector and enable webhook delivery so incident lifecycle events are posted into an AI Team channel such as #alerts. Add the Edge Delta MCP connector for telemetry investigation, and optionally add GitHub for change-correlation checks. For production guardrails, keep read operations set to Allow and configure write operations with Ask Permission where human approval is required.
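
The guardrail described above can be pictured as a lookup from operation to permission setting. The following is a minimal sketch of that idea, not the product's configuration format: the operation names and the `POLICY` table are hypothetical, and the real settings are made in the connector configuration. The one design choice worth noting is that unknown operations fall back to the stricter setting.

```python
from enum import Enum


class Permission(Enum):
    ALLOW = "Allow"
    ASK = "Ask Permission"


# Hypothetical operation names, chosen only to illustrate the read/write split.
POLICY = {
    "edgedelta.query_logs": Permission.ALLOW,        # read
    "github.list_pull_requests": Permission.ALLOW,   # read
    "pagerduty.get_incident": Permission.ALLOW,      # read
    "pagerduty.add_note": Permission.ALLOW,          # low-risk write
    "pagerduty.resolve_incident": Permission.ASK,    # write needing approval
}


def permission_for(operation: str) -> Permission:
    """Return the configured permission, defaulting to the safer setting."""
    return POLICY.get(operation, Permission.ASK)


def needs_human_approval(operation: str) -> bool:
    """True when the operation should pause for a human decision."""
    return permission_for(operation) is Permission.ASK
```

Defaulting to Ask Permission means a newly added write operation is gated until someone explicitly decides it is safe to allow.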

Investigation workflow

The following is an example of how the teammates might handle an incoming PagerDuty incident. The exact behavior depends on your connector configuration, teammate instructions, and incident context.

1. OnCall AI receives the incident event and opens an investigation thread in the target channel
2. SRE queries logs, metrics, and traces to determine service health, blast radius, and likely root cause
3. Code Analyzer checks recent deployments and pull requests to identify potential change-related regressions
4. OnCall AI synthesizes the findings, classifies urgency, and proposes next actions
5. OnCall AI routes actions based on confidence level: it applies autonomous updates for low-risk tasks, or requests human approval for higher-impact remediation
6. SRE and OnCall AI update the PagerDuty incident with timeline notes, assigned responders, and remediation status
7. OnCall AI continues monitoring until service recovery is confirmed, then recommends closure and post-incident follow-up
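
The delegation and synthesis steps above can be sketched as two analyses that run side by side and are then merged into a single timeline note. This is an illustration under stated assumptions, not how the teammates are implemented: `sre_analysis` and `code_analysis` are hypothetical placeholders that return canned text where a real teammate would query Edge Delta or GitHub, and the note format is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Finding:
    source: str
    summary: str


def sre_analysis(service: str) -> Finding:
    # Placeholder: a real teammate would query logs, metrics, and traces here.
    return Finding("SRE", f"error rate elevated on {service}")


def code_analysis(service: str) -> Finding:
    # Placeholder: a real teammate would inspect recent deployments and PRs.
    return Finding("Code Analyzer", f"1 deployment to {service} in the last hour")


def investigate(service: str) -> list[Finding]:
    """Run both analyses concurrently and return findings in a fixed order."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(sre_analysis, service),
            pool.submit(code_analysis, service),
        ]
        # Collect in submission order so the note layout is stable.
        return [f.result() for f in futures]


def timeline_note(findings: list[Finding]) -> str:
    """Format findings as a plain-text note for the incident timeline."""
    return "\n".join(f"- {f.source}: {f.summary}" for f in findings)


if __name__ == "__main__":
    print(timeline_note(investigate("checkout-api")))
```

Collecting results in submission order keeps the note deterministic even though the two analyses may finish at different times.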

Automation confidence levels

The teammates assess confidence levels to determine which actions to take autonomously and which to escalate for human approval. The examples below illustrate typical behavior, but teammates may adapt based on their instructions and the available evidence.

High confidence (autonomous)

When evidence is strong and the change is low-risk and reversible, the teammates act autonomously:

Update incident priority and urgency based on telemetry-backed impact
Assign responders using service ownership and on-call schedule context
Add structured incident notes with findings and runbook links
Suppress or de-prioritize clearly noisy, non-actionable alerts

Medium confidence (approval-gated)

When a proposed action can affect service behavior or rollout state, the teammates request human approval before proceeding:

Escalate to the on-call engineer for manual restart actions
Trigger a rollback through your CI/CD workflow
Run cloud remediation playbooks

Low confidence (investigate and recommend)

When evidence is incomplete or conflicting, the teammates gather context and surface recommendations without taking write actions:

Collect additional signals and identify gaps in the investigation
Ask follow-up questions to refine incident scope and impact
Recommend next steps for human responders to evaluate
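
The three tiers described in this section amount to a small decision rule. The sketch below is one possible reading of it, assuming a numeric confidence score between 0 and 1. The thresholds, the `Mode` names, and the `route_action` function are hypothetical, and the actual teammates reason from their instructions and the available evidence rather than from fixed numbers.

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"
    APPROVAL_GATED = "approval-gated"
    RECOMMEND_ONLY = "investigate and recommend"


# Hypothetical thresholds; tune or replace them for your own environment.
LOW_THRESHOLD = 0.5
HIGH_THRESHOLD = 0.8


def route_action(confidence: float, reversible: bool, affects_service: bool) -> Mode:
    """Pick how an action is handled from evidence strength and risk."""
    if confidence < LOW_THRESHOLD:
        # Incomplete or conflicting evidence: no write actions.
        return Mode.RECOMMEND_ONLY
    if affects_service or not reversible:
        # Restarts, rollbacks, and playbooks always wait for a human.
        return Mode.APPROVAL_GATED
    if confidence >= HIGH_THRESHOLD:
        # Strong evidence and a low-risk, reversible change.
        return Mode.AUTONOMOUS
    return Mode.APPROVAL_GATED
```

For example, `route_action(0.9, reversible=True, affects_service=False)` returns `Mode.AUTONOMOUS`, which matches adding a timeline note, while the same confidence with `affects_service=True` returns `Mode.APPROVAL_GATED`, which matches a rollback. Checking risk before the high-confidence branch ensures that strong evidence alone never bypasses approval for a change that affects service behavior.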
