Autonomous Remediation
Overview
Most observability platforms treat detection and remediation as separate concerns. Monitors fire, dashboards light up, and then a human begins the slow work of correlating signals, reading code, and assembling a fix. The gap between “alert fires” and “fix deployed” remains a manual process, and it scales poorly as infrastructure complexity grows.
This separation is not inevitable. If an agent can query telemetry, it can also read the code that produced the telemetry. If it can identify a suspicious change, it can clone the repository, analyze the diff, and draft a fix. The missing ingredient is an execution environment where the agent can do this work without stuffing entire codebases into its context window.
Edge Delta bridges this gap with a team of specialized AI Teammates. The SRE and Security Teammates work alongside OnCall AI to continuously monitor and investigate — correlating telemetry, identifying root cause, and surfacing issues. The Software Engineer Teammate then takes over: it clones the relevant repository into a sandbox, an isolated virtual machine where it can read the full codebase, write and test code, and open a pull request. Human judgment stays in the loop for approvals and merges, but the mechanical work of evidence gathering, root cause identification, and fix proposal happens autonomously.
The remediation cycle
The value of autonomous remediation comes from connecting stages that traditionally involve different people, tools, and context switches. The first three stages are handled by monitoring and investigative teammates. Stages four and five are where the Software Engineer Teammate picks up the work. Each stage builds on context from the previous one, and the full chain can run without human prompting.
- Detect. A pattern anomaly monitor, event connector, or manual report surfaces an issue.
- Investigate. OnCall AI delegates to the right specialized teammates, who query logs, metrics, and traces to build a timeline.
- Identify root cause. Teammates correlate the anomaly with recent changes (pull requests, deployments, configuration updates) to narrow the cause.
- Analyze code. The Software Engineer Teammate clones the relevant repository into the sandbox, reads the codebase, and pinpoints the offending change.
- Propose a fix. The Software Engineer Teammate writes a code change in the sandbox, runs available tests locally, and opens a pull request through the GitHub connector.
- Verify. After the fix is deployed, teammates monitor production telemetry to confirm the issue is resolved.
```mermaid
flowchart LR
    A["Detect"] --> B["Investigate"]
    B --> C["Identify<br/>root cause"]
    C --> D["Analyze code<br/>(sandbox)"]
    D --> E["Propose fix<br/>(PR)"]
    E --> F["Verify"]
    F -.->|"new issue"| A
```
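The cycle above can be sketched as a simple state machine. This is an illustrative model only: the stage names and transition table are ours, not an Edge Delta API.

```python
from enum import Enum, auto

class Stage(Enum):
    DETECT = auto()
    INVESTIGATE = auto()
    IDENTIFY_ROOT_CAUSE = auto()
    ANALYZE_CODE = auto()      # Software Engineer Teammate, in the sandbox
    PROPOSE_FIX = auto()       # PR opened via the GitHub connector
    VERIFY = auto()

# Ordered transitions; VERIFY loops back to DETECT when a new issue surfaces.
NEXT = {
    Stage.DETECT: Stage.INVESTIGATE,
    Stage.INVESTIGATE: Stage.IDENTIFY_ROOT_CAUSE,
    Stage.IDENTIFY_ROOT_CAUSE: Stage.ANALYZE_CODE,
    Stage.ANALYZE_CODE: Stage.PROPOSE_FIX,
    Stage.PROPOSE_FIX: Stage.VERIFY,
    Stage.VERIFY: Stage.DETECT,
}

def run_cycle(start: Stage = Stage.DETECT) -> list[Stage]:
    """Walk one full pass of the cycle, returning the visited stages."""
    stages = [start]
    while NEXT[stages[-1]] is not start:
        stages.append(NEXT[stages[-1]])
    return stages
```

The loop from `VERIFY` back to `DETECT` mirrors the dotted "new issue" edge in the diagram: verification is not a terminal state but the trigger for the next detection pass.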
Why the sandbox matters
The sandbox is an isolated virtual machine provisioned on demand for AI teammates. It provides file system access, bash commands, and the ability to write, compile, and run code. Three properties make it the foundation of autonomous remediation.
- Deep code understanding. The Software Engineer Teammate clones entire repositories and navigates the full codebase rather than relying on code snippets passed through the model context window. This means it can trace call paths, understand dependencies, and evaluate changes in their full context.
- Token efficiency. Large codebases and API responses are downloaded to the sandbox file system instead of being loaded into the Software Engineer Teammate’s context window. This substantially reduces token consumption for investigations that involve large repositories or verbose API responses.
- Self-healing execution. Teammates can retry unreliable API calls, write Python or Bash scripts to analyze large datasets locally, and handle errors programmatically. If an API response is too large to fit in context, the teammate downloads it to disk and processes it with a script. If an endpoint is intermittent, the teammate writes retry logic. This makes investigations more reliable and less dependent on perfect external conditions.
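The token-efficiency and self-healing patterns can be sketched in a few lines. This is a hypothetical illustration of the technique, not Edge Delta's implementation; the function names, the example URL, and the JSON-lines payload shape are all assumptions.

```python
import json
import time
import urllib.request

def fetch_with_retry(url: str, retries: int = 3, backoff: float = 1.0) -> bytes:
    """Retry an intermittent endpoint with exponential backoff."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

def spill_to_disk(payload: bytes, path: str) -> str:
    """Write a large response to the sandbox file system instead of
    holding it in the model context window."""
    with open(path, "wb") as f:
        f.write(payload)
    return path

def count_error_records(path: str) -> int:
    """Process the spilled file locally with a script; only the small
    summary (a count) needs to re-enter the context window."""
    with open(path) as f:
        return sum(1 for line in f if json.loads(line).get("level") == "error")
```

The key design choice is that the full payload never enters the context window: the sandbox holds the data, and the teammate's scripts reduce it to a small answer.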
How memories accelerate investigations
During investigations, teammates draw on two types of stored knowledge.
- Organization-wide memories come from previous analysis findings. When a teammate encounters a pattern it has investigated before, it retrieves those findings to accelerate root cause identification. These memories appear alongside the analysis in the investigation thread.
- Personal memories reflect individual user preferences, such as preferred repositories, communication style, or prior decisions.
Both memory types are configurable. You can toggle organization-wide and personal memories independently and set retention policies in Settings.
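One way to picture how scope toggles and retention interact is a small filter over stored memories. The data model below is entirely hypothetical, invented for illustration; only the two scopes, the independent toggles, and the retention policy come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Memory:
    scope: str          # "organization" or "personal"
    text: str
    created: datetime

@dataclass
class MemorySettings:
    org_enabled: bool = True
    personal_enabled: bool = True
    retention: timedelta = timedelta(days=90)   # assumed default

def retrieve(memories: list[Memory], settings: MemorySettings,
             now: datetime) -> list[Memory]:
    """Return only memories whose scope is enabled and whose age
    falls inside the retention window."""
    enabled = {
        "organization": settings.org_enabled,
        "personal": settings.personal_enabled,
    }
    return [m for m in memories
            if enabled[m.scope] and now - m.created <= settings.retention]
```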
Comparison with traditional approaches
| Capability | Traditional workflow | AI Team with sandbox |
|---|---|---|
| Investigation start | Manual triage after alert | Automatic, triggered by monitor or event |
| Code analysis | Engineer reads diffs in a browser | Teammate clones repo, reads full codebase |
| Fix proposal | Engineer writes code locally | Teammate writes and tests fix in sandbox |
| PR creation | Manual | Automated, pending human approval |
| Context overhead | Full codebase in context window | Codebase on local file system |
| Error recovery during analysis | Manual retry | Automated retry with local fallback |
When autonomous remediation applies
The full cycle is most valuable for issues with a clear code-level cause.
- Regression bugs traceable to a recent pull request
- Configuration errors in infrastructure-as-code
- CI/CD pipeline failures with identifiable root causes
- Security vulnerabilities that require code-level fixes
Human approval remains in the loop for PR merges and production deployments. Teammates propose changes but do not merge or deploy without explicit consent.
Related resources
- Sandbox for practical details on the execution environment
- AI Team Fundamentals for the broader AI Team architecture
- Model Context Protocol in Edge Delta for how MCP connects context and automations
- GitHub Connector for PR operations available to teammates
- Software Engineer Teammate for the specialized teammate that handles code analysis and fix proposals
- Security Best Practices for permission models and approval workflows