Autonomous Remediation
Overview
Most observability platforms treat detection and remediation as separate concerns. Monitors fire, dashboards light up, and then a human begins the slow work of correlating signals, reading code, and assembling a fix. The gap between “alert fires” and “fix deployed” remains a manual process, and it scales poorly as infrastructure complexity grows.
This separation is not inevitable. If an agent can query telemetry, it can also read the code that produced the telemetry. If it can identify a suspicious change, it can clone the repository, analyze the diff, and draft a fix. The missing ingredient is an execution environment where the agent can do this work without stuffing entire codebases into its context window.
Edge Delta bridges this gap with a team of specialized AI Teammates. The SRE and Security Teammates work alongside OnCall AI to continuously monitor and investigate — correlating telemetry, identifying root cause, and surfacing issues. The Software Engineer Teammate then takes over: it clones the relevant repository into a sandbox, an isolated virtual machine where it can read the full codebase, write and test code, and open a pull request. Human judgment stays in the loop for approvals and merges, but the mechanical work of evidence gathering, root cause identification, and fix proposal happens autonomously.
The remediation cycle
The value of autonomous remediation comes from connecting stages that traditionally involve different people, tools, and context switches. The first three stages are handled by monitoring and investigative teammates. Stages four and five are where the Software Engineer Teammate picks up the work. Each stage builds on context from the previous one, and the full chain can run without human prompting.
- Detect. A pattern anomaly monitor, event connector, or manual report surfaces an issue.
- Investigate. OnCall AI delegates to the right specialized teammates, who query logs, metrics, and traces to build a timeline.
- Identify root cause. Teammates correlate the anomaly with recent changes (pull requests, deployments, configuration updates) to narrow the cause.
- Analyze code. The Software Engineer Teammate clones the relevant repository into the sandbox, reads the codebase, and pinpoints the offending change.
- Propose a fix. The Software Engineer Teammate writes a code change in the sandbox, runs available tests locally, and opens a pull request through the GitHub connector.
- Verify. After the fix is deployed, teammates monitor production telemetry to confirm the issue is resolved.
```mermaid
flowchart LR
    A["Detect"] --> B["Investigate"]
    B --> C["Identify<br/>root cause"]
    C --> D["Analyze code<br/>(sandbox)"]
    D --> E["Propose fix<br/>(PR)"]
    E --> F["Verify"]
    F -.->|"new issue"| A
```
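The cycle above can be sketched as a simple state machine. This is an illustrative model only: the stage names and transition table are ours, not an Edge Delta API.

```python
from enum import Enum, auto

class Stage(Enum):
    DETECT = auto()
    INVESTIGATE = auto()
    IDENTIFY_ROOT_CAUSE = auto()
    ANALYZE_CODE = auto()      # Software Engineer Teammate, in the sandbox
    PROPOSE_FIX = auto()       # PR opened via the GitHub connector
    VERIFY = auto()

# Ordered transitions; VERIFY loops back to DETECT when a new issue surfaces.
NEXT = {
    Stage.DETECT: Stage.INVESTIGATE,
    Stage.INVESTIGATE: Stage.IDENTIFY_ROOT_CAUSE,
    Stage.IDENTIFY_ROOT_CAUSE: Stage.ANALYZE_CODE,
    Stage.ANALYZE_CODE: Stage.PROPOSE_FIX,
    Stage.PROPOSE_FIX: Stage.VERIFY,
    Stage.VERIFY: Stage.DETECT,
}

def run_cycle(start: Stage = Stage.DETECT) -> list[Stage]:
    """Walk one full pass of the cycle, returning the visited stages."""
    stages = [start]
    while NEXT[stages[-1]] is not start:
        stages.append(NEXT[stages[-1]])
    return stages
```

The loop from `VERIFY` back to `DETECT` mirrors the dotted "new issue" edge in the diagram: verification is not a terminal state but the trigger for the next detection pass.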
Why the sandbox matters
The sandbox is an isolated virtual machine provisioned on demand for AI teammates. It provides file system access, bash commands, and the ability to write, compile, and run code. Three properties make it the foundation of autonomous remediation.
- Deep code understanding. The Software Engineer Teammate clones entire repositories and navigates the full codebase rather than relying on code snippets passed through the model context window. This means it can trace call paths, understand dependencies, and evaluate changes in their full context.
- Token efficiency. Large codebases and API responses are downloaded to the sandbox file system instead of being loaded into the Software Engineer Teammate’s context window. This substantially reduces token consumption for investigations that involve large repositories or verbose API responses.
- Self-healing execution. Teammates can retry unreliable API calls, write Python or Bash scripts to analyze large datasets locally, and handle errors programmatically. If an API response is too large to fit in context, the teammate downloads it to disk and processes it with a script. If an endpoint is intermittent, the teammate writes retry logic. This makes investigations more reliable and less dependent on perfect external conditions.
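The token-efficiency and self-healing patterns can be sketched in a few lines. This is a hypothetical illustration of the technique, not Edge Delta's implementation; the function names, the example URL, and the JSON-lines payload shape are all assumptions.

```python
import json
import time
import urllib.request

def fetch_with_retry(url: str, retries: int = 3, backoff: float = 1.0) -> bytes:
    """Retry an intermittent endpoint with exponential backoff."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

def spill_to_disk(payload: bytes, path: str) -> str:
    """Write a large response to the sandbox file system instead of
    holding it in the model context window."""
    with open(path, "wb") as f:
        f.write(payload)
    return path

def count_error_records(path: str) -> int:
    """Process the spilled file locally with a script; only the small
    summary (a count) needs to re-enter the context window."""
    with open(path) as f:
        return sum(1 for line in f if json.loads(line).get("level") == "error")
```

The key design choice is that the full payload never enters the context window: the sandbox holds the data, and the teammate's scripts reduce it to a small answer.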
How memories accelerate investigations
During investigations, teammates draw on two types of stored knowledge.
- Organization-wide memories come from previous analysis findings. When a teammate encounters a pattern it has investigated before, it retrieves those findings to accelerate root cause identification. These memories appear alongside the analysis in the investigation thread.
- Personal memories reflect individual user preferences, such as preferred repositories, communication style, or prior decisions.
Both memory types are configurable. You can toggle organization-wide and personal memories independently and set retention policies in Settings.
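One way to picture how scope toggles and retention interact is a small filter over stored memories. The data model below is entirely hypothetical, invented for illustration; only the two scopes, the independent toggles, and the retention policy come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Memory:
    scope: str          # "organization" or "personal"
    text: str
    created: datetime

@dataclass
class MemorySettings:
    org_enabled: bool = True
    personal_enabled: bool = True
    retention: timedelta = timedelta(days=90)   # assumed default

def retrieve(memories: list[Memory], settings: MemorySettings,
             now: datetime) -> list[Memory]:
    """Return only memories whose scope is enabled and whose age
    falls inside the retention window."""
    enabled = {
        "organization": settings.org_enabled,
        "personal": settings.personal_enabled,
    }
    return [m for m in memories
            if enabled[m.scope] and now - m.created <= settings.retention]
```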
Comparison with traditional approaches
| Capability | Traditional workflow | AI Team with sandbox |
|---|---|---|
| Investigation start | Manual triage after alert | Automatic, triggered by monitor or event |
| Code analysis | Engineer reads diffs in a browser | Teammate clones repo, reads full codebase |
| Fix proposal | Engineer writes code locally | Teammate writes and tests fix in sandbox |
| PR creation | Manual | Automated, pending human approval |
| Context overhead | Full codebase in context window | Codebase on local file system |
| Error recovery during analysis | Manual retry | Automated retry with local fallback |
When autonomous remediation applies
The full cycle is most valuable for issues with a clear code-level cause.
- Regression bugs traceable to a recent pull request
- Configuration errors in infrastructure-as-code
- CI/CD pipeline failures with identifiable root causes
- Security vulnerabilities that require code-level fixes
Human approval remains in the loop for PR merges and production deployments. Teammates propose changes but do not merge or deploy without explicit consent.
Related resources
- Sandbox for practical details on the execution environment
- AI Team Fundamentals for the broader AI Team architecture
- Model Context Protocol in Edge Delta for how MCP connects context and automations
- GitHub Connector for PR operations available to teammates
- Software Engineer Teammate for the specialized teammate that handles code analysis and fix proposals
- Security Best Practices for permission models and approval workflows