ResolveAI
A cross-surface AI SRE workflow for production incidents.
- Year
- 2025
- Role
- Product Designer
- Tags
- AI SRE, Agentic, DevTools, B2B
An AI SRE that works the incident alongside on-call engineers.
ResolveAI is an enterprise AI SRE agent. It reads across code, infrastructure, logs, metrics, and live systems to catch incidents, propose causes, and help engineers get from alert to fix.

On-call starts in Slack, because that's where the page lands. The first decision was to keep the interruption small: the alert, the agent's first read, and one next action, not a full report dumped into the thread.
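
As an illustration only, the three-part message could be sketched as a Slack Block Kit payload built in Python. The function name, the `action_id`, and the example values are hypothetical, not the shipped blocks; only the shape (alert, first read, one next action) comes from the design above.

```python
def build_first_message(alert_title, first_read, action_label, investigation_url):
    """Return Block Kit blocks for the initial, deliberately small Slack message."""
    return [
        # 1. The alert, as a plain-text header
        {"type": "header", "text": {"type": "plain_text", "text": alert_title}},
        # 2. The agent's first read, kept to a short summary
        {"type": "section", "text": {"type": "mrkdwn", "text": first_read}},
        # 3. Exactly one next action, linking out to the full investigation
        {
            "type": "actions",
            "elements": [
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": action_label},
                    "url": investigation_url,
                    "action_id": "open_investigation",  # hypothetical identifier
                }
            ],
        },
    ]


# Hypothetical example values
blocks = build_first_message(
    alert_title="High error rate on checkout-api",
    first_read="Errors began shortly after the latest deploy. Most likely cause: config change.",
    action_label="Open investigation",
    investigation_url="https://example.com/investigations/123",
)
```

Keeping the payload to three blocks is the point: anything longer belongs in the web app, not the thread.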

When the work shifts from reacting to digging in, it moves to the web app. The report becomes a workspace: status, theories, evidence, root cause, and open questions each get their own place, so engineers can audit the reasoning instead of scrolling one long block of text.

The hard part is trust: out of several theories, which one should an engineer act on? I built the triage view around ranked hypotheses, each carrying its evidence count, impact, and confidence, with a direct path into the full investigation.
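
A minimal sketch of the data behind that view, in Python. The three fields (evidence count, impact, confidence) come from the design; the class name, the sample hypotheses, and the ordering rule (confidence first, evidence count as tie-breaker) are assumptions made for illustration, not the product's actual ranking logic.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    summary: str
    evidence_count: int  # number of supporting pieces of evidence
    impact: str          # e.g. "high", "medium", "low"
    confidence: float    # 0.0 to 1.0


def rank(hypotheses):
    """Order hypotheses for triage.

    Assumed rule: highest confidence first, ties broken by evidence count.
    """
    return sorted(
        hypotheses,
        key=lambda h: (h.confidence, h.evidence_count),
        reverse=True,
    )


# Hypothetical sample data
candidates = [
    Hypothesis("Connection pool exhausted", 4, "high", 0.55),
    Hypothesis("Bad config in latest deploy", 7, "high", 0.80),
    Hypothesis("Upstream DNS flapping", 2, "low", 0.20),
]

ranked = rank(candidates)
for position, h in enumerate(ranked, start=1):
    print(f"{position}. {h.summary} ({h.evidence_count} evidence, {h.impact} impact, {h.confidence:.0%})")
```

Each row carries the same three signals an engineer sees in the triage view, so the ordering can be questioned rather than taken on faith.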

The work doesn't end when the fire's out. After a deploy, the agent keeps watching production, flags regressions, traces likely code paths, and drafts the remediation. The rule is firm: it can suggest and prepare, but anything that touches production waits for a human to approve it.
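
The approval rule can be sketched as a simple gate. This is a hypothetical model in Python, with invented names and fields; it only illustrates the rule stated above, that drafts are free to prepare but production changes need a named human approver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Remediation:
    description: str
    touches_production: bool
    approved_by: Optional[str] = None  # set only when a human approves


def can_execute(remediation):
    """The agent may always draft; it may act on production only after approval."""
    if not remediation.touches_production:
        return True
    return remediation.approved_by is not None


# Hypothetical examples
draft = Remediation("Roll back the latest deploy", touches_production=True)
approved = Remediation(
    "Roll back the latest deploy",
    touches_production=True,
    approved_by="on-call engineer",
)

print(can_execute(draft))     # False: still waiting for a human
print(can_execute(approved))  # True: a human signed off
```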

The last surface is shared. Engineers challenge the agent, ask follow-ups, compare evidence, and decide the next move together. I defined the Slack-to-web flow, designed the core UI states, prototyped the Slack blocks, and shipped the first version to enterprise customers.
