The HITL Playbook: How Governance Turned an AI Coding Agent into a 51-Minute Incident Response Team
A WordPress site started throwing 500 errors at 09:04. Monitoring caught it at the same moment a report came in of multiple pages failing. Two independent signals, same minute. The first action was opening the site in a browser — trust the alerts, but confirm them. The site was genuinely down.
This post is about what happened next. Not the headline — the mechanics. Because “AI-assisted incident response” is a phrase that gets used loosely, and the difference between using it loosely and using it operationally is the difference between “the AI wrote some code suggestions” and “the incident was contained in 51 minutes and signed off in three and a half hours.” The first is a convenience feature. The second requires a governance architecture that exists before the incident, not one improvised during it.
The specific setup throughout this post: a Claude Code session with a CLAUDE.md governance file scoped to the task, able to spawn subagents for parallel work streams, with every consequential action requiring my explicit approval before executing. That’s what “the agent” refers to — not an autonomous robot, but a task-scoped instance under governance constraints and human-in-the-loop control.
What follows is the Tuesday morning, in the order it happened.
The parallel workflow before anyone looks at the problem
There’s an instinct under incident pressure to go serial. Alert arrives, first action, result, second action, result. One thing at a time. The problem is that “one thing at a time” compresses poorly into 51 minutes when the thing that’s wrong has been in place for eight days and needs to be understood simultaneously from four different angles.
The first two minutes were structured to avoid that compression. I opened an agent task (Claude Code, task-scoped governance document already in place) and while it loaded its governance context, I SSH’d to the Lightsail instance and started inspecting the WordPress install directly. The agent’s spool-up isn’t free — it’s reading its CLAUDE.md, loading the task-specific governance, and establishing the context it’s going to work from. Those 90 seconds are dead time if you wait for them. They’re not dead time if you use them to start your own investigation.
By the time the agent was ready to receive instructions, I had something to give it.
The tell
wpconsole.php.
That’s what I saw first — a file in the WordPress directory with a name designed to look like it belonged. It didn’t. There is no file called wpconsole.php in the WordPress core. There were several others with similar naming conventions — wp- prefixes and innocuous-sounding stems positioned alongside real core files, relying on a tired operator scrolling past them without stopping.
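For what it’s worth, that eyeballing has a mechanical backstop: diff the docroot’s top-level files against a clean copy of the same core version. The paths below are illustrative, not the actual server.

```bash
# Pull a clean copy of the installed core version and compare top-level PHP files.
# The docroot is hypothetical; substitute your own install.
cd /var/www/example-site
ver=$(wp core version)
curl -s "https://wordpress.org/wordpress-${ver}.tar.gz" -o /tmp/wp-clean.tar.gz
tar -xzf /tmp/wp-clean.tar.gz -C /tmp
diff <(basename -a /tmp/wordpress/*.php | sort) <(basename -a ./*.php | sort)
# Anything only on the right-hand side, other than wp-config.php, deserves a close look.
```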
An attacker who wants persistent access on a WordPress installation has two options: put payloads where admins don’t look, or put them where admins look but name them so they don’t register as anomalous. The second is harder to spot. It depends on the operator having seen enough real core files to know what doesn’t belong — which is the specific class of pattern-match expertise you can’t get from a scanner.
That recognition was the pivot. This wasn’t a bug. It was the tail end of a compromise that had only become visible because the attacker had made a coding error that crashed a payload, which surfaced the HTTP 500 errors that triggered the alerts.
The handoff
I briefed the agent with what I had: the specific files that looked wrong, the hypothesis that this was a compromise, and the posture I wanted us to operate from — contain first, understand second. Every server-side action required my approval before executing. HITL is a hard rule for production infrastructure.
The agent’s first move wasn’t to verify my diagnosis by repeating what I’d done. Per the task governance, it spawned a subagent to pull server logs in parallel while the main thread began an independent crawl of the site for additional indicators. I didn’t need to tell it to — the governance prescribed that workflow, and the agent reached for it reflexively.
Why the agent did that without being asked
The task-scoped CLAUDE.md that this agent loaded includes specific instructions for security scans: use subagents for parallelizable investigation streams, keep the main thread open for triage and operator conversation. When a scan starts, the agent reaches for a subagent the same way an experienced human incident responder reaches for tail -f on the webserver logs.
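I’m not going to paste the real file here, but the relevant section reads roughly like this (an illustrative reconstruction, not a quote):

```markdown
## Security scans and incident response

- When a scan or compromise investigation starts, spawn subagents for the
  parallelizable streams: log pulls, file-integrity sweeps, database checks.
- Keep the main thread free for triage and conversation with the operator.
- Any action that changes server state requires explicit operator approval
  before it runs. No exceptions for "obvious" fixes.
- Report findings as they land; don't batch them for the end.
```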
The governance is a design choice I made before the incident, not a prompt I improvised during it. Under pressure, nobody invents good structure — everyone falls back on whatever structure already exists, and in most organizations running AI coding agents, that’s no structure at all.
What the verification found
Within roughly eight minutes of the handoff, the agent and subagent came back with a fuller picture.
The compromise had been confirmed. So had my initial diagnosis. But the logs told a story my initial terminal work hadn’t — the attacker had established access roughly a week earlier. They had been dormant for eight days. No files had been modified in that window. The attack only became visible on the Tuesday morning because the attacker began actively deploying payloads and made a coding error that crashed one of them.
The silent-compromise phase is the part that matters for thinking about defense. For eight days, a site under active remote control looked identical, to every automated scanner, to a site that was fine. The “attacker has my admin credential” phase doesn’t leave a malware signature. It leaves login events that look like regular login events.
More detail came with the log pull: the brute-force attack that had cracked the admin credential had used a pattern clever enough to avoid triggering the fail2ban hard-jail threshold. Specifically, it paced itself to stay under the rate-limit bans and retried over three days. Automated, patient, cheap. That specific pathway was closed during remediation, and additional controls were layered in to prevent the same vector from working again.
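Closing that kind of gap is mostly a thresholds exercise: widen the window a jail counts over, and enable the stock recidive jail so repeat offenders earn long bans across days. The filter name and the numbers below are illustrative, not the production config, and the time suffixes assume a reasonably recent fail2ban.

```ini
# /etc/fail2ban/jail.local (sketch)

# Assumes a custom filter that matches failed wp-login.php POSTs.
# A wide findtime means slow-paced attempts still accumulate toward a ban.
[wordpress-auth]
enabled  = true
filter   = wordpress-auth
logpath  = /var/log/nginx/access.log
maxretry = 10
findtime = 6h
bantime  = 24h

# Stock recidive jail: IPs that keep getting banned over the course of a week
# earn a much longer ban.
[recidive]
enabled  = true
logpath  = /var/log/fail2ban.log
maxretry = 3
findtime = 1w
bantime  = 4w
```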
The agent and subagent also surfaced persistence vectors I’d need to remediate: mu-plugin modifications, cron-job entries designed to re-deploy payloads from database-stored blobs, session-key artifacts that would survive a password rotation. Five persistence vectors in total.
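For concreteness, the enumeration that surfaces those vectors looks roughly like the sketch below. The WP-CLI commands are standard; the docroot, the username, and the default wp_ table prefix are assumptions, not the actual environment.

```bash
cd /var/www/example-site   # hypothetical docroot

# mu-plugins load on every request with no activation step, so inspect them by hand.
ls -la wp-content/mu-plugins/

# Cron entries: look for hooks you never scheduled.
wp cron event list

# Database-stored blobs that a cron hook could re-deploy payloads from.
wp db query "SELECT option_name, LENGTH(option_value) AS len FROM wp_options
             WHERE option_value LIKE '%base64_decode%'
                OR option_value LIKE '%eval(%';"

# Session artifacts survive a password rotation unless you destroy them
# and rotate the auth salts as well.
wp user session destroy admin --all   # 'admin' is a placeholder username
wp config shuffle-salts
```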
The 51 minutes
Remediation ran in two passes. First pass: contain. Eliminate every persistence vector, in priority order. Restore clean state. Close the specific exploitation path. Second pass: forensic. Understand how this happened in enough detail that the prevention measures address the actual vector, not a guess at it.
The sequencing is textbook incident response. Stop the bleeding before trying to understand why the patient is bleeding. What made the AI-assisted version possible wasn’t speed — it was the two passes running without either one shortcutting the other. Under solo-operator pressure, there’s constant temptation to merge them: fix a thing, look at a thing, fix another thing. The result is usually a remediation that’s partial because you got curious partway through, and a forensic picture that’s incomplete because you remediated away some of the evidence.
The agent and its subagents handled parallel work streams during the first pass — file integrity checks, database cleanup, mu-plugin removal, session invalidation — each one requiring my approval before executing. I was not hands-on-keyboard for most of the 51 minutes. I was approving, redirecting, asking clarifying questions, and making judgment calls on sequencing. Roughly seventy percent of the window was conversation with the primary agent, not console work.
If AI-assisted incident response meant “AI does what I would have done, faster,” the proposition would be weaker than it actually is. What it really means is that the operator becomes the judgment layer exclusively, and the agent and subagents become the execution layer. The human does the work a human is good at — recognizing patterns, making priority calls, catching inference chains that look right but aren’t. The agent does the work an agent is good at — parallel execution, systematic enumeration, patient checksum comparison.
The HITL anchor: the redirect side
During the forensic pass, the agent flagged a mu-plugin that had been updated simultaneously across every site in the fleet. Its name looked suspicious. The agent’s hypothesis was that this was the attack vector, now applied fleet-wide. It wanted to delete the file and sanitize every site.
That didn’t sit right with me.
The timing of the update was off — out of alignment with everything else we had seen from the attacker. And something about the filename looked important, not malicious. I couldn’t have told you exactly what it was off the top of my head. I could tell you the inference chain had moved too fast.
Instead of deleting, I asked the agent to dump the file’s contents.
It was part of a fleet management utility. Completely legitimate. If the agent had proceeded, we’d have spent the rest of the day reconfiguring every site in the fleet management console by hand — honestly, likely a multi-day recovery, worse than the original incident.
The agent wasn’t stupid. It had a plausible theory with plausible evidence. My intervention wasn’t based on information it lacked — it was based on pattern-match intuition built up from years of incident work. That’s what HITL is actually for: not rubber-stamping approvals, but catching the confident-but-wrong inference chain before it becomes irreversible. Inspecting a suspicious file is almost free. Deleting one and being wrong — when it turns out to be a fleet-wide utility — isn’t.
The HITL anchor: the catch side
The mu-plugin moment is the story most people tell about HITL because it’s the dramatic one — but it’s only half of what HITL does.
One of the five persistence vectors the agent surfaced was embedded in an actual WordPress core file. Not a suspicious-looking impostor with a wp-like name — a legitimate core file, modified in a way a tired operator paging through search results would have scrolled past. The agent caught it through a systematic checksum comparison against the known-good core files for that version. I would almost certainly not have found that one on a manual pass. Checksums aren’t something a solo operator does under pressure; they’re something a solo operator intends to do and skips because the obvious vectors are already taking up all available attention.
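I won’t claim this is literally what the agent ran, but the off-the-shelf version of that comparison is a single WP-CLI command, and it covers both modified core files and files that shouldn’t exist:

```bash
# Compare every core file against the official manifest for the installed version;
# reports hash mismatches and unexpected files in the core directories.
wp core verify-checksums

# Same idea for plugins hosted on wordpress.org.
wp plugin verify-checksums --all
```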
So HITL works both directions. The agent catches things the operator would miss because it runs systematic comparisons the operator wouldn’t have time for. The operator catches things the agent would miss because pattern-match intuition is not yet a reliable part of the agent’s toolkit. Case studies that tell the story in only one direction are underselling what this architecture actually is.
The AAR that found a second site
Containment finished at 09:55. The first AAR was posted twenty-one minutes later, at 10:16.
Writing an AAR under the discipline of “this has to be consultable six months from now” is a retrieval mechanism. The AAR doesn’t just document what happened — it forces the author to look at the artifacts from a different angle than live remediation allows. You can’t write down what you found without noticing what you haven’t yet explained. And what I hadn’t yet explained, in the first AAR draft, was a specific inconsistency in the forensic evidence that only surfaced when I had to write it into a sentence.
That inconsistency was enough to raise the possibility that the attacker’s access method might extend beyond the single site. Rather than stop at the incident boundary, I instructed the agent to audit every site in the fleet for the same pattern of indicators. It found one. A second compromised site that would never have surfaced from a single-incident investigation alone.
The second site was staged, not deployed. The attacker had access and had laid the groundwork, but had not yet modified files. Architecture helped here: the site’s integrations were one-way push-only, meaning the staged access had no sensitive data to extract. Remediation ran on the same two-pass workflow. It moved faster because the pattern was already understood.
An exhausted solo operator does not write a careful AAR and then proactively audit a fleet. That kind of work is exactly what gets dropped when the operator is running on empty. The reason it happened here is that the 51-minute containment hadn’t consumed the operator’s capacity. The agent scaled my judgment across more work than I could have covered alone.
What the scanners couldn’t catch
Both servers were running maldet. Neither attack was caught by it.
The first compromised site might have been caught by the evening’s scheduled scan, once the active deployment phase began. But the silent-compromise phase — the one that actually established persistence over eight days — had no signature for a scanner to match. On the second site, the attacker had established access but hadn’t yet deployed payloads. Access without deployment doesn’t leave a malware signature for a scanner to match. It looks like normal authorized use.
Commercial scanners catch malware signatures. They don’t catch credential-based access or silent staging. A site owner running standard defensive tooling would have seen nothing wrong with either compromise until the attacker chose to make noise. This is a structural limitation, not a configuration gap — “buy a better scanner” doesn’t solve it, because the class of attack that actually happened here is the class that scanners, by design, can’t see.
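What does record this class of attack is the access log, if someone thinks to look. A rough sketch of the query that surfaces a slow credential attack; the log path and the field positions are assumptions about a typical combined-format nginx or Apache setup.

```bash
# Count POSTs to wp-login.php per source IP across the retained log.
# In combined log format, $1 is the client IP and $7 the request path.
awk '$6 ~ /POST/ && $7 ~ /wp-login\.php/ {print $1}' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head -20
# A handful of IPs with hundreds of attempts spread over days is exactly the
# pattern that stays under per-minute rate limits and never trips a scanner.
```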
The counterfactual
A solo operator handling the same incident without an AI-assisted workflow runs into four problems, each of which independently breaks the timeline.
Containment alone expands to a full working day or more. Hunting five persistence vectors manually across file integrity, mu-plugins, cron jobs, database options, and session state, in priority order, is not a 51-minute job without parallelism you don’t have. Even with dedicated commercial WordPress cleanup tools, one persistence vector typically survives a manual cleanup. The attacker returns. Remediation starts over, usually under worse conditions because trust in the cleanup is now gone.
The forensic pass gets truncated. Under pressure, with limited hands and limited time, the “understand how this happened” phase loses ground to the “get back online” phase. The vector that allowed the attack doesn’t get fully characterized, which means the prevention measures address a guess at the vector, not the actual one. Three months later the same class of attack returns via a slightly different door.
The fleet audit doesn’t happen. By the time the first site is cleaned and the AAR is written, a solo operator is exhausted. The proactive audit — the work that caught the second compromise — is exactly the kind of low-urgency, high-judgment task that gets dropped when the operator is running on empty. Without it, the staged compromise on the second site stays silent until the attacker chooses to act.
Verification stays unreliable. Checksums and systematic file-integrity comparisons — the work that tells you whether you’re actually done — are the first things to get skipped when a depleted operator needs to declare the incident closed. “Remediation complete” becomes “I think we’re done.” Sites that look clean to an exhausted operator stay compromised. The next incident, if it comes, looks like a new one.
The shape of the work
AI-assisted incident response changes what the operator’s time is for.
In the pre-agent version of this work, a senior operator spent most of their incident time on execution: running commands, reading logs, writing grep pipelines, watching output. The judgment calls — priority ordering, pattern recognition, redirecting confidently-wrong inference chains, deciding what to audit next — happened in whatever cognitive bandwidth was left over. Often there wasn’t much. And the governance work, the post-incident documentation, the proactive blast-radius hunts — those happened only if the operator had capacity after the execution work was done. Which, typically, they didn’t.
In the agent-assisted version, execution is delegated to a team of agents operating under task-scoped governance and HITL approval. The operator’s time is concentrated on judgment, because that’s the part that can’t be delegated. And because judgment isn’t exhausted by execution work, there’s capacity left over for the discipline that separates containment from actual resolution.
The fleet-wide audit was the kind of work I would have known I needed to do, and would have been unlikely to ever get to. Work like that requires a team, or it requires not being exhausted — and in the solo version of that Tuesday morning, I wouldn’t have had either.
That’s the thesis. AI augmentation restores an operator’s capacity to do the work that would otherwise get dropped when the load is already saturated — the audits, the verification passes, the proactive blast-radius hunts that a depleted solo operator would never get to.
I use agentic workflows to extend my own surface area. That’s what lets me push past “I think I found the problem” into the proactive audits that actually catch what would have been missed — at a speed a human team can’t match on its own.
The case study on this incident is the shorter version of this story, anchored on the headline numbers. This post is the longer version, anchored on the mechanics. If you want the architectural context for how the agent-governance pattern works across more than one incident, the sysadmin-claude case study describes the broader design.
Related reading: How technical keystones burn out explores why single-operator fleets are structurally fragile to begin with — a fragility AI augmentation can help with, but not erase.