What Scrubby Can't Do

Earlier this week, The Guardian reported that a Claude-powered agent deleted a company’s production database, and the backups, in a single autonomous run. This is not the first incident of its kind. A widely shared Substack post from Alexey walked through a similar story last year, in which an agent issued a destructive SQL command against a live database that it should never have had credentials for.

We sell a tool called Scrubby, which, as you know, is designed to make AI agents better at writing code. So in conversations about these news stories with some colleagues, a question kept coming up: would Scrubby have prevented this?

The honest answer is no, not directly. I’d rather say that out loud than imply otherwise, because the gap between what Scrubby is and what those incidents needed is important. If we let people believe codebase intelligence is a kill switch for agent disasters, the next disaster lands harder, and people end up trusting Scrubby, and codebase intelligence in general, less. No thank you, ma’am.

So this post is about where the line actually sits between what Scrubby can do and the kind of headline-grabbing failures we’ve seen recently.

The TL;DR of what happened in those incidents

In both stories, an AI agent had direct, unsupervised access to a production database, executed a destructive command (a DROP, a destructive migration, or equivalent), and shit hit the fan. There was no human-in-the-loop confirmation, no environment isolation, and no separation between the primary data and the backups.

The reality of these stories, and others like them, is that they were operational and infrastructure failures, not cases of buggy code or of agents violating a team’s established conventions. The agent executed an action that nothing in the surrounding system was set up to prevent in the first place.

Why Scrubby wouldn’t have caught this

Scrubby operates one layer up from the runtime. It reviews code that’s about to be written, edited, or merged. It models domains, conventions, co-change patterns, and architectural boundaries. The questions it answers look like this:

Does this new controller follow the same authorization pattern as the other 49 controllers in this domain?
This model has been changed 23 times in 18 months, and 21 of those changes also touched its serializer. Why isn’t the serializer in this changeset?
This new service was added at the top level. Similar services in this codebase consistently live under app/services/<domain>/.

Those are useful questions, but they are not questions that would prevent DROP DATABASE prod. Scrubby does not sit between the agent and the shell. Scrubby does not gate access to production. Scrubby does not require human approval for destructive operations. Those are different controls, and they live in a different part of your stack.

What actually prevents these incidents

If you’re worried about an agent deleting your database (and after this week, you should be), the controls that actually matter are:

Don’t let agents hold production credentials. Give them a staging clone with realistic but recoverable data. The number of agent tasks that truly need to touch production directly is much smaller than the number that get production access by default.
Require human approval for destructive operations. This is straight out of the Anthropic guidance for Claude Code itself (and a pretty obvious safety practice y’all should be observing already): actions that are hard to reverse or have a large blast radius should pause for confirmation. rm -rf, DROP, force-push, dropping branches, and destructive migrations all fall here.
Keep backups in a separate trust boundary from the primary. If the same set of credentials can wipe both, they aren’t backups, they’re a redundant copy of the same blast radius.
Use sandboxed permission modes by default. Most agent harnesses now ship with permission tiers. The default for autonomous runs should not allow shell or SQL access to production-tier systems.
Log every tool call. When something does go wrong, the post-mortem depends on knowing what the agent ran, in what order, and why. Tool-call logs are the audit trail.
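
To make the approval and logging controls above a bit more concrete, here is a minimal Python sketch. It is not a Scrubby feature and not the API of any particular agent harness: `DESTRUCTIVE_PATTERNS`, `run_tool_call`, and the `execute` and `approve` callbacks are hypothetical names, and the pattern list is an assumption you would extend for your own stack. A regex denylist is also easy to evade, so treat it as a backstop behind credential and permission controls, not a replacement for them.

```python
import json
import re
import time
from typing import Callable, List, Optional

# Hypothetical patterns for "hard to reverse" commands. Illustrative only.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-(?=\w*r)(?=\w*f)\w+",            # rm -rf, rm -fr, rm -Rf
    r"\bdrop\s+(database|table|schema)\b",     # SQL DROP
    r"\btruncate\s+table\b",                   # SQL TRUNCATE
    r"\bgit\s+push\b.*\s(--force\b|-f\b)",     # force-push
    r"\bgit\s+branch\s+-d\b",                  # branch deletion (-d or -D)
]


def is_destructive(command: str) -> bool:
    """Return True if the command matches any known destructive pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)


def run_tool_call(
    command: str,
    execute: Callable[[str], str],
    approve: Callable[[str], bool],
    log: List[str],
) -> Optional[str]:
    """Log every call; pause destructive ones until a human approves."""
    entry = {
        "ts": time.time(),
        "command": command,
        "destructive": is_destructive(command),
    }
    if entry["destructive"] and not approve(command):
        entry["status"] = "blocked"
        log.append(json.dumps(entry))
        return None
    entry["status"] = "executed"
    # Write the log line before executing so a crash still leaves a trail.
    log.append(json.dumps(entry))
    return execute(command)
```

In a real setup, `approve` would block on a human (a prompt, a chat message, a ticket) and `log` would be an append-only store the agent cannot write to directly.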

None of that is Scrubby’s domain, but it’s all still very necessary!

What Scrubby is genuinely good at

With the limits stated, here’s where a codebase intelligence tool like Scrubby actually pulls its weight and why it’s still a non-negotiable layer of the AI dev stack:

Stopping convention drift before merge. When an agent writes a controller that’s missing the authorization concern every other controller in that domain includes, Scrubby flags it. That’s the class of bug that compiles, passes tests, and fails silently in production six weeks later when someone hits the wrong endpoint.
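
A toy version of that check, for intuition only: this is a simplified sketch, not how Scrubby models conventions. The `missing_convention` function, the `include Authorizable` marker, the file paths, and the 90% threshold are all assumptions made up for the example.

```python
from typing import Dict, List


def missing_convention(
    sources: Dict[str, str], marker: str, threshold: float = 0.9
) -> List[str]:
    """Return files lacking `marker`, but only if most files include it.

    `sources` maps file path -> file contents. If fewer than `threshold`
    of the files contain the marker, it is not treated as a convention.
    """
    if not sources:
        return []
    have = {path for path, text in sources.items() if marker in text}
    if len(have) / len(sources) < threshold:
        return []  # not a strong enough pattern to flag deviations
    return sorted(path for path in sources if path not in have)


# Hypothetical domain: 49 controllers include the concern, one new one does not.
controllers = {
    f"app/controllers/billing/c{i}_controller.rb": "include Authorizable\n"
    for i in range(49)
}
controllers["app/controllers/billing/refunds_controller.rb"] = (
    "class RefundsController\nend\n"
)

print(missing_convention(controllers, "include Authorizable"))
# ['app/controllers/billing/refunds_controller.rb']
```

A plain substring match is obviously cruder than real structural analysis, but it shows the shape of the question: what do nearly all the neighbors do that this file doesn’t?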

Catching missing co-changes. Bugs of this type (a model changed without its serializer, a migration without a down-migration, a route without a test) are caught by knowing what historically changes together, and that is a question only a tool with structured git-history knowledge (like Scrubby!) can answer.
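
The arithmetic behind a co-change check is simple enough to sketch. The Python below is an illustration under assumptions, not Scrubby’s implementation: `co_change_rate`, `missing_co_change`, the file paths, and the 0.8 threshold are hypothetical, and the history is assumed to be already parsed into one set of file paths per commit (for example, from `git log --name-only`).

```python
from typing import List, Set


def co_change_rate(commits: List[Set[str]], target: str, companion: str) -> float:
    """Fraction of commits touching `target` that also touched `companion`."""
    touching = [files for files in commits if target in files]
    if not touching:
        return 0.0
    together = sum(1 for files in touching if companion in files)
    return together / len(touching)


def missing_co_change(
    changeset: Set[str],
    commits: List[Set[str]],
    target: str,
    companion: str,
    threshold: float = 0.8,
) -> bool:
    """True if `target` is in the changeset without its usual companion."""
    return (
        target in changeset
        and companion not in changeset
        and co_change_rate(commits, target, companion) >= threshold
    )


# Hypothetical history: 23 commits touched the model, 21 also touched the serializer.
model = "app/models/invoice.rb"
serializer = "app/serializers/invoice_serializer.rb"
history = [{model, serializer}] * 21 + [{model}] * 2

print(round(co_change_rate(history, model, serializer), 3))  # 0.913
print(missing_co_change({model}, history, model, serializer))  # True
```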

Surfacing domain boundary crossings. When an agent reaches from one domain into another in a way the codebase doesn’t normally allow, Scrubby surfaces the crossing for review. Sometimes it’s intentional, but a lot of the time the agent didn’t realize those domains are separated for a reason.
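
For a rough picture of what a boundary check involves, here is a small Python sketch. It is an assumption-laden illustration, not Scrubby’s implementation: `domain_of`, `boundary_crossings`, the `allowed` map, and the example paths are hypothetical, and it assumes domains can be read straight off an app/services/<domain>/ path.

```python
from typing import Dict, List, Optional, Set, Tuple


def domain_of(path: str, root: str = "app/services/") -> Optional[str]:
    """Return the <domain> segment of app/services/<domain>/..., else None."""
    if not path.startswith(root):
        return None
    parts = path[len(root):].split("/")
    # A file sitting directly under the root belongs to no domain.
    return parts[0] if len(parts) > 1 else None


def boundary_crossings(
    references: List[Tuple[str, str]], allowed: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (source, destination) references that cross a disallowed boundary."""
    crossings = []
    for src, dst in references:
        a, b = domain_of(src), domain_of(dst)
        if a is None or b is None or a == b:
            continue
        if b not in allowed.get(a, set()):
            crossings.append((src, dst))
    return crossings


# Hypothetical references: reporting may read from billing, nothing else is allowed.
refs = [
    ("app/services/billing/charge.rb", "app/services/billing/invoice.rb"),
    ("app/services/billing/charge.rb", "app/services/identity/user_lookup.rb"),
    ("app/services/reporting/export.rb", "app/services/billing/invoice.rb"),
]
allowed = {"reporting": {"billing"}}

print(boundary_crossings(refs, allowed))
# [('app/services/billing/charge.rb', 'app/services/identity/user_lookup.rb')]
```

The output of a check like this is a prompt for review, not an automatic block, since some crossings are intentional.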

Grounding the agent’s first draft. Wired up over MCP, Scrubby gives the agent the conventions and architecture context it would otherwise have to guess at, making an agent’s first draft fit the codebase instead of needing three rounds of structural cleanup in review.

These are real, measurable wins that show up in PR review time, in defect rate, and in the kind of incidents that don’t happen because the slightly-off code never made it past review. However, they are also, importantly, not a substitute for the operational guardrails I listed above.

How to think about Scrubby in your stack

The mental model we’d offer is the layered defense one.

Scrubby is the code-fit layer. It catches the class of failure where the code being shipped doesn’t match the codebase it’s shipping into.

Your agent permissioning and sandbox layer is what catches the class of failure where the agent tries to do something it shouldn’t be allowed to do at all. That’s a different layer, and it lives closer to the agent harness and your infrastructure.

Your backup, isolation, and recovery layer is what catches the class of failure where the previous two layers fail and you need to recover. That’s a different layer again, and it lives in your infrastructure team’s domain, managed by your SREs.

Each layer catches things the others can’t, and none of them is optional. If you’re running coding agents at scale and you only have one of the three, that’s a leak. We’d rather you fix the leak honestly than pay for Scrubby thinking it covers a gap it doesn’t.

What to do next?

If you’re using agents, audit the layered-defense picture for your own setup. Which of the three layers do you actually have in place, and where is there a gap that could lead to an outage, silent failures, bad UX, or (god forbid) your entire prod database getting deleted by an agent a little too high on its own supply?

If the gap is at the code-fit layer, Scrubby on every repo and an MCP server in every editor is the right intervention. If the gap is at the permissioning or backup layer, the right intervention is somewhere else, and we’d rather point you there than pretend we cover it. These agent-error stories are going to keep coming. We’d rather build something that genuinely helps with its part of the problem, and be loud about the parts it doesn’t cover, than be one more vendor implying that buying our thing means you don’t have to think about the rest.

Ready to give your AI agents full codebase context?

Join the Scrubby beta