How Scrubby Works

"Codebase intelligence" sounds abstract, and the value is largely invisible. Your AI agent stops generating code that violates conventions, your token bills drop, and your PR reviews stop catching the same structural issues over and over. But what's the path from "I just installed this thing" to those outcomes? Here's a walkthrough, in the order it actually happens.

Step 1: Connect a Repository

The first thing you do is point Scrubby at a repository. There are two ways, and they're not mutually exclusive.

The GitHub App. You install Scrubby's GitHub App on your organization, pick the repos you want covered, and that's it. Every pull request from then on gets reviewed against your codebase's actual patterns, and Scrubby's understanding of your code stays current as commits land.

The MCP server. You configure your AI editor (Claude Code, Cursor, Windsurf, VS Code, Zed, or anything else that speaks Model Context Protocol) to connect to Scrubby. Now your AI agent can query Scrubby directly, in real time, while you work.

Most teams use both. The GitHub App reviews PRs when they are opened, before anything merges. The MCP server prevents the bad PRs from ever being written in the first place.

Step 2: The First Index

When Scrubby first sees your repository, it does something no linter or static analysis tool does. It tries to understand your codebase as a structure of meaning rather than just a tree of files.

The repo gets scanned. Scrubby reads through your codebase and builds a picture of how the files relate to each other.
Domains get discovered. Scrubby identifies the architectural domains in your codebase ("Authentication", "Background Jobs", "Billing") and assigns each file to one. Real semantic understanding, not pattern matching on directory names.
Connections get built. Scrubby learns how your domains relate to each other, so changes in one area can be evaluated against the areas they typically affect.
Global knowledge gets applied. Scrubby maintains shared knowledge for ecosystems like React, Rails, Security, and Testing. The relevant pieces get applied to your repo automatically.
History gets considered. Recent change history is used to understand which files tend to move together. No author data, no PII, only what changed and when.
Domain activity gets tracked. Scrubby keeps a sense of where the codebase is currently active and how different areas evolve together.
A snapshot gets saved. An index snapshot records the state of the codebase at a specific point in time.
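
The "files that tend to move together" idea from the history step can be illustrated with a small co-change counter. This is a minimal sketch and not Scrubby's actual implementation: the `co_change_counts` helper, the commit data, and the file paths are all hypothetical. It uses only lists of changed paths, in keeping with the no-author-data constraint described above.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of files changed in the same commit.

    `commits` is a list of lists of changed file paths (hypothetical input;
    no author data is needed).
    """
    pairs = Counter()
    for changed in commits:
        # Sort and dedupe so (a, b) and (b, a) count as the same pair.
        for a, b in combinations(sorted(set(changed)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical change history: each inner list is one commit.
commits = [
    ["app/billing/invoice.rb", "app/jobs/invoice_job.rb"],
    ["app/billing/invoice.rb", "app/jobs/invoice_job.rb", "README.md"],
    ["app/auth/session.rb"],
]
counts = co_change_counts(commits)
```

Pairs with high counts are candidates for "these areas usually change together"; a real system would presumably also weight by recency and normalize for files that change in nearly every commit.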

The first index typically takes a few minutes for a repo of moderate size. After that, it's incremental, so Scrubby only re-processes what changed.

Step 3: Conventions Get Extracted

Once domains exist, Scrubby looks at the code in each one and identifies the patterns your team uses — including the way one domain is meant to be used by another.

These aren't rules someone wrote down. They're patterns Scrubby observed across the actual code your team has been writing for months or years. So when an AI agent asks "how does this team structure a new service?" Scrubby has a real answer grounded in real examples.

Step 4: An AI Agent Asks a Question

Your AI editor is now configured with the Scrubby MCP server. You open a file and ask it to add a feature. What happens?

Before generating code, the agent asks Scrubby for context — what the file does, what domain it belongs to, what conventions apply, and what other parts of the codebase tend to be involved when this area changes.

The agent reads this and incorporates it into the code it generates. The result is code that fits, because the agent now has the context it was previously missing.
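
To make the exchange concrete, here is one way the returned context could be modeled and turned into text the agent prepends to its prompt. Everything in this sketch is an assumption for illustration: the `FileContext` fields, the `to_prompt_preamble` helper, and the sample values are hypothetical and do not describe Scrubby's real MCP tools or response format.

```python
from dataclasses import dataclass, field


@dataclass
class FileContext:
    """Hypothetical shape of the context an agent might receive for one file."""
    path: str
    purpose: str
    domain: str
    conventions: list = field(default_factory=list)
    related_domains: list = field(default_factory=list)


def to_prompt_preamble(ctx):
    """Render the context as plain text an agent could prepend to its prompt."""
    lines = [
        f"File: {ctx.path}",
        f"Purpose: {ctx.purpose}",
        f"Domain: {ctx.domain}",
        "Conventions:",
    ]
    lines += [f"- {c}" for c in ctx.conventions]
    lines.append("Often changes together with: " + ", ".join(ctx.related_domains))
    return "\n".join(lines)


# Sample values are made up for illustration.
ctx = FileContext(
    path="app/billing/invoice.rb",
    purpose="Builds and totals customer invoices",
    domain="Billing",
    conventions=["New endpoints get a spec file alongside them"],
    related_domains=["Background Jobs"],
)
preamble = to_prompt_preamble(ctx)
```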

Step 5: A Pull Request Opens

You push your branch and open a PR. Scrubby picks it up automatically and starts a review.

A check shows up on the PR so you can see that a review is in progress.
The changed files get loaded along with the relevant context from your repository's index.
Each file is reviewed against the domain it belongs to and the other domains that tend to be affected when this area changes.
Findings are gathered, deduplicated, and ranked by severity.
A single PR comment is posted (or updated, if Scrubby has already commented on this PR).
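
The "gathered, deduplicated, and ranked by severity" step can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the `Finding` type, the three severity levels, and the choice to treat two findings as duplicates when path and message match are all hypothetical, not Scrubby's actual rules.

```python
from dataclasses import dataclass

# Assumed severity levels; lower number means more severe.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Finding:
    path: str
    message: str
    severity: str


def dedupe_and_rank(findings):
    """Drop repeated findings, then sort most severe first."""
    unique = {}
    for f in findings:
        key = (f.path, f.message)
        kept = unique.get(key)
        # Keep the most severe copy when the same issue is reported twice.
        if kept is None or SEVERITY_ORDER[f.severity] < SEVERITY_ORDER[kept.severity]:
            unique[key] = f
    return sorted(unique.values(), key=lambda f: (SEVERITY_ORDER[f.severity], f.path))


# Hypothetical findings from reviewing two files.
findings = [
    Finding("a.rb", "missing spec file", "medium"),
    Finding("a.rb", "missing spec file", "high"),
    Finding("b.rb", "unused parameter", "low"),
]
ranked = dedupe_and_rank(findings)
```

Collapsing to a single ranked list is what makes it possible to post one comment per PR rather than one per finding.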

The comment doesn't read like a generic AI reviewer. It reads like a senior engineer on your team, because it's grounded in your team's actual patterns: "This new endpoint is missing the corresponding spec file. Endpoints in this domain consistently get tests alongside them."

Step 6: The Network Learns

This is the part that's easy to miss. Every analysis Scrubby does is a learning event.

When a connection between two areas of your codebase produces a useful finding, that connection gets reinforced. When the connection is evaluated and turns up nothing, it gets weakened. The idea is borrowed from Hebbian learning: neurons that fire together wire together. Over time, the connections that actually matter for your codebase strengthen, and irrelevant noise fades.

The same idea applies to shared knowledge across repos: when a pattern proves useful broadly, it strengthens for everyone, and patterns that stop being relevant fade out over time. This is what we mean when we say Scrubby gets smarter over time. The phrase describes the literal behavior of the system.
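
The reinforce-and-weaken behavior described in this step can be sketched as a simple weight update. Note the assumptions: this uses an exponential moving average toward 1 (useful finding) or 0 (nothing found), which is a Hebbian-style simplification chosen for illustration. The `update_weight` function, the learning rate of 0.1, and the starting weight are hypothetical, not Scrubby's actual rule.

```python
def update_weight(weight, produced_finding, rate=0.1):
    """Move a connection weight toward 1 on a useful finding, toward 0 otherwise.

    The weight stays within [0, 1] as long as it starts there and
    0 <= rate <= 1.
    """
    target = 1.0 if produced_finding else 0.0
    return weight + rate * (target - weight)


# Hypothetical history for one connection: useful, useful, then nothing found.
w = 0.5
for useful in [True, True, False]:
    w = update_weight(w, useful)
```

With an update like this, a connection that keeps producing findings drifts toward 1, and one that keeps coming up empty decays toward 0 without ever needing to be deleted explicitly.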

Step 7: Subsequent Indexes Are Cheap

When new commits land, Scrubby doesn't re-index from scratch. It only revisits what's changed, and only re-evaluates the bigger picture when the codebase has shifted in a meaningful way.

This is what makes Scrubby practical to run on real codebases at real velocity. You don't pay a re-indexing tax every time someone merges a PR.
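
One common way to "only revisit what's changed" is to compare content hashes against the previous snapshot. The sketch below shows that general technique and is not a description of Scrubby's internals; the `content_hash` and `files_to_reindex` helpers and the sample files are hypothetical.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_reindex(previous_hashes, current_files):
    """Return paths whose content is new or changed since the last snapshot."""
    changed = []
    for path, text in current_files.items():
        if previous_hashes.get(path) != content_hash(text):
            changed.append(path)
    return sorted(changed)


# Hypothetical snapshot from the last index, then the repo after new commits.
snapshot = {"a.py": content_hash("print(1)"), "b.py": content_hash("print(2)")}
current = {"a.py": "print(1)", "b.py": "print(3)", "c.py": "print(4)"}
stale = files_to_reindex(snapshot, current)
```

Here only the modified `b.py` and the new `c.py` need another pass; the untouched `a.py` is skipped. Deleted files would need separate handling, which this sketch leaves out.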

What Comes Out the Other Side

After a few days of normal development, your team has a Scrubby that understands your repository roughly the way a senior engineer does. It knows the domains, the connections between them, the conventions in each, and the patterns of how your code actually changes. That knowledge is queryable through the MCP server, applied automatically on every PR, and keeps refining itself as your code evolves.

The next decade of software engineering depends on AI agents being grounded in the codebases they work on. Generic intelligence is no longer enough. The agents that produce real value are the ones that are demonstrably aware of the codebase they're touching.

Connect a repo, let it index, and watch your AI agents get noticeably better at the work they do for you.
