How Scrubby Works

Three layers of understanding — domains, segments, and history — built automatically and refined as your code evolves.

Scrubby builds three layers of understanding about your codebase, each feeding into the next. This page walks through what each layer captures and how they combine into the codebase intelligence used by every PR review and every MCP tool call.

Layer 1: Domain discovery

When you connect a repository, Scrubby scans your code and identifies the architectural domains that make up your system. These aren’t generic labels — they’re discovered from your actual code structure, file relationships, naming patterns, and a sample of file contents.

For example, a Rails app might have domains like “Authentication & Authorization”, “Billing & Subscriptions”, “Background Jobs”, and “Neural SME Engine”. Each domain captures a cohesive area of responsibility with its own patterns and conventions.

Domains are connected by weighted edges that represent how tightly coupled different areas are. When you change auth code, Scrubby knows it might affect the API layer and the dashboard. The weights are not hand-tuned — they emerge from cross-domain imports and co-change history.
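As a rough illustration of how edge weights could emerge from those two signals, the sketch below counts cross-domain imports and commits that touch more than one domain. The function name, the input shapes, and the two weighting factors are assumptions made for this example, not Scrubby's actual formula.

```python
from collections import Counter
from itertools import combinations


def domain_edge_weights(file_domain, imports, commits,
                        import_weight=1.0, cochange_weight=0.5):
    """Return {(domain_a, domain_b): weight} for every pair of coupled domains.

    The two weight factors are illustrative assumptions, not Scrubby's values.
    """
    weights = Counter()

    # Each import that crosses a domain boundary strengthens that edge.
    for importer, imported in imports:
        a, b = file_domain.get(importer), file_domain.get(imported)
        if a and b and a != b:
            weights[tuple(sorted((a, b)))] += import_weight

    # Each commit that touches two domains strengthens the edge between them.
    for changed_files in commits:
        domains = sorted({file_domain[f] for f in changed_files if f in file_domain})
        for pair in combinations(domains, 2):
            weights[pair] += cochange_weight

    return dict(weights)


# Hypothetical file-to-domain mapping, import list, and commit history.
file_domain = {
    "app/models/user.rb": "Authentication & Authorization",
    "app/controllers/api/sessions_controller.rb": "API",
    "app/models/invoice.rb": "Billing & Subscriptions",
}
imports = [("app/controllers/api/sessions_controller.rb", "app/models/user.rb")]
commits = [
    ["app/models/user.rb", "app/controllers/api/sessions_controller.rb"],
    ["app/models/user.rb", "app/models/invoice.rb"],
]

edges = domain_edge_weights(file_domain, imports, commits)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

In this toy input the auth and API domains end up more tightly coupled than auth and billing, because they share both an import and a commit.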

See Domains for the full mechanics, including how global domains (Ruby, React, Testing, Security) layer on top of repo-specific ones.

Layer 2: Segment analysis

Within each domain, Scrubby identifies segments — clusters of files that form cohesive modules based on import relationships, naming patterns, and directory structure. A segment is finer-grained than a domain: a “Billing” domain might contain segments for “Stripe webhook handling”, “subscription model”, and “invoice generation”.
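One simple way to picture segment discovery is as connected components of the import graph inside a single domain. The sketch below uses only import relationships and ignores naming patterns and directory structure, so it is a simplification. The function name and file paths are hypothetical.

```python
from collections import defaultdict


def find_segments(files, imports):
    """Group files into segments: connected components of the import graph."""
    file_set = set(files)
    neighbors = defaultdict(set)
    for a, b in imports:
        # Only imports between two files of this domain link them together.
        if a in file_set and b in file_set:
            neighbors[a].add(b)
            neighbors[b].add(a)

    segments, seen = [], set()
    for start in files:
        if start in seen:
            continue
        # Depth-first walk collects every file reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        segments.append(sorted(component))
    return segments


# Hypothetical files in a "Billing" domain.
billing_files = [
    "billing/webhooks/stripe_controller.rb",
    "billing/webhooks/event_parser.rb",
    "billing/invoice.rb",
    "billing/invoice_pdf.rb",
]
billing_imports = [
    ("billing/webhooks/stripe_controller.rb", "billing/webhooks/event_parser.rb"),
    ("billing/invoice_pdf.rb", "billing/invoice.rb"),
]

segments = find_segments(billing_files, billing_imports)
for segment in segments:
    print(segment)
```

Here the webhook files form one segment and the invoice files another, mirroring the "Stripe webhook handling" and "invoice generation" split described above.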

Each segment gets its own conventions extracted by local pattern analyzers that detect:

  • Naming patterns — snake_case vs camelCase, class naming, file naming.
  • Import organization — absolute vs relative, barrel files, autoloading.
  • Code structure — file/method length, public/private separation.
  • Error handling — specific vs generic rescue, custom error classes.
  • Testing patterns — framework, factory usage, mocking style.
  • Documentation — comment density, docstring style.
  • State management and API design — how state flows and endpoints are shaped.

Conventions are stored with confidence scores so high-confidence patterns are prioritized over edge cases. See Conventions for the full list and how scores are computed.
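To make the idea of a confidence score concrete, here is a minimal sketch of a naming-pattern analyzer. It assumes confidence is simply the share of classifiable identifiers that follow the dominant style. The Convention class, the regular expressions, and that scoring rule are assumptions for illustration. The Conventions page describes how scores are actually computed.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Single-word names such as "id" fit both styles, so neither pattern counts them.
SNAKE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")
CAMEL = re.compile(r"^[a-z]+([A-Z][a-z0-9]*)+$")


@dataclass
class Convention:
    category: str
    pattern: str
    confidence: float


def naming_convention(identifiers):
    """Return the dominant naming style and the share of names that follow it."""
    counts = Counter()
    for name in identifiers:
        if CAMEL.match(name):
            counts["camelCase"] += 1
        elif SNAKE.match(name):
            counts["snake_case"] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    pattern, hits = counts.most_common(1)[0]
    return Convention("naming", pattern, hits / total)


# Hypothetical identifiers collected from one segment.
identifiers = ["find_user", "create_invoice", "send_email", "parseEvent", "id"]
convention = naming_convention(identifiers)

# A second, made-up convention shows how confidence can drive prioritization.
conventions = [
    Convention("error handling", "custom error classes", 0.4),
    convention,
]
prioritized = sorted(conventions, key=lambda c: c.confidence, reverse=True)
print([(c.category, c.pattern, c.confidence) for c in prioritized])
```

Sorting by confidence puts the well-established naming pattern ahead of the weaker error-handling one, which is the prioritization the paragraph above describes.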

Layer 3: History & co-change

Scrubby analyzes your git history to learn which files actually change together. If user.rb and user_spec.rb have changed together in more than half of the commits that touch them, Scrubby flags any change that modifies one without the other.

This is the foundation for the “files you may have forgotten” feature in PR reviews. It catches the kind of bugs that slip through file-level review — the migration without a model update, the API change without a client update, the controller change without a route update.
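The sketch below shows one way such a check could work on a list of commits, each represented as the set of files it changed. The ratio used here (commits touching both files divided by commits touching either) and the 0.5 threshold are assumptions chosen to match the "more than half" example. The real threshold is covered in Co-Change Analysis.

```python
from collections import Counter
from itertools import combinations


def cochange_pairs(commits, threshold=0.5):
    """Return {(file_a, file_b): ratio} for pairs that change together often."""
    touched = Counter()   # commits touching each file
    together = Counter()  # commits touching both files of a pair
    for files in commits:
        unique = sorted(set(files))
        touched.update(unique)
        together.update(combinations(unique, 2))

    pairs = {}
    for (a, b), both in together.items():
        either = touched[a] + touched[b] - both  # commits touching a or b
        ratio = both / either
        if ratio > threshold:
            pairs[(a, b)] = ratio
    return pairs


def forgotten_files(changed, pairs):
    """List partner files that usually change with `changed` but are missing."""
    changed = set(changed)
    missing = set()
    for a, b in pairs:
        if a in changed and b not in changed:
            missing.add(b)
        elif b in changed and a not in changed:
            missing.add(a)
    return sorted(missing)


# Hypothetical history: each inner list is the set of files in one commit.
history = [
    ["user.rb", "user_spec.rb"],
    ["user.rb", "user_spec.rb"],
    ["user.rb", "user_spec.rb", "routes.rb"],
    ["user.rb"],
    ["routes.rb"],
]

pairs = cochange_pairs(history)
print(pairs)
print(forgotten_files(["user.rb"], pairs))
```

In practice the commit lists could be derived from something like git log --name-only. Here user.rb and user_spec.rb share three of the four commits that touch either file, so a change to user.rb alone reports user_spec.rb as possibly forgotten.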

See Co-Change Analysis for the threshold and how the model improves with more history.

Putting it together

When you ask Scrubby to review a file, it combines all three layers:

  1. Which domain does this file belong to? What are that domain’s patterns?
  2. Which segment is it in? What conventions apply?
  3. What files usually change with it? What’s the git history context?
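The three questions above can be pictured as three lookups that are merged into a single context object for the reviewer. Everything in this sketch (the ReviewContext class, the dictionary shapes, and the sample data) is hypothetical and only illustrates the flow, not Scrubby's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReviewContext:
    path: str
    domain: Optional[str]                      # layer 1
    segment: Optional[str]                     # layer 2
    conventions: List[str] = field(default_factory=list)
    usually_changes_with: List[str] = field(default_factory=list)  # layer 3


def build_review_context(
    path: str,
    domain_of: Dict[str, str],
    segment_of: Dict[str, str],
    conventions_of: Dict[str, List[str]],
    cochange: Dict[str, List[str]],
) -> ReviewContext:
    """Answer the three questions for one file and bundle the results."""
    segment = segment_of.get(path)
    return ReviewContext(
        path=path,
        domain=domain_of.get(path),
        segment=segment,
        conventions=list(conventions_of.get(segment, [])),
        usually_changes_with=sorted(cochange.get(path, [])),
    )


# Hypothetical index data for a single file.
context = build_review_context(
    "app/models/user.rb",
    domain_of={"app/models/user.rb": "Authentication & Authorization"},
    segment_of={"app/models/user.rb": "user model"},
    conventions_of={"user model": ["snake_case method names", "custom error classes"]},
    cochange={"app/models/user.rb": ["spec/models/user_spec.rb"]},
)
print(context)
```

A file that is missing from the index simply yields an empty context, so a reviewer built on this shape would degrade gracefully for new files.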

This is why Scrubby’s reviews differ from a linter’s output: they’re informed by your team’s actual practices, not generic rules. The same three layers feed every MCP tool call: scrubby_review returns layer 1 and layer 2 context for a file, scrubby_get_network exposes layer 1’s connection graph, and scrubby_review_changeset checks layer 3’s co-change pairs.

How the system improves over time

Scrubby’s reviews get sharper the more you use it. Convention extraction also re-runs on incremental indexes when significant changes land, so patterns stay current as your codebase evolves.

For the deeper mechanics, see Findings & the Learning Loop.
