scrubby_similar_files
Find files semantically similar to a given file via embeddings — scoped to this repo AND any cross-repo inclusions configured for it.
Finds the files semantically nearest to a given file using pgvector cosine similarity over their embeddings, scoped to the source repo and any peers it has opted to include via Cross-Repo Inclusions. This is one of the few Scrubby tools that can cross repository boundaries, so it’s the tool to reach for when the question is “is this pattern already implemented somewhere?”
Parameters
How it works
- Looks up the file’s embedding (generated during indexing via Jina).
- Builds the allowed-repo scope: the source repo, plus every repo in its inclusion list.
- Runs cosine nearest-neighbors against that scope, excludes the source file itself, and returns the top results ranked by similarity.
The org boundary is enforced by the inclusion constraint — cross-org repos cannot be in the inclusion list, so they cannot appear in results.
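The three steps above can be sketched in plain Python. This is an illustrative in-memory model, not Scrubby's implementation: the real lookup runs in pgvector, and `IndexedFile`, `similar_files`, and the `inclusions` mapping are hypothetical names introduced only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexedFile:
    """Hypothetical stand-in for one indexed file and its embedding."""
    repo: str
    path: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_files(index, source_repo, source_path, inclusions, limit=10):
    """Return (similarity, repo, path) tuples, most similar first."""
    # Step 1: look up the source file's embedding.
    source = next(
        (f for f in index if f.repo == source_repo and f.path == source_path),
        None,
    )
    if source is None:
        raise LookupError("file_not_found")

    # Step 2: allowed scope = the source repo plus every repo it includes.
    allowed = {source_repo, *inclusions.get(source_repo, [])}

    # Step 3: score every in-scope file except the source itself, then rank.
    hits = [
        (round(cosine_similarity(source.embedding, f.embedding), 4), f.repo, f.path)
        for f in index
        if f.repo in allowed and f is not source
    ]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]
```

Because the scope set is built only from the source repo and its inclusion list, a repo that is not included never reaches the scoring step, which mirrors how the inclusion constraint keeps out-of-scope repos from appearing in results.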
Response
Returns a list of hits grouped by repository, each with:
- `path`: the file’s relative path within its repo.
- `repository_name`: which repo the hit came from. Hits from the source repo and from included peers are visually distinct in the rendered output.
- `similarity`: cosine similarity in `[-1, 1]`, rounded to 4 decimal places.
- `summary`: the file’s Scrubby-generated summary, when available.
- `language`: the file’s language.
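As a rough illustration of the "grouped by repository" shape, the sketch below groups a flat list of hits by `repository_name`. It assumes each hit is a dictionary keyed by the field names above; the actual wire format is not specified here, and `group_hits_by_repo` is a hypothetical helper, not part of Scrubby.

```python
from collections import defaultdict


def group_hits_by_repo(hits, source_repo):
    """Group hit dicts by repository, source repo first, best match first."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["repository_name"]].append(hit)

    # Within each repository, keep the most similar files on top.
    for repo_hits in grouped.values():
        repo_hits.sort(key=lambda h: h["similarity"], reverse=True)

    # Source repo first, then included peers in alphabetical order.
    ordered = sorted(grouped, key=lambda repo: (repo != source_repo, repo))
    return [(repo, grouped[repo]) for repo in ordered]
```

Listing the source repo first is a presentation choice made for this example only.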
Typical usage
Before editing a file, to find existing implementations of the same pattern:
"Before I rewrite this webhook handler, run scrubby_similar_files on app/webhooks/stripe_handler.rb."
To check whether a fix you’re about to apply already needs to be applied elsewhere:
"I'm fixing a race condition in `Order#mark_paid`. Run scrubby_similar_files to find any sibling implementations."
When NOT to use
- Single-file structural questions — for “what calls this function?” the call-graph and import-graph tools (or your editor’s native ones) are more precise than embeddings.
- Generic boilerplate — “how do I parse JSON?” will surface noise because everything looks similar to everything. Use this when the file you’re starting from is meaningfully specific.
Errors
| Code | Meaning |
|---|---|
| `file_not_found` | The `file_path` doesn’t match an indexed file. Commit and push, then re-run `scrubby_index`. |
| `no_embedding` | The file is indexed but has no embedding yet. Re-run `scrubby_index` to generate one. |
| `repo_not_indexed` | The source repo hasn’t been indexed yet. |
| `not_authenticated` | OAuth session expired. |
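
A caller might translate these codes into a next step along the following lines. This is a minimal sketch: `NEXT_STEPS` and `next_step` are hypothetical names, and the remedies marked "assumed" are inferred rather than stated in the table.

```python
# Maps each error code to a suggested follow-up action.
# Entries marked "(assumed)" are guesses, not documented remedies.
NEXT_STEPS = {
    "file_not_found": "commit and push the file, then re-run scrubby_index",
    "no_embedding": "re-run scrubby_index to generate the embedding",
    "repo_not_indexed": "run scrubby_index on the source repo (assumed)",
    "not_authenticated": "sign in again to refresh the OAuth session (assumed)",
}


def next_step(error_code: str) -> str:
    """Return a suggested follow-up for an error code, with a safe fallback."""
    return NEXT_STEPS.get(error_code, "unrecognized error code; no documented remedy")
```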