dtannen
I Made a CLI That Yells at Your Code Until It Gets an A

AI coding is great until your repo starts looking like it was assembled during a fire drill.

So I made this:

npx fix-hairball

It reviews your codebase, gives it a grade, fixes the worst parts, reviews it again, and keeps going until it gets an A.

Basically:

D -> C -> B -> A

but with more terminal output and fewer feelings.
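The grade loop is simple enough to sketch. Everything below is an illustrative assumption, not the tool's actual internals: the function names, the grading scale, and the toy stand-ins that bump the grade one letter per round are all made up for the example.

```python
# Hypothetical sketch of the review -> fix -> review loop.
GRADES = ["F", "D", "C", "B", "A"]

def improve_until(review, apply_fixes, target="A", max_rounds=10):
    """Run review/fix rounds until the grade reaches the target (or we give up)."""
    grade = review()
    for _ in range(max_rounds):
        if GRADES.index(grade) >= GRADES.index(target):
            break
        apply_fixes()
        grade = review()
    return grade

# Toy stand-ins: each fix round improves the repo by exactly one letter.
state = {"grade": "D"}

def review():
    return state["grade"]

def apply_fixes():
    state["grade"] = GRADES[GRADES.index(state["grade"]) + 1]

print(improve_until(review, apply_fixes))  # prints: A
```

The `max_rounds` cap matters in practice: a loop that keeps "improving" code forever is exactly the hairball-generating behavior the tool is trying to stop.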

Why?

Because AI agents love writing code.

They do not always love deleting code.

They will happily create:

  • a helper for the helper
  • a compatibility shim for code written 11 minutes ago
  • a 700-line test file
  • three “shared” abstractions used once
  • a function named like it has a mortgage

fix-hairball exists to run the cleanup loop on purpose.

What It Does

Under the hood, it runs:

npx commands-com quality --until A

It asks multiple AI reviewers what is wrong, synthesizes the useful complaints, splits the fixes into parallel tasks, applies them, runs checks, then does it again.

Very glamorous.

Mostly it deletes things.
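The "ask multiple reviewers, keep the useful complaints" step is essentially consensus filtering. Here is a minimal sketch under assumed names (`synthesize` and the sample complaints are illustrative, not the tool's API): complaints raised by more than one reviewer survive; one-off gripes get dropped.

```python
from collections import Counter

def synthesize(reviews):
    """Keep only complaints that more than one reviewer raised, in first-seen order."""
    counts = Counter(issue for review in reviews for issue in review)
    return [issue for issue, n in counts.items() if n > 1]

reviews = [
    ["unused helper", "700-line test file", "duplicate shim"],
    ["unused helper", "duplicate shim"],
    ["700-line test file", "naming nitpick"],
]
print(synthesize(reviews))
# prints: ['unused helper', '700-line test file', 'duplicate shim']
```

The surviving complaints are what get split into parallel fix tasks; the lone "naming nitpick" never makes the cut.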

The Philosophy

If code can be clean in 50 lines, it should not be 200.

If an abstraction exists only because yesterday’s abstraction got lonely, it should go.

If the repo has “legacy compatibility” for something created this morning, everybody needs a walk.

Try It

npx fix-hairball

It will not make you a better engineer.

But it may make your repo look like one was involved.


Top comments (1)

PEACEBINFLOW

The observation that AI agents love writing code but don't love deleting it is the kind of thing that sounds like a joke but is actually a pretty deep asymmetry. Deletion requires confidence that something is truly unnecessary — that it's not load-bearing, not relied upon by something else, not a subtle edge case waiting to become a production incident. That confidence usually comes from understanding the entire system, which is exactly what a stateless agent doesn't have.

Writing new code is the easy direction. You can add a helper function without fully understanding everything it touches, because the compiler will tell you if the types don't match and the tests will tell you if the behavior changed. Deleting code has no equivalent safety net. The compiler won't warn you that a function you removed was called dynamically somewhere. The tests might pass because the deleted code was never properly covered. The blast radius of a bad deletion is larger and quieter than a bad addition.

What's interesting about fix-hairball is that it externalizes that judgment — it uses multiple reviewers to build consensus about what's safe to remove, which is a way of compensating for the fact that no single agent has system-level understanding. It's deletion by committee. I wonder if the multiple-reviewer step is actually doing the heavy lifting here, or if most of the wins come from just having a loop that explicitly asks "what should be deleted" — a question most AI coding workflows never pose at all.