Karpathy is right: agent code is messy, and you need guardrails
The hot take is easy: "agents write bad code."
The useful take is harder: agents write exactly the code you made easiest to produce.
Karpathy touched a nerve today because a lot of developers are living the same loop:
- you ask for a small change
- you get a blob of abstractions you did not request
- the agent copies and pastes blocks instead of refactoring
- the aesthetics are off, the boundaries leak, and you spend your time cleaning up
People read that as a model problem. It's also a workflow problem.
What is actually breaking
1) Instructions do not survive contact with a real repo
Agents can follow style guidance in a toy file. In a real codebase, they drift. They reach for the shortest path that passes a quick sanity check, not the path that preserves your architecture.
2) Agents optimize for completion, not stewardship
They will happily inflate the surface area of the code if that makes the change easier to reason about. You see it as bloat. The model sees it as risk reduction.
3) Copy-paste is a symptom
Copy-paste shows up when the agent cannot confidently locate the right abstraction boundary. So it makes a new one, then duplicates it elsewhere.
The fix is not "better prompts"
If you want agents to write clean code, treat your repo like a constrained environment.
Guardrail stack that actually works
- Make formatting non-negotiable
  - autoformat on save and in CI
  - reject diffs that fail the format check
- Lint-driven development
  - write lint rules that encode taste
  - let the linter be the supervisor
- Small diffs only
  - cap file count
  - cap lines changed
  - force incremental, PR-sized patches
- Force intermediate variables
  - one-liners look clever and age badly
  - name the idea, then move on
- Test first, or the agent will hallucinate correctness
  - even a minimal regression test changes the agent's search space
- Preflight the codebase for agents
  - kill dead code
  - tighten types
  - remove ambiguous patterns
  - document the one true way to do the common things
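To make the "small diffs only" guardrail concrete, here is a minimal sketch of a diff budget check. It assumes you feed it the text output of `git diff --numstat`; the names `parse_numstat` and `check_budget` and the cap values `MAX_FILES` and `MAX_LINES` are hypothetical placeholders, not a recommendation, so tune them to your repo.

```python
from dataclasses import dataclass

# Hypothetical budget values for illustration only; tune them per repo.
MAX_FILES = 5
MAX_LINES = 200


@dataclass
class DiffStats:
    files: int  # number of files touched
    lines: int  # added + deleted lines across all files


def parse_numstat(numstat: str) -> DiffStats:
    """Sum files and changed lines from `git diff --numstat` output.

    Each row is "<added> TAB <deleted> TAB <path>". Binary files report
    "-" for both counts; they count as a file with zero lines here.
    """
    files = 0
    lines = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        files += 1
        if added != "-":
            lines += int(added)
        if deleted != "-":
            lines += int(deleted)
    return DiffStats(files=files, lines=lines)


def check_budget(
    stats: DiffStats,
    max_files: int = MAX_FILES,
    max_lines: int = MAX_LINES,
) -> list[str]:
    """Return human-readable violations; an empty list means within budget."""
    problems = []
    if stats.files > max_files:
        problems.append(f"{stats.files} files changed (cap: {max_files})")
    if stats.lines > max_lines:
        problems.append(f"{stats.lines} lines changed (cap: {max_lines})")
    return problems
```

In CI, or in whatever wrapper runs the agent, you could pass the numstat output for the proposed change to `parse_numstat` and fail the run when `check_budget` returns anything. The violation messages can go straight back to the agent as feedback, which pushes it toward splitting the work into smaller patches.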
Where this ends
We are heading toward a split:
- hand-crafted code for the parts that carry risk
- agent-generated code for the parts that carry churn
If you do not install guardrails, you get a third category: slop that compiles.
The win is not "agents write beautiful code." The win is that you spend less time reviewing and more time choosing what matters.