Some people can get an AI assistant to write a day’s worth of useful code in ten minutes. Others among us can only watch it crank out hundreds of lines of crap that never works. What’s the difference?
There are some skills specific to AI development. There are also properties of the codebase we’re working in that make it amenable to AI assistance. Most AI demos use projects created from scratch with AI in mind—cute. Most valuable software development happens in legacy projects that have existed far longer than ChatGPT or Claude. Even developers with AI coding experience can be slower with AI than without. How can we get value from AI assistants in real-world development?
This article lists four codebase qualities that lead to AI-assisted productivity, along with one surprise bonus.
Qualities of an AI-compatible codebase
> Do you have a nice loosely coupled modular codebase supported by lots of tests?
> My current running hypothesis is if so, coding assistants add value, otherwise they’re not much use beyond simple tasks.
>
> – Rob Bowley
AI needs to know what good looks like, and then it needs ways to check its work and fix it.
Code is organized and consistent
We want the AI to write code that looks like our best code.
It needs to know: what part of the codebase is exemplary for this new code? If all the code is consistent in style, library versions, testing, and so on, then the AI is good at following those patterns. That’s unrealistic in a large codebase, where architectural and style direction changes as people learn and as people leave. More realistic: clear documentation of where everything belongs and what we want it to look like.
When a codebase contains both modern and historical code patterns, whoever implements new code needs to know which one to emulate. For example, at Honeycomb, we implemented a Schema Store service that caches the generated schemas for customer datasets. Before that, code everywhere went straight to the database. For months and years afterward, some code still went straight to the database. The people who wrote Schema Store socialized the new direction to all the other teams, and tenured developers still pass this direction on to new ones.
The AI agent doesn’t have this background. Documentation can tell an AI which code embodies the current practices, so it knows what pattern to follow. You can add a comment to the old code: mark it deprecated and point to an implementation that is up-to-date.
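For instance, a deprecation note at the top of the old module redirects both humans and agents. A minimal sketch in Python, with hypothetical file and function names (the Honeycomb example is Go):

```python
# DEPRECATED: do not add new callers.
# Schema access now goes through the Schema Store; follow the pattern in
# query_editor/autocomplete.py, fetch_fields().  (Hypothetical paths, for illustration.)
import warnings

warnings.warn(
    "direct schema access is deprecated; use the Schema Store instead",
    DeprecationWarning,
    stacklevel=2,
)
```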
Make a style guide for AI, in a rules.md or similar. Tell it which patterns are current. Point it to the latest examples, and tell it what to avoid. Refer to files or include a good code example right there.
```markdown
## Schema Access
When retrieving schema information, avoid using types/schema.go, which is deprecated.
Instead, use the Schema Store, as in poodle/query_editor/autocomplete.go, function `fetch_fields`.
```
When this gets out of date, have the agent update it for you—it’s fast!
Consistency becomes more important, and easier to achieve: AI can help us make the codebase consistent. At Honeycomb, we didn’t spend months refactoring every single place to use the Schema Store before moving on to the next urgent work. If we’d had AI assistance at the time, those months could have been days, and days could have been hours! We can now maintain a level of consistency we couldn’t before. And when we do, both AI and human coders benefit. For example, when the Schema Store is used everywhere, we can safely change the database structure. New options are open to us.
The whole codebase is part of the context that the AI can see, which makes it part of the instructions. Clear instructions help it get closer to what you want, but an even more important question is: how can it know when it’s wrong? How can the AI know its work is acceptable? It can’t (that’s my job), but it has some ways to know when the work is not sufficient.
Use static type checking
Strong type checks help AI work better. The more problems a compiler finds, the more the AI can fix before it reports back.
In a dynamic language, add type hints and type definitions. This gives the AI a better chance of getting code right the first time, because there is more information in the code, and the type checker can catch more mistakes. Bonus: this also helps humans who didn’t write or don’t remember this code.
Static typing is more important now, and less work. I loved dynamic languages because I could add data in one place and use it in another without changing code everywhere in between. Now, AI can make updates everywhere in between and get them mostly right; compiler errors will nudge it the rest of the way there. It won’t even get frustrated by them like I do. Explicit structure helps the AI, and it helps new members of the team get that context. People like clues too!
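As a sketch of what that structure buys you (Python here, with made-up names): explicit types give the AI something concrete to follow and give the type checker something concrete to verify.

```python
from dataclasses import dataclass

# Hypothetical types: data that used to travel as bare dicts now has a declared shape.
@dataclass
class FieldSchema:
    name: str
    type: str            # e.g. "string", "float", "bool"
    hidden: bool = False

def visible_fields(fields: list[FieldSchema]) -> list[str]:
    """Names of the fields that should show up in autocomplete."""
    return [f.name for f in fields if not f.hidden]

# A type checker (mypy, pyright) flags misuse before anything runs:
# visible_fields([{"name": "duration_ms"}])   # error: expected list[FieldSchema]
```

When the AI threads a new field through every layer, mistakes show up as type errors instead of runtime surprises.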
Remind the AI to run all the checks after it writes the code. Remind it in multiple places.
- Put all the development steps in README.md
- In RULES.md (or whatever the model-specific file is), tell it to run each step after every change
- You’ll still have to prompt it to do it half the time
- And then remind it again when it skips one
- Check rigorously in automated CI steps before PR review; coworkers don’t need to scour changes for bugs that can be caught in static analysis.
- Extreme measure: add a commit hook so that Git will error if it skips a step (a sketch follows this list).
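For that last measure: a Git pre-commit hook can be any executable, including a short Python script. Here is a sketch that assumes the check steps are ruff, mypy, and pytest; swap in whatever your project actually runs.

```python
#!/usr/bin/env python3
"""Git pre-commit hook: refuse the commit if any check step fails.

Save as .git/hooks/pre-commit and make it executable.
The specific commands are placeholders for your project's own steps.
"""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],              # lint
    ["ruff", "format", "--check", "."],  # formatting
    ["mypy", "."],                       # type check
    ["pytest", "-q"],                    # tests
]

for cmd in CHECKS:
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"pre-commit: '{' '.join(cmd)}' failed; commit aborted", file=sys.stderr)
        sys.exit(result.returncode)
```

Git runs this before every commit and aborts on the first failing step, so a skipped check can’t slip through quietly.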
Automate style checks
Don’t waste your time checking style if a linter can do it. It’s faster, it’s deterministic, and it’s easier on relationships on the occasion that the code was written by a human.
Tightening linting rules requires changing a bunch of code that doesn’t currently conform. AI can help with that… or will, as it gets better. This also increases the consistency of the codebase. This was harder when we had to do this repetitive work by hand.
Remind the AI to run the linter in autocorrect mode. Don’t waste your time or your tokens on mechanical fixes.
Of course, even trivial changes are only fast when it’s fast to verify that everything still works.
Get serious about testing
The other day, while coding with an agent, I found myself deploying to Kubernetes, clicking on a website to trigger an error, and looking in Honeycomb for the error message, over and over, each time Claude Code told me it was fixed. If I’m manually operating the feedback loop while the AI does the fun part, something’s wrong!
My job is to design the feedback loop. AI can implement it with me. Then AI can run it over and over.
The two big questions about every code change: does it work? And, what else did I break?
When coding by hand, I can live with manually testing my changes a few times an hour. I might rebuild the app, restart a service, and click around in a browser.
The AI agent needs to do these checks a few times per minute. They have to be automated. The AI can’t go faster than its feedback loop of “does it work?” Don’t put yourself in that loop!
A shell script can do a deploy, click on a website (with Playwright), and look at the updated screen. It can check the database. It can check logs and traces, and give you a link to the trace in your observability platform. You can get the agent to write this script for you!
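Here is a minimal sketch of such a loop, in Python with Playwright’s sync API rather than a shell script. The deploy command, URL, and selectors are placeholders; point them at your own app.

```python
#!/usr/bin/env python3
"""Feedback loop for the current task: deploy, exercise the feature, check the result.

All commands, URLs, and selectors below are placeholders for this example.
"""
import subprocess
import sys

from playwright.sync_api import sync_playwright

# 1. Deploy the current build (placeholder command).
deploy = subprocess.run(["make", "deploy-dev"])
if deploy.returncode != 0:
    sys.exit("deploy failed; stopping before any checks run")

# 2. Click through the feature in a real browser.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://dev.example.com/query-editor")
    page.fill("#field-filter", "duration_ms")
    page.click("text=Run Query")
    # 3. Read what the screen shows, and fail loudly if it's wrong.
    result = page.text_content("#results-summary", timeout=10_000)
    browser.close()

if result is None or "error" in result.lower():
    sys.exit(f"feature check failed: {result!r}")

print("feature check passed:", result)
```

Checking the database and printing a trace link are left out here; the point is that the agent can run this end to end without you clicking anything.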
After it writes the script, run it yourself. Did it print errors that the AI is happy to ignore? Make it fail early if setup fails. Does it really exercise the app? Does it wait forever for something to time out? Suggest criteria for noticing problems and defining success. Never trust a test you haven’t seen fail!
This test is scaffolding. It is the code that helps you write the production code. Here’s a great thing about scaffolding: you don’t need high quality. Throw it away after this task is done. Vibe coding can work here. Even if I only half-trust this test, it can double the speed of our work for the day.
Not all tests need to run in CI all the time. Browser-based tests are often flaky. It’s OK to write a test script for today’s work, and then delete it. Tests in CI should be fast, reliable, and specific. That’s a different need than iterating on this new feature.
To answer “what else did I break?” now and in the future, we need automated tests that run in CI forever. With AI help, it is more feasible than ever to maintain a good test suite. Ask it what tests are missing! It will be happy to tell you, and then write entirely too many of them. Prune those down. Be critical! This is how you keep your feature working under future change.
The AI can write a zillion unit tests, but they are often useless.
Check for:
- Unit tests without assertions
- Overspecific assertions on irrelevant fields
- Useless assertions that would pass whether or not the code worked (see the sketch after this list)
- Lots of mocking, a sign of tests too thin to be useful
- Outputs that match sneaky fallbacks in the implementation
- Failures that get swallowed instead of reported
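As a hypothetical illustration of the “useless assertions” item, here is the kind of test an AI happily writes, followed by one that would actually fail if the behavior regressed. It reuses the FieldSchema sketch from earlier; the module name is made up.

```python
from schema_store import FieldSchema, visible_fields  # hypothetical module from the earlier sketch

# Weak: runs the code, but passes no matter what it returns.
def test_visible_fields_runs():
    result = visible_fields([FieldSchema(name="duration_ms", type="float")])
    assert result is not None   # true for any list, even a wrong one

# Better: pins down the behavior we actually care about.
def test_hidden_fields_are_excluded():
    fields = [
        FieldSchema(name="duration_ms", type="float"),
        FieldSchema(name="internal_id", type="string", hidden=True),
    ]
    assert visible_fields(fields) == ["duration_ms"]
```

The first test pads a coverage number without protecting anything.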
The gotchas in AI-generated code are different from ones humans tend to write. Spotting its sycophancies is one of several skills that developers acquire while wrangling AI agents, but that’s a different article.
Conclusion
An AI-compatible codebase is one where it is easy to answer, “What should this look like?” and “Does everything still work?” Organization, consistent style, clear types and tests make a codebase easier to change—both for AIs and humans.
Most legacy codebases don’t have thorough tests, consistent style, or clear documentation. People struggle to change this code without breaking anything, and AI will struggle faster and break things harder, leaving people to clean up the results.
The path to productivity with AI agents starts with: use them to add the affordances that they need in order to change code safely. The work of adding tests, documenting architecture and code direction, and tightening lint rules and typing is approachable with AI assistance. We can use AI to make our codebase amenable to AI.
As a bonus, the codebase becomes more humane.
How to get started:
- Begin a RULES.md file. Tell it where some good example code lives
- Whenever the agent does something wrong, add guidelines to RULES.md
- Before changing code, ask: “How can I know this code works?”
- Then ask, “How can I automate that?” and get the AI to help you do it
- Make it run everything you would normally run, and tell it to add these instructions to RULES.md for you
- Have it implement tests to make sure your feature continues to work under future change