Some people can get an AI assistant to write a day’s worth of useful code in ten minutes. Others among us can only watch it crank out hundreds of lines of crap that never works. What’s the difference?
There are some skills specific to AI development. There are also properties of the codebase we’re working in that make it amenable to AI assistance. Most AI demos use projects created from scratch with AI in mind—cute. Most valuable software development happens in legacy projects that have existed far longer than ChatGPT or Claude. Even developers with AI coding experience can be slower with AI than without. How can we get value from AI assistants in real-world development?
This article lists four codebase qualities that lead to AI-assisted productivity, along with one surprise bonus.
Qualities of an AI-compatible codebase
> Do you have a nice loosely coupled modular codebase supported by lots of tests?
> My current running hypothesis is if so, coding assistants add value, otherwise they’re not much use beyond simple tasks.
>
> – Rob Bowley
AI needs to know what good looks like, and then it needs ways to check its work and fix it.
Code is organized and consistent
We want the AI to write code that looks like our best code.
It needs to know: what part of the codebase is exemplary for this new code? If all the code is consistent in style, library versions, testing, and so on, then the AI is good at following those patterns. That’s unrealistic in a large codebase, where architectural and style direction changes as people learn and as people leave. More realistic: clear documentation of where everything belongs and what we want it to look like.
When a codebase contains both modern and historical code patterns, whoever implements new code needs to know which one to emulate. For example, at Honeycomb, we implemented a Schema Store service that caches the generated schemas for customer datasets. Before that, code everywhere went straight to the database. For months and years afterward, some code still went straight to the database. The people who wrote Schema Store socialized the new direction to all the other teams, and tenured developers still pass this direction on to new ones.
The AI agent doesn’t have this background. Documentation can tell an AI which code embodies the current practices, so it knows what pattern to follow. You can add a comment to the old code: mark it deprecated and point to an implementation that is up-to-date.
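For instance, a deprecation note at the top of the old module redirects both humans and agents. A minimal sketch in Python, with hypothetical file and function names (the Honeycomb example is Go):

```python
# DEPRECATED: do not add new callers.
# Schema access now goes through the Schema Store; follow the pattern in
# query_editor/autocomplete.py, fetch_fields().  (Hypothetical paths, for illustration.)
import warnings

warnings.warn(
    "direct schema access is deprecated; use the Schema Store instead",
    DeprecationWarning,
    stacklevel=2,
)
```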
Make a style guide for AI, in a rules.md or similar. Tell it which patterns are current. Point it to the latest examples, and tell it what to avoid. Refer to files or include a good code example right there.
```markdown
## Schema Access
When retrieving schema information, avoid using types/schema.go, which is deprecated.
Instead, use the Schema Store, as in poodle/query_editor/autocomplete.go, function `fetch_fields`.
```
When this gets out of date, have the agent update it for you—it’s fast!
Consistency becomes more important, and easier to achieve: AI can help us make the codebase consistent. At Honeycomb, we didn’t spend months refactoring every single place to use the Schema Store before moving on to the next urgent work. If we’d had AI assistance at the time, those months could have been days, and days could have been hours! We can now maintain a level of consistency we couldn’t before. And when we do, both AI and human coders benefit. For example, when the Schema Store is used everywhere, we can safely change the database structure. New options are open to us.
The whole codebase is part of the context that the AI can see, which makes it part of the instructions. Clear instructions help it get closer to what you want, but an even more important question is: how can it know when it’s wrong? How can the AI know its work is acceptable? It can’t (that’s my job), but it has some ways to know when the work is not sufficient.
Use static type checking
Strong type checks help AI work better. The more problems a compiler finds, the more the AI can fix before it reports back.
In a dynamic language, add type hints and type definitions. This gives the AI a better chance of getting code right the first time, because there is more information in the code, and the type checker can catch more mistakes. Bonus: this also helps humans who didn’t write or don’t remember this code.
Static typing is more important now, and less work. I loved dynamic languages because I could add data in one place and use it in another without changing code everywhere in between. Now, AI can make updates everywhere in between and get them mostly right; compiler errors will nudge it the rest of the way there. It won’t even get frustrated by them like I do. Explicit structure helps the AI, and it helps new members of the team get that context. People like clues too!
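As a sketch of what that structure buys you (Python here, with made-up names): explicit types give the AI something concrete to follow and give the type checker something concrete to verify.

```python
from dataclasses import dataclass

# Hypothetical types: data that used to travel as bare dicts now has a declared shape.
@dataclass
class FieldSchema:
    name: str
    type: str            # e.g. "string", "float", "bool"
    hidden: bool = False

def visible_fields(fields: list[FieldSchema]) -> list[str]:
    """Names of the fields that should show up in autocomplete."""
    return [f.name for f in fields if not f.hidden]

# A type checker (mypy, pyright) flags misuse before anything runs:
# visible_fields([{"name": "duration_ms"}])   # error: expected list[FieldSchema]
```

When the AI threads a new field through every layer, mistakes show up as type errors instead of runtime surprises.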
Remind the AI to run all the checks after it writes the code. Remind it in multiple places.
- Put all the development steps in README.md
- In RULES.md (or whatever the model-specific file is), tell it to run each step after every change
- You’ll still have to prompt it to do it half the time
- And then remind it again when it skips one
- Check rigorously in automated CI steps before PR review; coworkers don’t need to scour changes for bugs that can be caught in static analysis.
- Extreme measure: add a commit hook so that Git will error if it skips a step (a sketch follows this list).
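For that last measure: a Git pre-commit hook can be any executable, including a short Python script. Here is a sketch that assumes the check steps are ruff, mypy, and pytest; swap in whatever your project actually runs.

```python
#!/usr/bin/env python3
"""Git pre-commit hook: refuse the commit if any check step fails.

Save as .git/hooks/pre-commit and make it executable.
The specific commands are placeholders for your project's own steps.
"""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],              # lint
    ["ruff", "format", "--check", "."],  # formatting
    ["mypy", "."],                       # type check
    ["pytest", "-q"],                    # tests
]

for cmd in CHECKS:
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"pre-commit: '{' '.join(cmd)}' failed; commit aborted", file=sys.stderr)
        sys.exit(result.returncode)
```

Git runs this before every commit and aborts on the first failing step, so a skipped check can’t slip through quietly.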
Automate style checks
Don’t waste your time checking style if a linter can do it. It’s faster, it’s deterministic, and it’s easier on relationships on the occasion that the code was written by a human.
Tightening linting rules requires changing a bunch of code that doesn’t currently conform. AI can help with that… or will, as it gets better. This also increases the consistency of the codebase. This was harder when we had to do this repetitive work by hand.
Remind the AI to run the linter in autocorrect mode. Don’t waste your time or your tokens on mechanical fixes.
Of course, even trivial changes are only fast when it’s fast to verify that everything still works.
Get serious about testing
The other day, while coding with an agent, I found myself deploying to Kubernetes, clicking on a website to trigger an error, and looking in Honeycomb for the error message, over and over, each time Claude Code told me it was fixed. If I’m manually operating the feedback loop while the AI does the fun part, something’s wrong!
My job is to design the feedback loop. AI can implement it with me. Then AI can run it over and over.
The two big questions about every code change: does it work? And, what else did I break?
When coding by hand, I can live with manually testing my changes a few times an hour. I might rebuild the app, restart a service, and click around in a browser.
The AI agent needs to do these checks a few times per minute. They have to be automated. The AI can’t go faster than its feedback loop of “does it work?” Don’t put yourself in that loop!
A shell script can do a deploy, click on a website (with Playwright), and look at the updated screen. It can check the database. It can check logs and traces, and give you a link to the trace in your observability platform. You can get the agent to write this script for you!
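Here is a minimal sketch of such a loop, in Python with Playwright’s sync API rather than a shell script. The deploy command, URL, and selectors are placeholders; point them at your own app.

```python
#!/usr/bin/env python3
"""Feedback loop for the current task: deploy, exercise the feature, check the result.

All commands, URLs, and selectors below are placeholders for this example.
"""
import subprocess
import sys

from playwright.sync_api import sync_playwright

# 1. Deploy the current build (placeholder command).
deploy = subprocess.run(["make", "deploy-dev"])
if deploy.returncode != 0:
    sys.exit("deploy failed; stopping before any checks run")

# 2. Click through the feature in a real browser.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://dev.example.com/query-editor")
    page.fill("#field-filter", "duration_ms")
    page.click("text=Run Query")
    # 3. Read what the screen shows, and fail loudly if it's wrong.
    result = page.text_content("#results-summary", timeout=10_000)
    browser.close()

if result is None or "error" in result.lower():
    sys.exit(f"feature check failed: {result!r}")

print("feature check passed:", result)
```

Checking the database and printing a trace link are left out here; the point is that the agent can run this end to end without you clicking anything.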
After it writes the script, run it yourself. Did it print errors that the AI is happy to ignore? Make it fail early if setup fails. Does it really exercise the app? Does it wait forever for something to time out? Suggest criteria for noticing problems and defining success. Never trust a test you haven’t seen fail!
This test is scaffolding. It is the code that helps you write the production code. Here’s a great thing about scaffolding: you don’t need high quality. Throw it away after this task is done. Vibe coding can work here. Even if I only half-trust this test, it can double the speed of our work for the day.
Not all tests need to run in CI all the time. Browser-based tests are often flaky. It’s OK to write a test script for today’s work, and then delete it. Tests in CI should be fast, reliable, and specific. That’s a different need than iterating on this new feature.
To answer “what else did I break?” now and in the future, we need automated tests that run in CI forever. With AI help, it is more feasible than ever to maintain a good test suite. Ask it what tests are missing! It will be happy to tell you, and then write entirely too many of them. Prune those down. Be critical! This is how you keep your feature working under future change.
The AI can write a zillion unit tests, but they are often useless.
Check for:
- Unit tests without assertions
- Overspecific assertions on irrelevant fields
- Useless assertions that would pass whether or not the code worked (see the sketch after this list)
- Lots of mocking, a sign of tests too thin to be useful
- Outputs that match sneaky fallbacks in the implementation
- Failures that get swallowed instead of reported
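As a hypothetical illustration of the “useless assertions” item, here is the kind of test an AI happily writes, followed by one that would actually fail if the behavior regressed. It reuses the FieldSchema sketch from earlier; the module name is made up.

```python
from schema_store import FieldSchema, visible_fields  # hypothetical module from the earlier sketch

# Weak: runs the code, but passes no matter what it returns.
def test_visible_fields_runs():
    result = visible_fields([FieldSchema(name="duration_ms", type="float")])
    assert result is not None   # true for any list, even a wrong one

# Better: pins down the behavior we actually care about.
def test_hidden_fields_are_excluded():
    fields = [
        FieldSchema(name="duration_ms", type="float"),
        FieldSchema(name="internal_id", type="string", hidden=True),
    ]
    assert visible_fields(fields) == ["duration_ms"]
```

The first test pads a coverage number without protecting anything.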
The gotchas in AI-generated code are different from ones humans tend to write. Spotting its sycophancies is one of several skills that developers acquire while wrangling AI agents, but that’s a different article.
Conclusion
An AI-compatible codebase is one where it is easy to answer, “What should this look like?” and “Does everything still work?” Organization, consistent style, clear types and tests make a codebase easier to change—both for AIs and humans.
Most legacy codebases don’t have thorough tests, consistent style, or clear documentation. People struggle to change this code without breaking anything, and AI will struggle faster and break things harder, leaving people to clean up the results.
The path to productivity with AI agents starts with: use them to add the affordances that they need in order to change code safely. The work of adding tests, documenting architecture and code direction, and tightening lint rules and typing is approachable with AI assistance. We can use AI to make our codebase amenable to AI.
As a bonus, the codebase becomes more humane.
How to get started:
- Begin a RULES.md file. Tell it where some good example code lives
- Whenever the agent does something wrong, add guidelines to RULES.md
- Before changing code, ask: “How can I know this code works?”
- Then ask, “How can I automate that?” and get the AI to help you do it
- Make it run everything you would normally run, and tell it to add these instructions to RULES.md for you
- Have it implement tests to make sure your feature continues to work under future change