Happy Birthday to Us: Honeycomb 10 Year Manifesto, Part 1

By: Charity Majors

Christine and I started Honeycomb in 2016, which means it’s been ten years. Christine, a developer, and I, an operations engineer, were both profoundly unhappy with the state of the art in monitoring and logging tools.
The tools we had used at Facebook didn’t spray our signals around to a bunch of siloed-off pillars. They consolidated as much context as possible so we could properly explore it, the way every other non-software engineering team already takes for granted.
It was life-changing. Hard, novel, complex systems problems that had taken days to understand became trivial. Patterns that had taken deep intuition and historical knowledge were suddenly easy enough for an intern to find. Understanding user behavior stopped feeling like an engineering problem and started feeling like a support problem—one that our sales team could usually handle.
When we left Facebook, we were determined to build a tool that would help solve the hardest problems in software. Not by copying everyone else’s architectural hand-me-downs and making incremental improvements, but by borrowing from powerful BI and product tools, and reasoning from first principles about what software engineers need to understand their code in production.
We wanted to change other engineers’ lives and open their eyes to how much better things can be, the way ours had been opened, the way our lives had been changed.
The world changed. The principles didn't.
Ten years ago, the hardest problems in tech were about soaring complexity. At the time, that meant cloud, microservices, containerization, and polyglot persistence, and it was already clear that existing operational tools were straining at the seams to keep up with accelerating rates of complexity and change.
The hardest problems in tech are no longer about containers or the cloud, but they’re still about complexity, in a way, and they are definitely about the accelerating pace of change. AI has pushed the cost of generating code down close to zero, which means that the bottleneck has moved from building to learning and validating. If 90% of the code in your organization will be written by AI by the end of this year—a prospect that once sounded outlandish, and is now unremarkable—what else needs to change?
AI is an accelerant
It is notable, we think, that what teams seem to be struggling with the most is not the genuinely novel parts of AI (the hallucinations, the nondeterminism) but the software parts—the parts that should be boring. It’s just software… only 100x as much of it, 100x as fast as before.
Our take is that most production systems have evolved to tolerate a certain rate of change, held together by a variety of duct tape and stupid human tricks: intuition, instinct, manual patches, and (most of all) a few people who have been there for 10+ years and hold the missing context in their heads. As the rate of change ramps up, all this duct tape is flying off the bus. If the knowledge is not encoded into the system, it does not scale.
One of the biggest gaps being duct-taped over: your most valuable telemetry is scattered across multiple siloed sources, and humans have to grope around and guess to find it. What we’ve been saying for ten years is true, now more than ever: people trying to patch their leaky boats with multiple pillars are paying more money for worse results.
What we said then
Ten years ago, we said things that seemed radical:
- Complex systems fail in ways you can't predict. Your tools need to handle questions you haven't thought to ask yet.
- You can derive metrics, logs, and traces from structured log events. You can't go the other way. Events are the source of truth.
- If you can't slice by any dimension, you're guessing.
These were controversial. Now they're obvious. The irony is that AI makes them more urgent, not less. The same characteristics that humans need to understand their software from first principles—high cardinality, high dimensionality, flexibility, speed, semantic conventions, lots and lots of context—are exactly what AI needs to do great work.
AI is not magic. Shallow data sets typically yield very little value (this was the flaw with AIOps). AI demands data in depth, with relationships intact, and as much context encoded into the system as possible.
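To make the second claim above concrete, here’s a minimal sketch in plain Python (with hypothetical field names) of deriving a metric and a log line from wide structured events. Going the other direction is impossible: once events have been pre-aggregated into a counter, the per-request context is gone for good.

```python
from collections import defaultdict

# A few wide, structured events: one per unit of work, full context attached.
# The field names are illustrative, not a required schema.
events = [
    {"service": "api", "endpoint": "/checkout", "status": 500, "duration_ms": 812, "user_id": "u-1042", "build_id": "9f3c"},
    {"service": "api", "endpoint": "/checkout", "status": 200, "duration_ms": 93,  "user_id": "u-2211", "build_id": "9f3c"},
    {"service": "api", "endpoint": "/search",   "status": 200, "duration_ms": 41,  "user_id": "u-1042", "build_id": "9f3c"},
]

# Derive a metric: error count per endpoint (aggregation throws away everything else).
errors_by_endpoint = defaultdict(int)
for e in events:
    if e["status"] >= 500:
        errors_by_endpoint[e["endpoint"]] += 1

# Derive a log line: a flat, human-readable rendering of the same event.
log_lines = [
    f'{e["service"]} {e["endpoint"]} status={e["status"]} duration_ms={e["duration_ms"]}'
    for e in events
]

print(dict(errors_by_endpoint))  # {'/checkout': 1}
print(log_lines[0])

# You cannot go the other way: from {'/checkout': 1} there is no recovering which
# user hit the error, which build served it, or how slow the failing request was.
```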
What's true now
1. Unknown-unknowns are no longer the exception, but the design
Software systems may never have been as deterministic as we thought they were. But on a micro level, they were deterministic by default. When lines of code behaved unpredictably, it was a bug—something to fix, eliminate, prevent.
AI systems behave unpredictably by design. The same input produces different outputs. The model hallucinates. The agent takes a different path. This isn't a failure mode. It's the feature.
Unknown-unknowns aren't edge cases anymore, they're the normal state of production. And if your tools assume you know what to look for, they've already failed.
2. Context can't be reassembled after the fact
When something goes wrong in an AI system, the question is never simple. Was it the prompt? The model version? The retrieval step? The user's input? The infrastructure? Some interaction between all of them?
You need all of that context together, not correlated across three tools after the fact, not reconstructed from fragments. Together, at query time, explorable. Observability tools that break telemetry apart into different pillars destroy the relationships you need to understand what happened. You can't put Humpty Dumpty back together.
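As a sketch of what “together, at query time” means in practice: one wide event that carries the prompt, model version, retrieval step, user input, and infrastructure details as fields on a single record, instead of fragments spread across three tools. The field names below are illustrative, not a prescribed schema.

```python
import json
import time

# One wide event for one LLM-backed request. Everything you would otherwise have
# to correlate after the fact lives on the same record, so any combination of
# these fields can be sliced and explored together at query time.
event = {
    "timestamp": time.time(),
    "service": "assistant-api",
    "user.input": "why was my invoice duplicated?",
    "prompt.template_version": "v14",
    "model.name": "example-model",        # hypothetical model identifier
    "model.temperature": 0.2,
    "retrieval.index": "billing-docs",
    "retrieval.chunks_returned": 6,
    "infra.region": "us-east-1",
    "infra.pod": "assistant-7d9f",
    "response.status": "ok",
    "duration_ms": 1840,
}

# Emit it as one structured record (stdout here; a real pipeline would ship it
# to your observability backend).
print(json.dumps(event))
```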
3. Most people don't know what question to ask
Here's the uncomfortable truth about observability: it's always assumed expertise. Build dashboards for the metrics you understand. Write queries for the patterns you recognize. Set alerts for the thresholds you've defined.
But most people don't know what to ask. They know something's wrong, but they don't know where to start. This has always been true, but AI makes it the whole game. Systems are more complex, more opaque, and they’re changing faster. The people running them have less time and less context than ever.
Observability that requires expertise to use is observability for the 1%.
4. Learning requires speed
Engineers learn about production the way people learn about plumbing: only when there's a problem that needs fixing.
The threshold for getting paged is (and should be) really high, which means most of what's happening flies below the radar (performance shifts, cost creep, edge case behaviors, and the side effects of yesterday's deploy). By the time you notice, the distance from cause to effect has grown too large. The person who wrote the code has moved on. The context is gone. Your ability to learn has degraded.
Fast feedback loops that learn from every change are the only way to learn at scale, but they require:
- Tools worth looking at—fast, precise, specific. You can’t validate your code with dashboards or aggregates
- Instrumentation that decreases development time rather than compounding it and adding friction
AI code generation makes this urgent. When you're generating code 100x faster, the bottleneck shifts from writing code to validating intent. Can you tell if it's working? Can you learn from it before shipping the next thing? If not, every deploy is an open loop that creates more and more technical debt. A snowball effect.
5. The unit of work has changed
For twenty years, the atomic unit of software was the request. A transaction. Something starts, something ends, you trace what happened in between. That model is breaking.
Today's work is:
- A conversation that spans minutes or hours
- An agent workflow that branches, retries, loops humans in
- A pipeline that runs in the background, touches twelve services, and might not "complete" for days
The trace model assumed work had clear boundaries. Increasingly, it doesn't, which means observability needs to work across time, not just across services.
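One way to express work that has no clean request boundary, assuming an OpenTelemetry-style tracing setup: give each agent step or background stage its own span, and attach a span link back to the conversation that triggered it, rather than forcing everything under a single parent trace. This is a rough sketch, not the only way to model it.

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent-workflow")  # instrumentation scope name is arbitrary

# The originating conversation turn gets its own span. The conversation itself may
# stretch over hours, so we only use this span as an anchor to link later work to.
with tracer.start_as_current_span("conversation.turn") as conversation:
    conversation_ctx = conversation.get_span_context()

# Much later, a background agent step runs as its own trace, linked (not parented)
# to the conversation that kicked it off.
with tracer.start_as_current_span(
    "agent.step",
    links=[trace.Link(conversation_ctx, attributes={"link.reason": "triggered_by"})],
) as step:
    step.set_attribute("agent.branch", "refund-check")
    step.set_attribute("agent.retries", 2)
```

Links preserve the relationship without pretending the whole workflow is one neat transaction; the tooling on top still has to let you query across that relationship over days, which is the point.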
6. Tokens are the new compute
Performance used to mean CPU, memory, latency, throughput—infrastructure you provision and amortize. AI flipped the economics. Every inference costs money. Not amortized—direct, variable, per-request. The more you use it, the more you spend.
This changes everything:
- Efficiency switches from server utilization to token utilization
- Cost is a design problem, not just capacity planning
- You can't optimize what you can't see. If tokens aren't a first-class dimension, you're flying blind
Cost has always been an attribute of architecture and usage, but somehow, in the cloud years, we allowed ourselves to drift away from a model where cost was tightly integrated and attributable. We need it back.
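Here’s a minimal sketch of treating tokens as a first-class dimension, assuming OpenTelemetry-style spans wrapped around each inference call. The attribute names loosely follow OpenTelemetry’s GenAI semantic conventions, the client call is a stand-in, and the per-token prices are placeholders.

```python
from opentelemetry import trace

tracer = trace.get_tracer("inference")

# Placeholder prices per 1K tokens; real numbers depend on your provider and model.
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015

def fake_inference(prompt: str):
    # Stand-in for a real model call: returns text plus token counts.
    return "ok", len(prompt.split()), 12

def call_model(prompt: str, feature: str) -> str:
    with tracer.start_as_current_span("gen_ai.chat") as span:
        text, input_tokens, output_tokens = fake_inference(prompt)

        # Token counts and cost land on the same span as everything else, so spend
        # can be sliced by feature, endpoint, customer, model version, and so on.
        span.set_attribute("app.feature", feature)
        span.set_attribute("gen_ai.request.model", "example-model")
        span.set_attribute("gen_ai.usage.input_tokens", input_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", output_tokens)
        cost = (input_tokens * INPUT_PRICE_PER_1K + output_tokens * OUTPUT_PRICE_PER_1K) / 1000
        span.set_attribute("app.inference.cost_usd", round(cost, 6))
        return text

print(call_model("summarize the last five failed checkouts", feature="support-summary"))
```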
7. The interface has to meet people where they are
AI tools like Cursor figured something out: chat is 10% of the interface. The other 90% is the domain-specific representation of the work—the diffs, the files, the state of the thing you're building.
The conversation is how you steer. The work surface is what you're actually looking at.
Most observability tools got this backwards. They built dashboards and added chatbots. But the future isn't "dashboards with AI." It's AI-native interfaces where visualization serves understanding, not the other way around.
And the interface isn't static. It's an autonomy dial:
- When you want an autopilot, the system handles it before you notice
- When you want a copilot, it’ll give you guidance while you drive
- When you want an analyst, it’ll gather evidence while you direct the investigation
The question isn't "human or AI." It's "how much autonomy, for which task, right now."
8. Humans aren't leaving the loop
There's a fantasy that AI will replace operators. Deploy the system, let it run itself, go home.
That's not how complex systems work. And AI makes systems more complex, not less. Turns out, operational excellence is both the key to deriving sustainable value from AI-generated code, and a durable differentiator against the competition.
What changes is the nature of the work:
- Less time querying, more time deciding
- Less time finding the problem, more time understanding what to do about it
- Less toil, more judgment
Humans need to be in the loop, making decisions, and on the loop, steering the system's behavior over time, declaring intent, testing intent, validating intent.
The goal isn't to remove humans. It's to make human attention count.
9. Development loops and operational loops need different tools
The operational feedback loop kicks off whenever someone gets paged or a complaint gets escalated: alert -> debug -> fix. The operational mandate is to protect the health of the service in aggregate. The “three pillars” model was built for this use case.
The development feedback loop for most developers looks like build -> test -> merge. It does not include production, because historically, if you were using ops hand-me-down tools (the three pillars), the instrumentation phase was prohibitively complex, and once the code deployed, the analysis phase was virtually useless. You can’t tell precisely what happened if all you have are dashboards and aggregates.
But the development feedback loop should include production, because production is the only thing that matters. You don’t create any value until your code is in production. You don’t learn anything real until your code is there, either.
Ops needs a mallet, but devs need a scalpel. Ops cares about the health of the service, but devs care about the experience of each user as they use the product.
AI makes it possible for a developer to declare intent, autogenerate code, test for intent, deploy, and precisely validate that intent in production—all without leaving their development environment. This is exciting. But it doesn’t work with just any old data. You need precision tooling on top of telemetry data that preserves relationships.
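What “precisely validate that intent” can look like, sketched in plain Python over wide events rather than any particular vendor’s query interface: filter production events down to the new build, compare the behavior you meant to change against the previous build, and keep it per-user rather than aggregate-only. All field names here are hypothetical.

```python
from statistics import median

# Wide production events, one per request, each tagged with the build that served it.
events = [
    {"build_id": "old", "endpoint": "/checkout", "user_id": "u-1", "duration_ms": 420, "error": False},
    {"build_id": "old", "endpoint": "/checkout", "user_id": "u-2", "duration_ms": 390, "error": False},
    {"build_id": "new", "endpoint": "/checkout", "user_id": "u-1", "duration_ms": 95,  "error": False},
    {"build_id": "new", "endpoint": "/checkout", "user_id": "u-2", "duration_ms": 102, "error": True},
]

def summarize(build_id: str) -> dict:
    rows = [e for e in events if e["build_id"] == build_id and e["endpoint"] == "/checkout"]
    return {
        "requests": len(rows),
        "median_ms": median(e["duration_ms"] for e in rows) if rows else None,
        "error_users": sorted({e["user_id"] for e in rows if e["error"]}),
    }

# Intent: the checkout change should cut latency without introducing new errors.
print("old:", summarize("old"))
print("new:", summarize("new"))  # faster, but u-2 now sees errors: the loop closed on something real
```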
The future is starting to come into focus
My friend Cory Ondrejka just wrote a manifesto of his own, The o16g Manifesto. In it, he says: “Outcome Engineering is how we move our profession beyond software engineering. To transform agentic development into something vastly more capable, faster, and trustworthy than either vibe or hand coding.”
It was never about the code. Engineering has always been about solving problems for the business by using technology, and it still is. Engineering has always been a team sport, and it will continue to be a team sport—except now your team has agents, super-agents, and other people collaborating on it.
A smaller number of people can cover dramatically more surface area, which means their jobs are more fun and vital and interesting than ever before. Roles are blurring all over the place between eng and manager, eng and product, eng and business, eng and design, frontend eng and backend eng. This is exciting.
If you hail from ops, like me, this is an especially giddy time to be alive. Remember how everything was “devs devs devs” for the past decade, like ops was obsolete and embarrassing? Operational excellence is a competitive differentiator in its own right, and it provides the guardrails and practices that AI-native sociotechnical systems depend on to unlock exponential value.
But AI is not magic
When you're designing these systems—and you are designing them, whether you realize it or not—remember what actually works: context is everything.
The relationships between your signals matter, possibly even more than the signals themselves. If you're spraying signals across multiple pillars, AI can’t fix that. AI can’t fix shallow data or ontological confusion, because it is not, in fact, magic. It will amplify the dysfunction instead.
Speed and flexibility matter more than ever. Fast feedback loops that let you learn from every change, agents that will follow each change into battle over days and weeks to come, to find bugs before customers do. High cardinality and high dimensionality so you can slice by any dimension. Sub-second query times so you can actually explore instead of just staring at pre-baked dashboards.
The same principles that made Honeycomb-style observability powerful for humans—rich context, preserved relationships, speed and flexibility—are what make it powerful for AI agents trying to understand and operate your systems at 100x speed.
Get your foundations right. The rest will follow. The future belongs to whoever figures this out. We believe that’s us.