Observability: The 5-Year Retrospective

Two years ago, I wrote a long retrospective of observability for its third anniversary. It includes a history of instrumentation and telemetry, a detailed explanation of the technical spec, and why the whole “three pillars” thing is nonsense. At the time, it’s what was needed to steer conversations away from silly rabbit holes about data types and back to what matters: how we understand our systems.

Two years later, observability rhetoric has become even more overheated. Every other day, I hear about a new “observability for x” startup getting funded. Which is cool, in a way, but I worry about the devops’ification of the term. “Observability” is starting to be used to mean anything to anyone, just a label you slap on the side of your product to fit in with the zeitgeist.

O’Reilly said it best this year when they summarized learning trends in 2021:

Observability saw the greatest growth in the past year (128%), while monitoring is only up 9%. While observability is a richer, more powerful capability than monitoring — observability is the ability to find the information you need to analyze or debug software, while monitoring requires predicting in advance what data will be useful — we suspect that this shift is largely cosmetic. “Observability” risks becoming the new name for monitoring. And that’s unfortunate. If you think observability is merely a more fashionable term for monitoring, you’re missing its value. (Emphasis mine.)

We can’t lose sight of that value. We can’t afford to. This isn’t just a tale of vendors arguing to define marketing terms for their own benefit. The pain and suffering that people endure every day because they can’t understand their own damn systems is too real. The long hours, the toil, the greasy hacks moldering away into technical debt, the late nights, the missed sleep, the burnout. The pain is real, and the solutions are specific. We need specific, meaningful technical terms to help users navigate the future and find their way to those solutions.

Christine and I started Honeycomb after trying every monitoring and logging solution under the sun and realizing that they were fine for answering simple known-unknowns, but wretchedly horrible for answering unknown-unknowns. Observability changed our lives (and sleep schedules) dramatically, transformed how we used systems, made us better engineers, and made us want to give that to our entire industry.

Now it’s time for another retrospective. What have we learned and where are we, as an industry, going wrong?

From the mouths of Tweeters: Observability vs. Monitoring

I opened Twitter the other day to find Corey Quinn dangling some tweet bait in my face, with this quoted retweet simply saying, “Discuss!”:

Annnd boy, did we. Cue a flood of definitions for observability (and squabbling over each other’s definitions).

First, I stated that observability was defined by structured events.

Then, Alex Hidalgo countered that it wasn’t.

The perspective that Alex appears to be echoing is one where observability is just a generic synonym for telemetry or insights. Any old graph, dashboard or measure you happen to have lying around? That’s observability! 🙄 That janky old nagios3 instance you still haven’t spun down? Observability! 🙄🙄🙄

In that scenario, observability is just another generic word for “understanding systems.” A closely related view is that observability has three pillars: metrics, logs and traces. If you also accept that definition, then I can understand why you might gripe that using observability to define a technical set of capabilities that go well beyond monitoring would be erasing “decades of work.”

I don’t accept the three-pillars definition. The world doesn’t need another classification word for existing data types. And it doesn’t need another synonym for “telemetry” or “insights.”

But there is a coherent set of practices that started to emerge about five years ago, in response to changes in how we build modern systems — particularly microservices. Those practices are a fairly discontinuous leap from their heritage of using metrics and logs. In fact, some of the best practices are exactly opposite those of monitoring, and the respective data formats used are entirely incompatible. Those practices do need a word.

It’s been a while since I waded into the definition wars. And I told myself I wouldn’t do this anymore. But it seems we can’t stop getting lost in the weeds, and we won’t get things back on a useful track unless we do.

Let’s do this.

A historical taxonomy of definitions for observability

1960-2015: To mechanical engineers and control systems theorists, observability means the mathematical dual of controllability — how well can you understand the inner workings of a system simply by observing it from the outside? The term was rarely heard in the computer industry, with the notable exception of Twitter’s “observability engineering” team. However, they used it as a generic synonym for telemetry.

2015: We started building Honeycomb and wrestling mightily with how to talk about what we were building. In July of that year, I googled the definition of observability, and it resonated with me powerfully. It was exactly what we were trying to build — the ability to understand any inner system state, with no prior knowledge or warning, simply by interrogating it from the outside! At the time, I wrote a great deal of content about the technical prerequisites and implications of this definition.

2017: After attending a tracing summit, Peter Bourgon wrote an extremely influential blog post claiming that observability has three pillars: metrics, logs and traces.

2018: Vendors from logging, metrics, and tracing companies (unsurprisingly) loved this definition and adopted it enthusiastically. “Three pillars” marketing content pummeled you with targeted ads. I wrote my retrospective. Ben Sigelman wrote an excellent rebuttal too. One large incumbent monitoring vendor launched their “observability” platform and the gold rush began. Vendors in the metrics, logging, tracing, monitoring, and APM categories — even certain database companies — all began rebranding themselves as observability companies. Companies were acquired, products were consolidated on the backend, and the marketing dollars just flowed.

2020: I personally resolved to stop talking as much about the technical definitions of observability and instead focus on the capabilities. What matters is how the technology changes our practices as an industry, right?

2021: The present. Where a lot of people are yammering on about observability with very little conceptual coherence.

The ensuing Twitter thread above shone a light on some of the many conflicting and/or complementary definitions for observability — some of which I can totally accept and buy into. I think others are (at best) useless or (at worst) harmful to users.

Let’s examine some of the definitions for observability offered in this very thread. Cue my inner pedant!

(a since deleted tweet): Anything that gives you insight into a system counts as observability.

No. This bar is way too low. Anything that offers insights? You mean my copy of “Advanced Programming in the Unix Environment” counts as observability? Try again.

Lorin Hochstein: Observability is about fault localization.

True! Observability is not about debugging your code. It’s about figuring out where exactly in the system to look for the code you need to debug. This succinct take is correct, but it could use more detail.

@austinlparker: Monitoring is the monitor telling me the baby is crying but observability is telling me why.

Great (if weird) analogy! It nods at the perspective shift where monitoring is primarily running third-party checks, while observability is a first-person narrator.

Sam Coren: Observability is building out systems and processes to ensure accurate and actionable data exists to measure performance and diagnose issues. It’s about training people to know how to instrument their apps or how to create good workflows that help them solve actual problems.

Sure, I guess. But definitions like this are also harmful. In practice, how is this any different than monitoring?

Doug Odegaard: Observability is a way of correlating to reduce the amount of time for human correlation (poring over by hand or grepping logs). Find one issue and dive down on the other two. It’s a data solution and UX as well.

Ooh, sounds like Doug has spent time with a real observability tool! This is indeed one of the most common motions of observability in practice.

@Network_Guy: Observability allows me to understand what my application is doing at a much deeper level than single-entity monitoring.

True. But … how? Why?

Kevin Parsons: Monitoring is associated with boring old things like Nagios. Observability is much shinier and cooler. They are both buzzwords, I don’t see any meaningful difference between them.

This is a terrific example of why I needed to write this article. Kevin, I hope you’re reading!

And then Liz Fong-Jones adds to the mix.

Liz Fong-Jones: Observability is not a binary state; it requires work over time. It’s the ability to swiftly surface relevant data to solve perf regressions, understand how your system works or your code behaves, and resolve incidents. Observability uses instrumentation to help provide context and insights to aid monitoring. While monitoring can help you discover there is an issue, observability can help you discover why.

Honors class answers, Liz!

(a since deleted tweet): Observability is not about any particular tool or data structure, it is about gaining insight into systems.

It’s this last one I particularly want to speak to. Because it’s true. But it’s also ridiculously vague. Yes, observability is about successfully gaining insight into systems. But the how, what we actually do to get there, is what matters about observability.

Observability as defined by what you can actually do with it

Reading the various takes above was actually encouraging to me, because a bare majority of responders have internalized two important things:

  1. Observability is materially different from our monitoring heritage, and
  2. It is different because it helps you explain and understand the inner workings of the system.

There are dozens, maybe hundreds, of different ways to create nuance around that to arrive at some definition that works for you. Similarly, DevOps (to this day!) struggles with the fact that there is no real agreed-upon definition of the term. Therefore, one team’s practice is DevOps just as much as another team’s practice (that looks nothing like the first team’s) is still DevOps — much to the bewilderment and confusion of anyone on the outside looking in. Vendors can then easily slap a DevOps label on just about anything and no one is apparently the wiser. After all, it’s still DevOps!

Rather than describing all the different facets and implementations, which can be either unhelpfully vague or more harmful than helpful, how about we focus on the capabilities instead? What can you materially do with observability that you can’t do with another practice, like monitoring?

Here’s my definition of observability, for what it’s worth:

Observability lets you find answers to application issues that are unknown-unknowns. You have observability if you can ask any question of your system from the outside, to understand any state the system has gotten into, no matter how bizarre or novel, without shipping any new custom code to get answers.

I like this definition because it hews closely to the definition from mechanical engineering and control theory:

Observability is the ability to measure the internal states of a system by examining its outputs. A system is considered “observable” if the current state can be estimated by only using information from outputs.
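
(For the control-theory nerds: in the classical linear-systems formulation, which is where this definition comes from and has nothing to do with software telemetry specifically, a system described by $\dot{x} = Ax + Bu$, $y = Cx$ is observable exactly when the observability matrix below has full rank $n$, meaning the outputs alone pin down the internal state.)

```latex
\mathcal{O} =
\begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{\,n-1} \end{bmatrix},
\qquad \operatorname{rank}(\mathcal{O}) = n
```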

Imagine: You just deployed, and something weird is happening. The anomaly doesn’t look like any issue you’ve ever encountered before. Can you swiftly figure out what’s going on? Can you do that without SSH’ing into the machine? Can you do that without adding any new custom metrics? Can you do that without spraying a bunch of Hail Mary log lines around the new change and redeploying?

If the tools you’re using to understand that system are metrics, static dashboards, and log aggregators, then probably not. Those tools are built to measure your system against known good or bad states and tell you whether good or bad things are happening in production right now.

They’re not built to debug entirely new issues you didn’t even know were possible by methodically following clues, by slicing and dicing telemetry data to compare it across multiple dimensions, and by digging through what’s happening in real time to identify outliers and find correlations.
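
To make “slicing and dicing” concrete, here’s a rough sketch in Go (plain standard library, not any vendor’s API; field names like build_sha and status_code are purely illustrative) of the basic motion: take a batch of wide events, group them by an arbitrary dimension, and compare error rates across groups so the outlier jumps out.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is one arbitrarily wide, structured record per unit of work.
type Event map[string]any

type bucket struct{ total, errors int }

// errorRateBy groups events by an arbitrary dimension and returns the
// error rate for each distinct value of that dimension.
func errorRateBy(events []Event, dim string) map[string]float64 {
	buckets := map[string]*bucket{}
	for _, ev := range events {
		key, _ := ev[dim].(string)
		b := buckets[key]
		if b == nil {
			b = &bucket{}
			buckets[key] = b
		}
		b.total++
		if status, ok := ev["status_code"].(int); ok && status >= 500 {
			b.errors++
		}
	}
	rates := map[string]float64{}
	for key, b := range buckets {
		rates[key] = float64(b.errors) / float64(b.total)
	}
	return rates
}

func main() {
	// A tiny batch of wide events; real ones would carry dozens of fields.
	events := []Event{
		{"build_sha": "a1b2c3", "status_code": 200},
		{"build_sha": "a1b2c3", "status_code": 200},
		{"build_sha": "d4e5f6", "status_code": 500},
		{"build_sha": "d4e5f6", "status_code": 200},
	}

	rates := errorRateBy(events, "build_sha")

	// Sort the dimension's values by error rate so the outlier surfaces first.
	keys := make([]string, 0, len(rates))
	for k := range rates {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return rates[keys[i]] > rates[keys[j]] })
	for _, k := range keys {
		fmt.Printf("build_sha=%s error_rate=%.2f\n", k, rates[k])
	}
}
```

A real observability tool does this interactively, over billions of events and whatever dimension you pick next; the motion is the same: group, compare, drill down, repeat.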

What I like about focusing on the capabilities that define observability is that, if you accept the definition about finding answers to unknown-unknowns, then you inevitably must also accept that there are certain technical prerequisites for achieving that goal. In order to understand any unknown internal state, any observability tool must also support:

  1. Arbitrarily wide structured raw events: a building block so granular that you’re able to slice and dice your telemetry data across any dimensions needed to find the tiniest commonalities across system anomalies.
  2. Context persisted and ordered through the execution path: you need to instrument your code to persist the unique request ID, trace ID, and any other important context from service to service, so that you can trace requests end to end and, later, correlate patterns between them while slicing and dicing (there’s a rough sketch of this after the list).
  3. Without indexes or schemas: because you can’t predefine optimizations for things you didn’t know you were going to look for in the future — data retrieval must be fast, at all times, for all data. (You pretty much have to have a columnar store to achieve observability.)
  4. High-cardinality and high-dimensionality: so that there are effectively no limits to how you’re able to slice and dice your data. You must be able to work with your data without the constraints imposed by most traditional databases.
  5. Client-side dynamic sampling: so that for the largest use cases at scale, where the typical trickle of telemetry data can quickly become a torrential flood, you can manage the tradeoff of keeping enough observability data for proper debugging without killing your underlying systems.
  6. An exploratory visual interface that lets you arbitrarily slice and dice and combine dimensions: humans are visual creatures and we’re wired to find patterns amongst a sea of noise. Visual interfaces help us quickly sift out signals that matter when exploring unknown issues.
  7. In close to real time: you have to get answers within seconds, because no one has time to wait several minutes for results when production services are down.
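
Here’s what the first, second, and fifth items might look like in practice: a minimal sketch in Go, using only the standard library rather than any vendor’s SDK, that builds one arbitrarily wide structured event per request, carries trace and request IDs forward from the caller, and applies a crude client-side dynamic sampler that keeps every error but only a fraction of routine successes. Header names, field names, and sample rates are made up for illustration.

```go
package main

import (
	"encoding/json"
	"math/rand"
	"net/http"
	"os"
	"time"
)

// Event is one arbitrarily wide, structured record per unit of work.
// Any field, including high-cardinality ones like user_id or build_sha,
// can be added without changing a schema.
type Event map[string]any

// sampleRate decides how many similar events one kept event represents.
// Errors are always kept; routine responses are kept 1 time in 20.
// (Illustrative numbers only.)
func sampleRate(status int) int {
	if status >= 500 {
		return 1
	}
	return 20
}

func handle(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	ev := Event{
		// Context persisted through the execution path: carry forward the
		// IDs received from the caller (header names are just an example).
		"trace_id":   r.Header.Get("X-Trace-Id"),
		"request_id": r.Header.Get("X-Request-Id"),
		"service":    "checkout",
		"endpoint":   r.URL.Path,
		"user_id":    r.Header.Get("X-User-Id"), // high cardinality is fine
		"build_sha":  os.Getenv("BUILD_SHA"),
	}

	status := doWork(r) // stand-in for real request handling
	ev["status_code"] = status
	ev["duration_ms"] = time.Since(start).Milliseconds()

	// Client-side dynamic sampling: keep the interesting events, drop most
	// of the boring ones, and record the rate so totals can be reconstructed.
	rate := sampleRate(status)
	if rand.Intn(rate) == 0 {
		ev["sample_rate"] = rate
		json.NewEncoder(os.Stdout).Encode(ev) // ship to your event store here
	}

	w.WriteHeader(status)
}

func doWork(r *http.Request) int { return http.StatusOK }

func main() {
	http.HandleFunc("/", handle)
	http.ListenAndServe(":8080", nil)
}
```

Note the sample_rate field stamped onto kept events: that’s what lets a backend reconstruct accurate counts from sampled data later.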

I’ll be the first to admit: Maybe I’m wrong! Maybe you CAN get answers to unknown-unknown questions without high cardinality. Maybe you DON’T have to use the arbitrarily wide structured data blob in order to correlate details from context between a set of events. Maybe YOU know how that’s done.

Great! I would LOVE to hear about it! Come find me on Twitter.

Wouldn’t THAT be a more helpful argument to have? How can we achieve these kinds of results? How can we build more on top of these new ways to understand our systems?

Observability must be a clear concept

Lots of people seem to have gotten the memo that observability is about understanding your systems. Great! Now they just need to understand enough about the required capabilities so that they can see through people selling them snake oil with an “observability” label on the bottle and no material way to actually achieve it.

(I’m also looking at you, “observability engineering” teams that spun up Prometheus, added some logs and dashboards, and called it a day. I hear from your software engineers every week in despair. Just because you call it observability doesn’t make it so.)

When talking about observability as a set of capabilities — understanding any new, bizarre, and previously unknown failure simply by analyzing the data that the system emits — you can start to see why observability has exploded in popularity.

Systems used to fail in fairly predictable ways. It didn’t matter that metrics stripped away all the context before storing them or that you couldn’t dive down into a dashboard and ask “…and then what? And then what?” You didn’t need to dig that deeply to understand where failures could be occurring. You could just examine problems by monitoring conditions from the outside.

You had The App, The Web, and The Database. The overwhelming majority of complexity lived in The App. If all else failed, you would attach gdb to The App and step through it. Back then, we knew where to look for issues.

But microservices pushed much of the complexity out of the app and into the system space. Modern systems are now a soup of databases and storage types, serverless functions and third-party APIs. Novel and bizarre partial failures aren’t fascinating rarities in today’s application architectures. You have unknown-unknowns for breakfast.

And in today’s world, we need to change how we think about systematically finding the real sources of issues. For decades, we told ourselves that’s what we were doing. But it turns out that most of our “debugging” was done by intuition, or by reasoning through the maybe half dozen components where a failure could be lurking.

Now? Forget attaching gdb to The App. You don’t even know which of the dozens of ephemeral app instances in this cluster could even be experiencing the problem. Or if it’s one of the hundreds or thousands of instances upstream or downstream from this one.

That’s why “observability” matters. We need a term that encapsulates all of these newer, firmly modern tools and best practices, so that people can find the tools they actually need. We don’t need yet another synonym for telemetry, or monitoring, or for any of the tools we’ve had for decades that are failing to meet the problems of today.

Charity Majors

CTO

Charity is an ops engineer and accidental startup founder at honeycomb.io. Before this she worked at Parse, Facebook, and Linden Lab on infrastructure and developer tools, and always seemed to wind up running the databases. She is the co-author of O’Reilly’s Database Reliability Engineering, and loves free speech, free software, and single malt scotch.
