Observability: What’s in a Name?

By: Charity Majors | August 22nd, 2017


“Is observability just monitoring with another name?”

“Observability: we changed the word because developers don’t like monitoring.”

There’s been a lot of hilarious snark about this lately. Which is great; who doesn’t love A+ snark? Figured I’d take the time to answer, at least once.

Yes, in practice, the tools and practices for monitoring vs observability will overlap a whole lot … for now. But philosophically there are some subtle distinctions, and these are only going to grow over time.*

“Monitoring”, to anyone who’s been in the game a while, carries certain connotations that observability repudiates. It suggests that you first build a system, then “monitor” it for known problems. You write Nagios checks to verify that a bunch of things are within known good-ish thresholds. You build dashboards with Graphite or Ganglia to group sets of useful graphs. All of these are terrific tools for understanding the known-unknowns about your system.
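To make “known-unknowns” concrete, here is a minimal sketch of what a threshold check of this kind boils down to. It follows the Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function name, metric label and threshold values are hypothetical, chosen only for illustration.

```python
# Nagios plugin exit-code convention: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(value, warn, crit, label="metric"):
    """Compare one metric, chosen in advance, against fixed thresholds."""
    if value is None:
        return UNKNOWN, f"UNKNOWN - {label} unavailable"
    if value >= crit:
        return CRITICAL, f"CRITICAL - {label}={value} (>= {crit})"
    if value >= warn:
        return WARNING, f"WARNING - {label}={value} (>= {warn})"
    return OK, f"OK - {label}={value}"


# Hypothetical reading: disk 87% full, warn at 80%, page at 90%.
status, message = check_threshold(87, warn=80, crit=90, label="disk_used_pct")
print(message)  # prints "WARNING - disk_used_pct=87 (>= 80)"; a real plugin would also exit with `status`
```

Notice that every question this check can answer (which metric, which thresholds) was decided before the problem happened. That is exactly what makes it good for known-unknowns and blind to everything else.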

But what happens when you’re experiencing a serious problem … but you didn’t know for hours, until it trickled up to you from user reports? What happens when users are complaining, but your dashboards are all green? What happens when something new happens and you don’t know where to start looking? In other words, how do you deal with unknown-unknowns?

Known-unknowns are (relatively) easy (or at least the paths are well-trodden). Unknown-unknowns are hard.

But here’s the thing: in distributed systems, or in any mature, complex application of scale built by good engineers … the majority of your questions trend towards the unknown-unknown.

Debugging distributed systems looks like a long, skinny tail of almost-impossible things rarely happening. You can’t predict them all; you shouldn’t even try. You should focus your energy on instrumentation, resilience to failure, and making it fast and safe to deploy and roll back (via automated canaries, gradual rollouts, feature flags, etc.).

The same goes for large apps that have been in production a while. No good engineering team should be getting a sustained barrage of pages for problems they can immediately identify. If you know how to fix something, you should fix it so it doesn’t page you. Fix the bug, auto-remediate the problem, or hell, just disable paging alerts in off-hours and make the system resilient enough to wait ‘til morning. (Please!)

In the end, the result is the same: engineering teams should mostly get paged only about novel and undiagnosed issues. Which means debugging unknown-unknowns is more and more critical.

You can’t predict what information you’re going to need to know to answer a question you also couldn’t predict. So you should gather absolutely as much context as possible, all the time. Any API request that enters your system can legitimately generate 50-100 events over its lifetime, so you’ll need to sample heavily. (See our sampling docs for more best practices.)
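One common way to “sample heavily” without losing the shape of a request is to make the keep/drop decision per trace rather than per event, and to record the sample rate on every event you keep so that counts can be scaled back up later. The sketch below is one such approach under stated assumptions, not necessarily what our sampling docs recommend: the field names `trace_id` and `sample_rate`, and the “1 in N” meaning of the rate, are assumptions made for illustration.

```python
import hashlib


def should_keep(trace_id: str, sample_rate: int) -> bool:
    """Keep roughly 1 in `sample_rate` traces, decided deterministically by trace ID."""
    if sample_rate <= 1:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket % sample_rate == 0


def sample_events(events, sample_rate):
    """Keep or drop whole traces; stamp survivors with the rate used."""
    kept = []
    for event in events:
        if should_keep(event["trace_id"], sample_rate):
            kept.append({**event, "sample_rate": sample_rate})
    return kept


def estimated_count(kept_events):
    # Each kept event stands in for `sample_rate` original events.
    return sum(e["sample_rate"] for e in kept_events)


# Usage: 1,000 hypothetical requests, 50 events each, sampled at 1-in-10.
events = [{"trace_id": f"trace-{t}", "span": s} for t in range(1000) for s in range(50)]
kept = sample_events(events, sample_rate=10)
print(len(kept), "events kept; estimated total:", estimated_count(kept))
```

Hashing the trace ID (rather than rolling a die per event) means every event from a kept request survives together, so you can still follow a single request end to end after sampling.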

“Observability” is a term that comes from control theory. From Wikipedia:

“In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals.”
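For the formally inclined, the textbook version of that definition is the Kalman rank test for a linear time-invariant system. This is standard control theory, not anything specific to software; it just makes “inferred from knowledge of its external outputs” concrete: the system is observable when its inputs and outputs over time are enough to pin down its internal state uniquely.

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t), \qquad x(t) \in \mathbb{R}^{n}

\mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1} \end{bmatrix},
\qquad (A, C) \text{ observable} \iff \operatorname{rank} \mathcal{O} = n

(A, C) \text{ observable} \iff (A^{\top}, C^{\top}) \text{ controllable}
```

The last line is the “mathematical duals” part of the quote: observability of a system is equivalent to controllability of its transposed counterpart.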

In ordinary English, this means you have the instrumentation you need to understand what’s happening inside your software. Observability focuses on how the application is developed and on the rich instrumentation it needs: not to poll and monitor it against thresholds or predefined health checks, but to ask any arbitrary question about how the software works.

An observable system is one you can fully interrogate. Given a pile of millions of needles, one or two of which have problems, can you slice and dice and sort finely enough to quickly locate literally any given needle?
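Here is a toy, in-memory sketch of what “slice and dice and sort” means in practice. The key property is that the filter fields and group-by fields are chosen at query time, not when the instrumentation was written. The `query` helper and all field names (`endpoint`, `build_id`, `region`, `user_id`, `duration_ms`) are hypothetical; a real system would run this over a columnar store across millions of events, not a Python list.

```python
from collections import defaultdict


def query(events, where=None, group_by=(), order_by="count", limit=10):
    """Filter on any fields, group by any fields, rank the groups; nothing is predefined."""
    where = where or {}
    groups = defaultdict(list)
    for e in events:
        if all(e.get(k) == v for k, v in where.items()):
            key = tuple(e.get(f) for f in group_by)
            groups[key].append(e)
    rows = [
        {"group": key, "count": len(evs), "max_duration_ms": max(ev["duration_ms"] for ev in evs)}
        for key, evs in groups.items()
    ]
    rows.sort(key=lambda r: r[order_by], reverse=True)
    return rows[:limit]


# Hypothetical wide events: one per request, with many fields attached.
events = [
    {"endpoint": "/checkout", "status": 200, "build_id": "b41", "region": "us-east", "user_id": "u1", "duration_ms": 120},
    {"endpoint": "/checkout", "status": 500, "build_id": "b42", "region": "eu-west", "user_id": "u7", "duration_ms": 3100},
    {"endpoint": "/checkout", "status": 500, "build_id": "b42", "region": "eu-west", "user_id": "u7", "duration_ms": 2900},
    {"endpoint": "/search", "status": 200, "build_id": "b42", "region": "eu-west", "user_id": "u3", "duration_ms": 80},
]

# A question nobody planned for: which build, region and user do checkout errors cluster on?
for row in query(events, where={"endpoint": "/checkout", "status": 500}, group_by=("build_id", "region", "user_id")):
    print(row)  # {'group': ('b42', 'eu-west', 'u7'), 'count': 2, 'max_duration_ms': 3100}
```

Grouping by a high-cardinality field like `user_id` is what lets you find the one or two bad needles; a pre-aggregated dashboard metric would have averaged them away.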

Monitoring is great. We’re big fans. But it’s not what we’re trying to build here.

(Historical side note: we first adopted the term because companies like Netflix, Twitter, etc., tend to use “observability” internally. Lots of our users sign up for Honeycomb because they desperately miss the kind of tooling they used to have at their $bigco job, so the association was useful.)

* Could you say that observability is a subset of monitoring? Sure, you could! But then what term would you use for older-style thresholds-and-canned-dashboards? I’m stumped on that point, so I’ve been calling it “monitoring”. If you have a better term, please share!


Charity Majors

CTO

Charity Majors is the co-founder and CTO of honeycomb.io. She pioneered the concept of modern Observability, drawing on her years of experience building and managing massive distributed systems at Parse (acquired by Facebook), Facebook, and Linden Lab building Second Life. She is the co-author of Observability Engineering and Database Reliability Engineering (O’Reilly). She loves free speech, free software and single malt scotch.
