Observability: A ManifestoBy Charity Majors | Last modified on July 14, 2021
Everybody and their freaking grandpa is now claiming to do observability, not stodgy old monitoring. Fine, great. Nice to be trendy I guess.
But are they? What’s the difference?
You all know how I define observability: the power to ask new questions of your system, without having to ship new code or gather new data in order to ask those new questions. Monitoring is about known-unknowns and actionable alerts, observability is about unknown-unknowns and empowering you to ask arbitrary new questions and explore where the cookie crumbs take you. Observability means you can understand how your systems are working on the inside just by asking questions from outside. This is what makes complex systems tractable in the accelerating armageddon of increasing systems complexity.
This is not one of those personality tests, I promise
So, I have a very simple quiz for you. Take it, and you’ll know if you are really doing observability or not.
- Do you pre-aggregate? If yes, you are not doing observability. By pre-aggregating you are forever-destroying your ability to answer any questions you didn’t predict in advance. Observability requires access to raw original events, your observability source of truth.
- Do you handle high-cardinality dimensions? If no, you are not doing observability. Exploring your systems and looking for common characteristics requires support for high-cardinality fields as a first-order group-by entity. It’s theoretically possible there are systems where the ability to group by things like user, request ID, shopping cart ID, source IP etc. is not necessary, but I’ve never seen one.
- Are your logs or event data structured? If no, you might kinda be doing observability but in a shitty and inefficient way. You need to be able to compute, bucket, calculate all kinds of transformations on your data, and you simply can’t do that without data structures and field types. Structure your shit.
- Do you discard all the context of your events? Put another way: do you rely on metrics or time-series databases? Metrics are great for describing the state of the system as a whole, but if you throw away everything that correlates to the context of the event, this robs you of your ability to explore, and trace, and answer questions about actual user experience.
- Do you bake observability right into your code as you’re writing it? The best engineers do a form of “observability-driven-development” — they understand their software as they write it, include instrumentation when they ship it, then check it regularly to make sure it looks as expected. You can’t just tack this on after the fact, “when it’s done”.
Ask yourselves these questions, and then ask your vendors: Can I always get back to the source of truth (the raw event)? Can I glom more data onto the event at will? Do you compress away my precious detail so that I can never ask new, deeper questions? Can I break down or group by values that have tens of millions of possible values? Can I run new calculations and transformations over the raw events forever?
If not, they ain’t doing observability.
(They might, in very constrained cases, be building observable apps, which is linguistically related but different and for certain very limited use cases may actually be “observable” — but they are not doing observability.)
You need observability so you can truly own what you ship
What is the point of observability?
Observability is about getting the right information at the right time into the hands of the people who have the ability and responsibility to do the right thing. Helping them make better technical and business decisions driven by real data, not guesses, or hunches, or shots in the dark. Time is the most precious resource you have — your own time, your engineering team’s time, your company’s time.
Own your code, own your services, own your availability. Don’t build shit you don’t understand. Observability is about enabling and empowering software engineers to own their shit.
Find out what it’s like to own what you ship. Sign up for a free Honeycomb account today.
Incidents happen. What matters is how they’re handled. Most organizations have a strategy in place that starts with log searches—and logs/log searching are great, but...
George Miranda, Liz Fong-Jones, and Charity Majors, held a series of live discussions called the Authors’ Cut to bring core concepts of the book to...