Ask Miss O11y: Do I Need Observability If My Stack Is Boring?


“Observability came out of microservices and cloud native, right? If you have a simpler architecture, does o11y matter?” This question came up during a recent office hours session.

Yeah, sort of, on both counts. Yes, it sorta came out of microservices and cloud native, and yes, sorta, you need it with a simpler architecture too (though perhaps not as desperately as you otherwise might).

The need for observability grew out of microservices and cloud native

Microservices, cloud native, polyglot persistence … dynamic, ephemeral components, elastic provisioning, third-party services, Lambda functions and serverless APIs … all of these trends are driving system complexity up and up, forcing more and more of the application logic into the realm of operational trade-offs.

Which means that more and more people are running into the same cliff: the abrupt, discontinuous realization that yesterday’s monitoring tools and dashboards are not going to help you figure out what’s wrong with your system today. That moment of terror is exactly what spurred us to develop the technical definition of observability, with its known-unknowns and unknown-unknowns, and then a new generation of explorable, interactive observability tooling based on arbitrarily wide structured events and traces.

Observability tooling is defined by its ability to handle high-cardinality, high-dimensionality data through an explorable interface, and those characteristics are no longer optional for microservices and cloud-native systems.
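To make “arbitrarily wide structured events” a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical helper named wide_event (not part of any real library) that accumulates fields over the life of one request and emits a single JSON event when the request finishes. The field names are illustrative only, and real tooling would ship the event to a backend rather than print it.

```python
import json
import time
from contextlib import contextmanager


def _print_json(event):
    # Default sink (an assumption for this sketch): one JSON line per event on stdout.
    print(json.dumps(event, sort_keys=True))


@contextmanager
def wide_event(emit=_print_json, **fields):
    """Collect fields for one request and emit them as a single wide event."""
    event = dict(fields)
    start = time.monotonic()
    try:
        yield event  # handler code adds fields as it learns things
    except Exception as exc:
        event["error"] = repr(exc)  # record the failure on the same event
        raise
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        emit(event)


# Example: one event per request, carrying high-cardinality fields like user_id.
with wide_event(endpoint="/export", user_id="u_1234", build_id="abc123") as ev:
    ev["feature_flags"] = ["new_exporter"]
    ev["db_query_count"] = 3
    ev["status_code"] = 200
```

Because every field lives on the same event, you can later ask questions you never planned for (say, which build_id the slow /export requests were running) without having predefined a metric for each one.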

Most people don’t hit this cliff until they begin adopting microservices-type patterns, and almost everyone does shortly thereafter, so yeah, there’s a definite connection. But just how tightly coupled are they? Like, if you don’t have or need many services, is there any point in adopting observability?

The relationship between o11y and simplicity of architecture

I’m a huge proponent of keeping your architecture as simple as possible. Choose boring technology! If you can solve your problems and ship your product using a simple LAMP* stack, for god’s sake, please do so.

There are still quite a lot of teams out there running monolithic apps and/or relatively simple architectures, and good for them. But what does this mean for their choice of instrumentation and telemetry? Should they stick with the sort of metrics-based monitoring tools (Datadog, Prometheus, etc.) that we have lived with for the past 20-odd years, or is there a different argument for adopting observability tools?

The relationship does NOT look like this: simple architecture == use monitoring, complex architecture == use observability.

It is true that you are less likely to run into a sort of telemetry cliff if you have a monolith and few components. But it’s also true that metrics tools force you to treat your monolith like a black box and give you much less insight into what’s going on. For instance: 

  • There are still lots of reasons you may want to be able to trace your requests, even without microservices. Is your request firing off hundreds of db queries sequentially, or dozens at once, when you thought it was issuing only a single query? Where is the time going in your slowest requests?
  • Often you want observability tooling not just to resolve outages, but to understand the inner workings of your software and systems. You may want to know which of your users is consuming the most execution time or saturating a lock, and this can be even harder to figure out in a monolith than with microservices.
  • Let’s say your MySQL db is running 80% hot. You need to figure out what’s consuming those resources so you know whether it’s time for a hardware upgrade, a delicate migration, or maybe you just need to throttle a user or set resource caps on the free tier.
  • Someone writes in to complain that all their export requests are failing, so you go take a look at all the requests to /export by that user, and confirm that they are all returning 500. You then look at all requests to /export by all users and break down by status code, and notice that a subset of users are constantly failing. What do they have in common? They are all users with a feature flag set. (You would be unable to do this sort of correlation using metrics tools.)
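The export scenario above can be sketched in Python with a handful of per-request events. Everything here is a hypothetical illustration: the data, the field names, and the helpers breakdown, common_flags, and total_by are assumptions, not any vendor’s API. The point is the contrast between a metrics-style counter that keeps only endpoint and status, and raw wide events that keep user_id, flags, and duration so you can slice by any of them after the fact.

```python
from collections import Counter

# Hypothetical per-request wide events (illustrative data, not real traffic).
events = [
    {"endpoint": "/export", "user_id": "u1", "status": 500, "flags": {"new_exporter"}, "duration_ms": 1200},
    {"endpoint": "/export", "user_id": "u1", "status": 500, "flags": {"new_exporter"}, "duration_ms": 1150},
    {"endpoint": "/export", "user_id": "u2", "status": 200, "flags": set(), "duration_ms": 300},
    {"endpoint": "/export", "user_id": "u3", "status": 500, "flags": {"new_exporter", "dark_mode"}, "duration_ms": 900},
    {"endpoint": "/export", "user_id": "u4", "status": 200, "flags": {"dark_mode"}, "duration_ms": 250},
    {"endpoint": "/home", "user_id": "u3", "status": 200, "flags": {"new_exporter", "dark_mode"}, "duration_ms": 40},
]

# Metrics-style view: pre-aggregated by a few low-cardinality tags.
# user_id and flags are thrown away before anything is stored.
metric = Counter((e["endpoint"], e["status"]) for e in events)


def breakdown(events, where, by):
    """Count events matching the predicate `where`, grouped by field `by`."""
    return Counter(e[by] for e in events if where(e))


def common_flags(events):
    """Feature flags present on every event (empty set if there are none)."""
    flag_sets = [e["flags"] for e in events]
    return set.intersection(*flag_sets) if flag_sets else set()


def total_by(events, key, value):
    """Sum the numeric field `value` per distinct value of field `key`."""
    totals = Counter()
    for e in events:
        totals[e[key]] += e[value]
    return totals


# Break /export down by status, then ask what the failing requests share.
print(breakdown(events, lambda e: e["endpoint"] == "/export", "status"))
failing = [e for e in events if e["endpoint"] == "/export" and e["status"] == 500]
print(common_flags(failing))
# Which user is consuming the most execution time?
print(total_by(events, "user_id", "duration_ms").most_common(1))
```

With the counter alone you can see that /export is returning 500s, but not who is affected or why. The raw events answer both, and the same data also shows which user is consuming the most execution time.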

Having observability and rich instrumentation is always better than relying on old-fashioned metrics alone. It’s like being a pilot: once you’ve gotten used to flying IFR (instrument flight rules) with your instruments, going back to VFR (visual flight rules) alone feels like flying in the dark.

But do you need it badly enough to justify the extra time and effort? That really comes down to your own unique situation. How often do you find yourself wanting to understand the inner workings of your code? How stable is your system? How often does something unexpected happen, and when it does, how long does it take you to recover? And how many different people are expected to support it day and night?

If your system is stable and mysterious or perplexing things almost never happen, then whatever tools you’re using now are probably good enough. Ultimately, this all comes down to the quality of life of the engineers who have to support the system. You can have a microservices system that takes very little operator time to run and maintain, and you can have a monolithic system that eats hours of developers’ time every week as they try to reproduce troublesome states, answer users’ questions, and figure out how things work in order to extend or improve the code.

But if you need greater visibility into your systems, if you need them to be tractable for larger numbers of people who rely on less tribal knowledge and need more accessible data, then it’s probably worth the investment into observability. No matter how boring the stack.

Have an observability question? Send Miss O11y an email!

* The LAMP stack is usually defined as: Linux, Apache (or Nginx), MySQL (or Postgres), and PHP (or Python, Perl, or Ruby).

Charity Majors

CTO

Charity is an ops engineer and accidental startup founder at honeycomb.io. Before this she worked at Parse, Facebook, and Linden Lab on infrastructure and developer tools, and always seemed to wind up running the databases. She is the co-author of O’Reilly’s Database Reliability Engineering, and loves free speech, free software, and single malt scotch.
