Ask Miss O11y: Observability vs BI Tools & Data Warehouses
By Charity Majors | Last modified on June 17, 2022

You probably have already answered this before, but do you have a good rule of thumb for where o11y [observability] ends and BI [business intelligence]/data warehouses begin?
Yes! While data is data (and tools exist on a continuum, and can be and often are reused or repurposed to answer questions outside their natural domain), observability and BI/data warehouses typically exist on opposite ends of the spectrum in terms of time, speed, and accuracy, among other dimensions.
It can be really hard to generalize about "business intelligence tools": a quick glance on the Internet turns up everything from online analytical processing (OLAP), mobile BI, real-time BI, operational BI, collaborative BI, location intelligence, data visualization and chart mapping, tools for building dashboards, billing systems, ad hoc analysis and querying, enterprise reporting ... you name the problem, there's a tool somewhere optimized to analyze it. (It is only somewhat easier to generalize about the data warehouses that power them, but at least we can say those are non-volatile and time-variant, and contain raw data, metadata, and summary data.)
So anything we say to generalize is only going to be 90% true. But that never stopped Miss O11y! Let's kick it.
Query execution time
Observability tools need to be fast, with queries ranging from sub-second to low-seconds. A key tenet of observability is explorability: the fact that you don't always know what you're looking for. You spend less time running the same queries over and over, and more time following a trail of breadcrumbs. When you're in a state of flow, trying to understand and explore the consequences of your code in production, it's incredibly disruptive to have to sit there and wait for a minute or longer to get results. You can lose your whole train of thought!
BI tools, on the other hand, are often about running reports, or crafting a complex query that will be used again and again. It's okay if these take longer to run, because you aren't trying to use this data to react in real time, but rather to feed it into other tools or systems. You typically make decisions about steering the business over units of days, weeks, months, or years, not minutes or seconds, and if you're updating those decisions every few seconds, something has gone terribly wrong.
(Please note that one of the umpteen categories of BI tools is called "Exploratory Data Analysis" [EDA], which specializes in flexible, rapid exploration over sampled data, much like observability. The difference between observability and EDA tooling is that the latter typically focuses on helping you join across multiple tables, while observability tools are highly opinionated about data structures like traces.)
Accuracy
For observability tools, "fast and close to right is better than perfect" is the law of the highway (as well as being one of our company values). You would almost always rather get a result that scans 99.5% of the events in one second than a result that scans 100% in one minute. That is a very real, very common tradeoff you have to make with massively parallelized distributed systems across flaky networks.
Also, some form of dynamic sampling is often employed to achieve observability at scale, in order to manage cost while capturing enormously detailed traces about important code paths. Sampling and "close to right" are verboten for data warehouses and BI tools. When it comes to billing, for example, you will always want the accurate result, no matter how long it takes.
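To make the dynamic sampling idea concrete, here is a minimal sketch of one common approach (a hypothetical illustration, not any vendor's actual implementation): keep every error and slow request, sample the routine healthy traffic at a lower rate, and record the sample rate on each kept event so aggregates can be re-weighted back to true counts.

```python
import random

# Hypothetical dynamic sampler: keep all "interesting" events, sample the
# rest at 1-in-base_rate, and attach the sample rate for later re-weighting.
def should_keep(event, base_rate=20):
    if event.get("error") or event.get("duration_ms", 0) > 1000:
        event["sample_rate"] = 1          # always keep errors and slow requests
        return True
    if random.randint(1, base_rate) == 1:
        event["sample_rate"] = base_rate  # each kept event stands for base_rate events
        return True
    return False

# Re-weighting: the estimated true count is the sum of sample rates of kept events.
def estimated_count(kept_events):
    return sum(e["sample_rate"] for e in kept_events)
```

The re-weighting step is what keeps sampled results "close to right": counts and rates remain statistically accurate even though most raw events were discarded.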
Recency
The questions you answer with observability tools have a strong recency bias, and the most important data is often the freshest. A delay of more than a few seconds between when something happened in production and when you can query for those results is unacceptable, especially when you're dealing with an incident.
As data fades into months past, you tend to care more about aggregates and trends than about specific requests, and when you do care about specific requests, it's fine for it to take a bit longer to find them. But when data is fresh, you need those results to be raw, rich, and up-to-the-second current.
BI tools typically exist on the other end of the continuum, on the "it's fine for it to take a bit longer" side. While there is often some ability to cache more recent results, and to pre-process, index, or aggregate older data, you want to retain the full fidelity of the data forever. You would never use an observability tool to find something that happened five years ago, or even two years ago, while warehouses are designed to store that data forever (and grow infinitely).
Structure
True observability is built out of arbitrarily wide structured data blobs, one event per request per service (or per polling interval in long-running batch processes). In order to answer any question about what's happening at any time, you need to incentivize developers to append more details to the event anytime they spy something that might be relevant in the future. Defining a schema upfront would defeat that purpose, so schemas can only be inferred after the fact (and changed on the fly: just start or stop sending a dimension at any time). Indexes are similarly unhelpful; indexing means picking and choosing in advance which questions you can ask efficiently, when the answer has to be "any of them."
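A wide structured event might be built up like this (a hypothetical sketch; the field names and values are invented for illustration): start with the request basics, then append whatever dimensions seem like they might matter during some future investigation, with no schema declared anywhere.

```python
import json
import time

# Hypothetical wide event for one request through one service.
event = {
    "timestamp": time.time(),
    "service": "checkout",
    "trace_id": "abc123",
    "duration_ms": 87.4,
    "http.status": 200,
}

# No upfront schema: developers append fields whenever they spot something
# that might be relevant later. New dimensions can appear (or vanish) at any time.
event["user.plan"] = "enterprise"
event["cart.item_count"] = 17
event["feature_flag.new_pricing"] = True

payload = json.dumps(event)  # ship one structured blob per request per service
```

The whole point is that `cart.item_count` or `feature_flag.new_pricing` needed no migration, no index, and no schema change to become queryable.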
BI tools, on the other hand, often collect and process large amounts of unstructured data into structured, queryable form, while data warehouses would be an ungovernable mess without structures and schemas. You need consistent schemas in order to perform any kind of useful analysis over time. And you tend to ask similar questions in repeatable ways to power dashboards and the like, so you can optimize them with indexes, compound indexes, summaries, etc.
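By contrast, a warehouse-style table declares its schema and indexes up front, because the questions are known and repeated. A minimal sketch using an in-memory SQLite database (the table and column names are invented for illustration):

```python
import sqlite3

# Hypothetical warehouse-style table: schema declared up front,
# because the recurring report queries are known in advance.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE billing_events (
        event_date  TEXT NOT NULL,
        customer_id TEXT NOT NULL,
        amount_usd  REAL NOT NULL
    )
""")
# The index matches the shape of the recurring query,
# trading write-time cost for repeatable read speed.
con.execute(
    "CREATE INDEX idx_customer_date ON billing_events (customer_id, event_date)"
)

con.execute("INSERT INTO billing_events VALUES ('2022-06-01', 'cust_1', 49.0)")
total = con.execute(
    "SELECT SUM(amount_usd) FROM billing_events WHERE customer_id = ?",
    ("cust_1",),
).fetchone()[0]
```

Adding a new dimension here means a schema migration, which is exactly the rigidity observability tooling is designed to avoid.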
Because data warehouses grow forever, it is very important that they have predefined schemas and grow at a predictable rate. o11y, on the other hand, is all about rapid feedback loops and flexibility. It is most important under times of stress or duress, when predictability goes out the window.
Monitoring!
Related to the last couple of points: debug data is inherently more ephemeral than business data. You might very well need to retrieve a specific transaction record or billing record from two years ago with total precision, whereas you are unlikely to need to know if the latency between service1 and service2 was high for a particular user request two years ago.
You may, however, want to know whether the latency between service1 and service2 has increased over the last year or two, or whether the 95th percentile has gone up over that time. This type of question is very common, and it is best served not by BI/warehouses or observability tools, but by our good old pal monitoring.
Monitoring tools donât store raw request data, like observability tools do, but they do allow you to quickly and cheaply perform aggregates and counters on the fly. Monitoring tools (from rrdtool to Prometheus) are also excellent at aging out detail so that historical data can accumulate while only ever occupying a fixed amount of storage. You can store high level aggregates by the year, somewhat more detailed aggregates by the month, week, and day. Thatâs literally what itâs designed for, and what itâs best at.
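The aging-out idea can be sketched in a few lines (a hypothetical RRD-style rollup, not any specific tool's implementation): collapse per-minute latency samples into fixed-size hourly aggregates, so storage per hour stays constant no matter how much raw detail arrived.

```python
from collections import defaultdict

# Hypothetical RRD-style rollup: collapse per-minute latency samples into
# hourly aggregates, discarding the raw samples as data ages.
def rollup_hourly(samples):
    """samples: list of (unix_timestamp, latency_ms) tuples."""
    buckets = defaultdict(list)
    for ts, latency in samples:
        buckets[ts - ts % 3600].append(latency)  # bucket by hour
    return {
        hour: {"count": len(vals), "avg": sum(vals) / len(vals), "max": max(vals)}
        for hour, vals in buckets.items()
    }
```

Each hour now occupies a fixed amount of storage regardless of traffic volume, which is exactly how monitoring systems accumulate years of history cheaply.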
In conclusion
Observability is a specialized use case for people who write and ship software to understand that software in production. It requires you to ingest telemetry in a specialized way; to make a set of tradeoffs on the storage side that are unlike those of any other use case; and to optimize the user interface for explorability, rich context correlation, and outlier detection. And it's fast.
But the best way to tell if what you're using is observability or BI is this. If you want to know with great precision what is happening to your users with your code, in production, right now, and you can reliably answer your own questions even if the scenarios are new to you? Then congratulations, you have excellent observability.
If you have to wait a few minutes, or an hour, or days, weeks, or months to find out? Then you're bending a BI or logging tool towards observability purposes. (At best.)
But I'm guessing you knew that. ☺️