Authors’ Cut—Structured Events Are the Basis of Observability

At its core, observability is understanding the internal state of your systems based on the telemetry they output so you can effectively troubleshoot, debug, and tune performance. However, there’s a tendency to reduce observability to a collection of logs, metrics, and traces, which strips away much of the visibility you need to understand what’s going on. Instead, at Honeycomb, we see structured events as the building blocks of observability, with traces stitched together from the events so you can visualize trends and patterns.

In chapters 5, 6, and 7 of our new O’Reilly book, Observability Engineering: Achieving Production Excellence, we go deep on these concepts. And through our Authors’ Cut webinar series, Charity Majors, George Miranda, and I unpack the details of each chapter to connect the dots between the technical and business benefits of observability for modern systems.

This post walks through the topics from our inaugural webinar, including the shortcomings of most monitoring platforms, why the structured wide event is the only way to do observability, and how Honeycomb serves as a backend datastore that allows you to slice data by high-cardinality dimensions.

Would you rather have one useful tool or three crappy ones?

Observability is a hot topic. Under the guise of observability, many monitoring vendors have sold solutions that offer insights based on metrics, logs, and traces. In fact, you may already have separate tools in place for each of these data types, in the hope they deliver a holistic view of your system.

However, this approach falls short. Traditional monitoring based on metrics, logs, and traces was not designed to assess modern, complex systems that include microservices, multiple databases, and dynamic infrastructure.

  • Metrics are aggregated measurements of what occurred at a particular time. Because each metric is a specific answer to a specific question, you can’t decompose it, add to it, or change it on the fly. This inflexibility robs you of necessary granularity and can cause you to miss valuable context.
  • Structured logs, similarly, fall short of providing enough information about events for exploratory analysis. Many log lines capture only part of an event. You can stitch multiple log entries together through a common field, such as a request ID, to reconstruct one event, but in practice this is neither typical nor practical.
  • Finally, tracing is a series of interconnected logs and metrics that shows the journey of requests or actions as they move through your system. Tracing is a good thing; however, having to copy and paste error IDs from your logging tool into your tracing tool is not.
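To make the point about stitching log entries "through a common field" concrete, here is a minimal Python sketch. The log lines, the `request_id` key, and the `stitch_by_field` helper are all hypothetical. The sketch reconstructs one request only if every line carries the same key and no later line overwrites an earlier value.

```python
from collections import defaultdict

# Hypothetical log lines: each one records only a fragment of a request.
LOG_LINES = [
    {"request_id": "r1", "msg": "request started", "endpoint": "/cart"},
    {"request_id": "r2", "msg": "request started", "endpoint": "/login"},
    {"request_id": "r1", "msg": "db query", "db_ms": 120},
    {"request_id": "r1", "msg": "request finished", "status": 500},
    {"request_id": "r2", "msg": "request finished", "status": 200},
]


def stitch_by_field(lines, key="request_id"):
    """Merge every log line that shares `key` into one dict per request.

    Later lines overwrite earlier ones on conflicting fields, which is one
    of the ways this after-the-fact reconstruction can silently lose data.
    """
    merged = defaultdict(dict)
    for line in lines:
        # Drop the free-text message; keep only the structured fields.
        fields = {k: v for k, v in line.items() if k != "msg"}
        merged[line[key]].update(fields)
    return dict(merged)


stitched = stitch_by_field(LOG_LINES)
```

A wide structured event skips this step entirely, because the fields were never split apart in the first place.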

When engineering teams try to check the boxes of metrics, logs, and traces in the name of observability, they fall into a trap: paying to store their data three different times in three different ways. Not only that, but with traditional monitoring platforms, there’s also inconsistency between the data types. What should be one tool providing one data source has become three shittier versions of the same data.

Arbitrarily wide structured events leave context intact

Where does that leave you? Let’s get back to the fundamentals of observability: the arbitrarily wide structured event. These events are the basic building blocks for understanding your systems, because they deliver the right level of granular data to debug any state of your application. An arbitrarily wide structured event comes with dozens to hundreds of dimensions, which can not only be sliced along various aspects but also be chained together to help uncover outliers and commonalities.
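As a rough illustration of slicing and then chaining slices, the sketch below filters hypothetical wide events down to slow outliers. It then looks for dimension values that show up more often among those outliers than among all events. The field names, the 1,000 ms threshold, and the simple "lift" heuristic are assumptions chosen for this example. None of it describes a specific Honeycomb feature.

```python
from collections import Counter

# Hypothetical wide events; real ones would carry dozens to hundreds of fields.
EVENTS = [
    {"endpoint": "/cart", "region": "us-east", "build": "v42", "duration_ms": 1900},
    {"endpoint": "/cart", "region": "us-west", "build": "v41", "duration_ms": 80},
    {"endpoint": "/login", "region": "us-east", "build": "v42", "duration_ms": 2100},
    {"endpoint": "/login", "region": "eu-west", "build": "v41", "duration_ms": 95},
    {"endpoint": "/search", "region": "us-west", "build": "v42", "duration_ms": 1750},
    {"endpoint": "/search", "region": "eu-west", "build": "v41", "duration_ms": 110},
]


def where(events, predicate):
    """Slice: keep only the events matching an arbitrary predicate."""
    return [e for e in events if predicate(e)]


def share_by_value(events, dimension):
    """Fraction of events carrying each value of `dimension`."""
    counts = Counter(e.get(dimension) for e in events)
    total = len(events)
    return {value: n / total for value, n in counts.items()}


def common_traits(outliers, baseline, dimensions):
    """Chain slices: report values over-represented among the outliers."""
    findings = {}
    for dim in dimensions:
        out_share = share_by_value(outliers, dim)
        base_share = share_by_value(baseline, dim)
        for value, share in out_share.items():
            lift = share - base_share.get(value, 0.0)
            if lift > 0:
                findings[(dim, value)] = round(lift, 2)
    return findings


slow = where(EVENTS, lambda e: e["duration_ms"] > 1000)
findings = common_traits(slow, EVENTS, ["endpoint", "region", "build"])
```

In this toy data every slow request ran build `v42`. You can only find that kind of commonality because each event kept all of its fields together.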

Compared to logging, where event details are scattered across multiple log lines, there is no guesswork about whether two pieces of information are merely close together in time or actually belong together, because all the information came from the same data blob.

Honeycomb stores and queries your raw data for the full picture

Honeycomb is all about the arbitrarily wide structured event. When a request enters a service, the service’s instrumentation initializes a map and pre-populates it with everything known or inferred about the request. When the request is about to exit or error out, that map is shipped off to Honeycomb as one arbitrarily wide structured event, typically with around 100 dimensions per event for a maturely instrumented service.
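Here is a minimal Python sketch of the one-event-per-request pattern described above. It assumes a hypothetical `handle_request` wrapper and uses dotted field names chosen only for illustration. In practice an SDK or OpenTelemetry instrumentation does this work for you. Here `emit` just prints JSON, standing in for sending the event to a backend.

```python
import json
import time
import uuid


def emit(event):
    """Stand-in for shipping the event to a backend; here it just prints JSON."""
    print(json.dumps(event, sort_keys=True))


def handle_request(path, user_id, handler, sink=emit):
    # On entry: start one map, pre-populated with what is already known.
    event = {
        "request.id": str(uuid.uuid4()),
        "request.path": path,
        "user.id": user_id,
        "timestamp": time.time(),
    }
    start = time.monotonic()
    try:
        # While handling: the handler may add any fields it learns.
        result = handler(event)
        event["response.status"] = 200
        return result
    except Exception as exc:
        # On error: record the failure on the same event, then re-raise.
        event["response.status"] = 500
        event["error"] = repr(exc)
        raise
    finally:
        # On exit, success or error: ship exactly one wide event.
        event["duration_ms"] = (time.monotonic() - start) * 1000
        sink(event)
```

The `finally` block is the design point: whatever path the request takes, it produces one event with every field attached, rather than several partial log lines.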

Because Honeycomb is storing and querying your raw data, you won’t miss out on key information that could be abstracted away if you only view it as metrics and logs. You can always post-aggregate the events into time series.
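Because the raw events are kept, a time series is just one query among many. This sketch rolls hypothetical events into fixed-width buckets at query time. The field names and the `to_time_series` helper are assumptions for illustration. Both the bucket width and the filter are chosen when you ask the question, not when the data is written.

```python
from collections import defaultdict


def to_time_series(events, bucket_seconds=60, where=lambda e: True):
    """Roll raw events up into (bucket_start, count, mean_duration_ms) rows.

    The aggregation happens at query time, so the raw events stay
    available for any other question you think of later.
    """
    buckets = defaultdict(list)
    for e in events:
        if where(e):
            start = int(e["timestamp"] // bucket_seconds) * bucket_seconds
            buckets[start].append(e["duration_ms"])
    return [
        (start, len(durations), sum(durations) / len(durations))
        for start, durations in sorted(buckets.items())
    ]


# Hypothetical raw events with a timestamp in seconds.
EVENTS = [
    {"timestamp": 0, "duration_ms": 100, "status": 200},
    {"timestamp": 30, "duration_ms": 300, "status": 500},
    {"timestamp": 65, "duration_ms": 50, "status": 200},
]

all_requests = to_time_series(EVENTS)
errors_only = to_time_series(EVENTS, where=lambda e: e["status"] == 500)
```

A pre-aggregated metric would have fixed both the bucket and the filter at write time. Here the same events can answer either question.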

This framework also enables distributed tracing, which tracks the progression of a single request as it’s handled by the various services that make up an application. After all, an event becomes a span in a trace once you store trace, span, and parent identifiers as dimensions on it. Honeycomb leaves the context and granularity of your events intact, so you can assemble a trace view to visualize log data and spot patterns. This is especially important for microservices, where it is harder to pinpoint where along a request’s route failures occur and what might be contributing to poor performance.
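Identifiers stored as ordinary dimensions are enough to rebuild a trace, as this small Python sketch shows. The `trace.trace_id`, `trace.span_id`, and `trace.parent_id` field names and the sample events are assumptions for illustration. The sketch selects one trace's events, links each span to its parent, and renders an indented view.

```python
from collections import defaultdict

# Hypothetical events from three services; the field names are illustrative.
EVENTS = [
    {"trace.trace_id": "t1", "trace.span_id": "a", "trace.parent_id": None,
     "service.name": "frontend", "duration_ms": 250},
    {"trace.trace_id": "t1", "trace.span_id": "b", "trace.parent_id": "a",
     "service.name": "cart", "duration_ms": 180},
    {"trace.trace_id": "t1", "trace.span_id": "c", "trace.parent_id": "b",
     "service.name": "db", "duration_ms": 150},
    {"trace.trace_id": "t2", "trace.span_id": "x", "trace.parent_id": None,
     "service.name": "frontend", "duration_ms": 20},
]


def build_trace(events, trace_id):
    """Return the trace's root span and a parent_id -> children index."""
    spans = [e for e in events if e["trace.trace_id"] == trace_id]
    children = defaultdict(list)
    root = None
    for span in spans:
        parent = span["trace.parent_id"]
        if parent is None:
            root = span
        else:
            children[parent].append(span)
    return root, children


def render(span, children, depth=0):
    """Return indented lines, one per span, children beneath their parent."""
    lines = ["  " * depth + f'{span["service.name"]} ({span["duration_ms"]} ms)']
    for child in children[span["trace.span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


root, children = build_trace(EVENTS, "t1")
print("\n".join(render(root, children)))
```

No separate tracing datastore is involved. The trace is just another way of querying the same events.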

If you’re in the market for an observability product, make sure it’s based on arbitrarily wide structured events or spans. If it’s not, then it can’t slice data by high-cardinality dimensions—and it’s just metrics with a pretty hat on.
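The sketch below contrasts a metric defined up front (errors per endpoint) with a query over raw events grouped by a high-cardinality field. The data, the `user.id` field, and the `top_values` helper are hypothetical. The counter can tell you that `/cart` is failing. Only the raw events can tell you which users are affected, because `user.id` was never thrown away.

```python
from collections import Counter

# Hypothetical raw events, each tagged with a high-cardinality user.id.
EVENTS = [
    {"endpoint": "/cart", "user.id": "u-1017", "status": 500},
    {"endpoint": "/cart", "user.id": "u-2231", "status": 200},
    {"endpoint": "/cart", "user.id": "u-1017", "status": 500},
    {"endpoint": "/login", "user.id": "u-9940", "status": 200},
    {"endpoint": "/cart", "user.id": "u-5120", "status": 200},
]

# A metric decided up front: error count per endpoint (low cardinality).
errors_by_endpoint = Counter(e["endpoint"] for e in EVENTS if e["status"] >= 500)


def top_values(events, dimension, where, n=3):
    """Group raw events by any field, however many distinct values it has."""
    return Counter(e[dimension] for e in events if where(e)).most_common(n)


worst_users = top_values(EVENTS, "user.id", lambda e: e["status"] >= 500)
```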

Get started with instrumentation using OpenTelemetry

Now, how do you generate those building blocks? We recommend OpenTelemetry. By adding instrumentation code to your application, you can emit telemetry data alongside each request. Going with open standards like OpenTelemetry helps you avoid vendor lock-in, and the instrumentation works with a wide variety of backend telemetry stores, including Honeycomb. We’re big fans of OpenTelemetry because it has lowered the barrier to entry for organizations instrumenting their code.

Learn more and try it out!

This was just the first session of our Authors’ Cut series, which celebrates the release of our O’Reilly Observability Engineering book! Check out the next sessions to learn more. We’ll discuss key concepts related to observability and how it all comes together in the Honeycomb platform with live demos. We’ll also have guest speakers from time to time.

If you want to give Honeycomb a try, sign up to get started.

Liz Fong-Jones

Field CTO

Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with 18+ years of experience. She is currently the Field CTO at Honeycomb and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights.
