Getting Started With Honeycomb Metrics


Honeycomb is an event-based observability tool. Many Honeycomb customers use metrics along with their events, but the recommended usage and implementation choices in the product can be a bit disorienting to users new to observability. Honeycomb Metrics is not designed to work like traditional metrics tools; instead, Honeycomb takes a new approach to using metrics that’s compatible with modern observability-based debugging workflows.

Events and metrics are fundamentally different data types. Typically, that distinction keeps these data types living in different tools and data sets. The power of observability is being able to holistically understand both your applications and your systems without having to switch contexts—using different tools to analyze different parts of your stack is a relic of historical implementation details. Because events, as a data type, are better suited to debug issues at the application level, there are a few considerations to keep in mind if you’re migrating to observability from a more traditional world where application-level metrics help with most of your debugging. Metrics particularly shine for understanding system-level issues—the role for which they were invented. A common pattern when shifting to observability from traditional monitoring and APM workflows is that your reliance on metrics to find application issues typically diminishes while your reliance on them to surface system issues remains constant.

Honeycomb Metrics allows you to send metrics data to Honeycomb in their native format. Currently, that feature is only available to Enterprise customers. Non-enterprise customers can still send metrics into Honeycomb, but will need to convert metrics data into event data before doing so.

This whitepaper shows you how to use metrics with Honeycomb, both with and without the Enterprise metrics feature. First, it examines the role of metrics, how they’re used today, and how their usage should change when used in tandem with observability tools. It then briefly covers using Honeycomb Metrics to implement that change. For users without Honeycomb Metrics, this paper presents practical examples of which metrics to collect and includes code samples that show you how to manually send them to Honeycomb. Finally, this paper covers practices necessary to analyze metrics in an event-based context, along with tips to help ensure you can properly compare the two when manually sending us your metrics. When migrating toward observability from a metrics world, you will likely wind up replacing 95% of your metrics with Honeycomb events and keep the last 5% for use cases that are still best suited to metrics in today’s modern systems. This whitepaper shows you how.

Examining the Role of Metrics

Historically, a key component necessary for understanding the behavior of software running in production has been to also understand the behavior of its underlying hardware. Metrics began as low-level usage measurements of resources like disk space, memory, or CPU load. Over time, metrics evolved to include higher-order measurements reflecting the work being done by specific applications like requests per second, number of open threads, or connections to a load balancer. As such, metrics can generally be classified as either system-level metrics or application-level metrics.

System-level metrics

For decades, infrastructure metrics were tightly coupled with software behavior, largely due to a pattern of running monolithic systems on bare metal. Infrastructure performance issues and application performance issues were clearly correlated, and the relationship between them was relatively easy to dissect. When software issues occurred in production, it was practical to proceed down a list of known infrastructure failure modes and eliminate those as likely sources of error before looking for more elusive software issues. In the early days, it was more common for hardware problems to be the source of software problems in production.

As underlying physical hardware has become increasingly abstracted, examining a wide array of infrastructure metrics has proven less useful. In modern architectures, it’s more common for your code running at scale to be the source of problems. Even so, some infrastructure metrics still provide useful feedback about how your code behaves in production.

When it comes to infrastructure metrics, we’re typically less concerned about the hardware itself and more concerned about hitting physical resource limitations. If infrastructure metrics show that you’re running out of memory, you’re not going to drive to your cloud provider’s data center and install more RAM into the underlying machine. You’ll either migrate to another instance type with beefier memory or you’ll change your code to have less of an impact on the infrastructure.

In the examples above, metrics that expose infrastructure performance are the most obvious use of system-level metrics. But a more comprehensive way to think about “system-level” metrics is to also include things beyond infrastructure like, for example, reporting on the concurrent number of threads open in your application runtime. Like virtual infrastructure, that runtime is an underlying component that your application code needs in order to properly execute. System-level metrics are useful to quickly report on ambient constraints that impact the operation of your code.

Similarly, the infrastructure metrics to focus on are those that can impact how well your code runs in production. Resources like hostname, CPU, memory, and disk performance should be tracked using metrics. Additionally, you’ll likely want to track things like your cloud provider’s metadata if you’re running cloud instances. Beyond that, there aren’t many additional infrastructure metrics that will actually be useful to you in most scenarios. As a metrics user, you probably already know that, because system-level metrics rarely make up the bulk of the metrics that teams use to monitor production.
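As a rough illustration of how small that set of system-level measurements can be, the sketch below gathers a handful of them using only the Python standard library. The function name system_snapshot and every field name are assumptions made for this example; they are not part of any Honeycomb API. Memory is left out because the standard library has no portable call for it, so a third-party library would be needed.

```python
import os
import shutil
import socket
import time

def system_snapshot(path="/"):
    """Collect a few ambient system measurements as one flat dictionary."""
    disk = shutil.disk_usage(path)
    used_pct = round(100 * disk.used / disk.total, 1) if disk.total else 0.0
    snapshot = {
        "timestamp": time.time(),
        "hostname": socket.gethostname(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_used_pct": used_pct,
    }
    # os.getloadavg() exists only on Unix-like systems, so guard the call.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass  # Load average could not be obtained on this host.
    return snapshot

snapshot = system_snapshot()
print(snapshot)
```

A flat dictionary like this could be sampled on a timer and shipped by whatever transport you already use; how it is sent is deliberately out of scope for this sketch.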

Application metrics

Application metrics tend to be much more useful in modern systems because they describe the performance characteristics of your code. They often include a very large number of measures that analyze various parts of what your applications are doing in production. Application metrics include measures like average page load times, average time for components to render, requests per second, peak response times, error rates, bytes delivered per page, latency between different system components, and many other similar examples.

When new production applications are deployed, they typically include a number of common metrics such as some of those listed in the example above—a base set of measurements indicative of common problems that might occur. Application metrics also tend to proliferate over time. When a new type of failure occurs in production, it’s not uncommon to see teams respond by monitoring conditions to look for that same type of failure in the future.

For example, when a production failure occurs because an underlying service dependency took too long to complete a request, a team might respond by adding an application metric that measures the latency between service web and its dependent service database. Later, when a similar failure is seen between service web and service auth, another metric measuring the latency between those two components is also added. And so on.

Whenever teams want to get more granular views of software performance than they currently have, they must add additional metrics. They may have instrumented a webserver to measure the total number of incoming requests, but if they want to compare the number of requests going to their main site (www) with those going to their documentation site that uses a separate subdomain (docs), they would have to add two new metrics (one for www and one for docs, in addition to the one for all incoming requests if other subdomains exist). Every new attribute to be measured is a new dimension. Adding dimensions means adding more metrics.
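The growth described above can be seen in a minimal sketch. The request data and the metric naming scheme (requests.total, requests.www, and so on) are invented for illustration and do not reflect any particular metrics tool.

```python
from collections import Counter

# Hypothetical request log: (subdomain, HTTP status) pairs.
requests = [
    ("www", 200),
    ("docs", 200),
    ("www", 500),
    ("docs", 404),
    ("www", 200),
]

# Pre-aggregated metrics: each distinct name is its own counter.
metrics = Counter()
for subdomain, status in requests:
    metrics["requests.total"] += 1
    # One extra metric per subdomain value.
    metrics[f"requests.{subdomain}"] += 1
    # Adding a second dimension (status) multiplies the metric names again.
    metrics[f"requests.{subdomain}.{status}"] += 1

metric_names = sorted(metrics)
for name in metric_names:
    print(name, metrics[name])
```

Five requests with two dimensions already produce seven distinct metric names here, and each further dimension multiplies that count rather than adding to it.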

Application metrics, rather than infrastructure metrics, make up the bulk of metrics most teams use to monitor the performance of their production applications. Moving to an event-oriented observability model is where our customers typically see the biggest reduction in their use of metrics.

Events are unaggregated application metrics

Rather than tracking ever-greater numbers of metrics, Honeycomb is based on the idea of analyzing high-context events in any way you see fit.

Using events, a webserver request could be instrumented to record each parameter submitted with the request (for example, user id), the intended subdomain (www, docs, support, shop, cdn, etc.), total duration time, any dependent child requests, the various services involved (web, database, auth, billing) to fulfill those child requests, the duration of each child request, and more. Those events can then be arbitrarily sliced and diced in various ways, across any window of time, to present any view relevant to your investigation.
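To make the shape of such an event concrete, here is a hedged sketch of one wide event represented as a plain Python dictionary. Every field name and value is hypothetical, chosen to mirror the attributes listed above rather than any schema Honeycomb requires.

```python
import json

# Hypothetical wide event for a single web request; all fields are illustrative.
event = {
    "timestamp": "2024-01-01T00:00:00Z",
    "user_id": 4821,
    "subdomain": "docs",
    "duration_ms": 182.4,
    "child_requests": [
        {"service": "auth", "duration_ms": 12.1},
        {"service": "database", "duration_ms": 95.7},
        {"service": "billing", "duration_ms": 40.3},
    ],
}

def slowest_child(evt):
    """Return the name of the service whose child request took longest."""
    slowest = max(evt["child_requests"], key=lambda child: child["duration_ms"])
    return slowest["service"]

print(json.dumps(event, indent=2))
print("slowest dependency:", slowest_child(event))
```

Because the context travels with the event, a question such as "which dependency was slowest for this request?" can be asked after the fact, without having defined a metric for it in advance.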

For application-level instrumentation, events become a much more attractive option than metrics due to their flexibility and wide context.

Metrics reflect system state, as expressed by numerical values derived by measuring any given property over a given period of time. For example, a metric for page_load_time might examine the average time it took for all active pages to load during the trailing 5s period. Metrics aggregate values over a defined period.

Events, in contrast, are snapshots of what happened at a particular point in time. One thousand discrete events may have occurred within that same trailing 5s period in the example above. If each event recorded its own page_load_time, you could still display the same average value as shown by the metric when aggregated along with the other 999 events that happened in that same 5s period. However, when using events, you could also subdivide queries by fields like user_id or hostname to find correlations with page_load_time in ways that simply aren’t possible when using the aggregated value provided by metrics.
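The difference can be sketched with a few made-up events from a single 5s window. The helper avg_by and the sample values are assumptions for illustration only; the point is that the single average a metric would report is still recoverable, while the per-field breakdown is available only because the raw events were kept.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events from one trailing 5s window; values are invented.
events = [
    {"hostname": "web-1", "user_id": 1, "page_load_time": 0.25},
    {"hostname": "web-1", "user_id": 2, "page_load_time": 0.75},
    {"hostname": "web-2", "user_id": 1, "page_load_time": 2.0},
    {"hostname": "web-2", "user_id": 3, "page_load_time": 3.0},
]

# What a metric would report: one aggregate value for the whole window.
overall_avg = mean(e["page_load_time"] for e in events)

def avg_by(events, field):
    """Average page_load_time grouped by the values of any event field."""
    groups = defaultdict(list)
    for e in events:
        groups[e[field]].append(e["page_load_time"])
    return {key: mean(values) for key, values in groups.items()}

by_host = avg_by(events, "hostname")
by_user = avg_by(events, "user_id")

print("overall:", overall_avg)
print("by hostname:", by_host)
print("by user_id:", by_user)
```

In this sample the overall average hides the fact that one host is several times slower than the other, which is the kind of correlation an aggregated metric cannot surface.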

When shifting to observability, teams will often replace a vast majority of their application metrics with widely instrumented events. As a result, over time, we frequently see customers end up replacing about 95% of their metrics.

Metrics aren’t replaced because the numbers they generate no longer provide value. Metrics are replaced because they are too rigid and limiting. As aggregate numerical representations of predefined relationships over predefined periods of time, they only serve one narrow view of one system property. Events, by contrast, still allow you to aggregate measures and calculate the value those metrics provide, as needed. In addition, events also offer a far greater degree of flexibility.

In an event-based system, metrics are replaced because their value diminishes significantly.

Let’s dig into that a bit further to examine how that last 5% of metrics are used in the world of observability.

Events and infrastructure metrics

There are still some classes of instrumentation where it makes sense to continue to use metrics. The use case for metrics is best categorized as workloads where there is high throughput and low differentiation. For example, watching the performance of a network switch or a router is better suited to using metrics instead of event-based instrumentation.

Although modern virtualized and complex infrastructure platforms have significantly reduced their value, infrastructure metrics (like CPU, memory, and disk performance) should still be collected because they help you manage ambient resources that affect the performance of your production applications.

In observability debugging workflows, once a team identifies an issue, they typically take a look at ambient metrics to either quickly confirm or rule out a system-level issue. If the issue is not caused by hitting system-level constraints, then they’ll dig into the task of investigating more complex application-level issues using event data.

At Honeycomb, we actively use infrastructure metrics and events to understand how our own production systems are running. We track metrics on infrastructure systems such as Kafka to understand capacity and usage. We also collect cloud infrastructure metadata that AWS publishes about its services. Virtually everything else

Download the PDF to read more.