Software Engineering   Observability   Dogfooding  

Level Up With Derived Columns: Bucketing Events For Comparison

By Sam Stokes  |   Last modified on July 21, 2022

When we released derived columns last year, we already knew they were a powerful way to manipulate and explore data in Honeycomb, but we didn’t realize just how many different ways folks could use them. We use them all the time to improve our perspective when looking at data as we use Honeycomb internally, so we decided to share. So, in this series, Honeycombers share their favorite derived column use cases and explain how to achieve them.

Have a favorite derived column use case of your own? Send us a screenshot and description and we’ll send you something in the mail 🙂

animated bee with a bucket

A very useful observability technique is to take a graph and break it down by some property of the underlying events. You get a graph for each value of that property, overlaid for easy comparison.

To make that more concrete, let's use an example. If you're looking at a spike in your service's response time, you might try:

  • breaking down the response time graph by server hostname to see if the spike is isolated to one host;
  • breaking down by the version of the running code, in case you deployed a new version that hurt performance; or even
  • breaking down by user ID, to see if the slowdown affects all users or is specific to certain use cases.

If the response time spike only shows up for one server, or just for certain versions, or for some of your users but not others, that gives you valuable information about where to look for the source of the problem.

Too many graphs

Once you get used to using breakdowns for investigation, you start getting more ideas for properties it would be useful to break down by. Unfortunately, it's easy to hit a snag when the property varies too much, and you end up trying to compare hundreds of graphs! But there's a way out: you can use a derived column to partition the values of the property into a manageable number of buckets.

For example, let's say users can upload files to your app, and you know that some files are much bigger than others. You might want to ask whether users upload more small files than large, or whether large uploads fail more often.

HTTP requests include a useful Content-Length header, which would record the size of the file a user is uploading. If you've already instrumented your app to send along the HTTP request headers as properties of the event, then you might try a break down by headers.Content-Length. Unfortunately, file sizes vary too much! You'll get a breakdown graph for *every* distinct file size. If one file was 5MB and another was 5.1MB, they'll each show up in a different breakdown group.

The problem is our breakdown is too precise. We don't actually care about comparing 5MB uploads with 5.1MB uploads; we just care about comparing really big uploads with really small uploads. To get a more useful answer, we need to ask a less precise question.

It's raining buckets

To fix this, let's use a derived column to divide file sizes into a few _buckets_. We can use a series of nested IF operators and comparisons to define the buckets:

IF(LT($headers.Content-Length, 1000), "0:bytes",
   IF(LT($headers.Content-Length, 1000000), "1:kbytes",
      IF(LT($headers.Content-Length, 1000000000), "2:mbytes",

That's a little hard to read because of the nesting, but we're defining a derived column which evaluates to one of those four strings - "0:bytes", "1:kbytes", "2:mbytes", "3:huge" - depending on the value of the headers.Content-Length field. We can choose any names we want for the buckets, but I've chosen to name each bucket with a numeric prefix so that we can sort them in the results table.

Let's call this derived column content_length_bucket. Now instead of breaking down by headers.Content-Length, we can break down by content_length_bucket, and get a much more manageable four graphs to compare.

How do we choose good bucket sizes? Here I relied on knowledge about what sort of files people might upload - multi-megabyte JPEGs are likely to behave differently from code snippets that might be a few hundred bytes, and anything a gigabyte or bigger is probably going to need special handling (or be rejected). If you're not sure what to expect, a good rule of thumb is either exponentially sized buckets like these for widely disparate values, or identically sized buckets (e.g. 100, 200, 300) for more uniformly distributed data.You can also ask the data! Querying COUNT BREAK DOWN BY content_length_bucket will show you if you have one bucket with the vast majority of the events, in which case you might want to divide that bucket further.

3 bees with multiple buckets

Other ways to get less precise

Once you start thinking with breakdowns, this pattern comes up quite a lot: your events have a field containing useful information you'd like to break down by, but the field is too specific. Here are some other questions you might want to ask, and derived columns that can help:

  • Is this error happening to all our customers, or just to internal staff? The UserEmail field is different for every user, so let's break down by CONTAINS($UserEmail, "").
  • Do we have more users on Firefox or Chrome? The headers.User-Agent field is notoriously messy, but we can use a regex to pull out more useful information: break down by REG_VALUE($headers.User-Agent, `([^\s\\["/]+)(/\d+\.\d+)?"]$`).
    (Note that parsing user-agent strings is a rather involved topic and the example regex above is far from perfect!)
  • What is the behaviour before and after this build rolled out? We wrote earlier in this series about how to partition and compare data!

Intrigued? Come back later for more posts in this derived columns series, read more in our documentation, or sign up and give Honeycomb a try for yourself!


Related Posts

Observability   News & Announcements  

Announcing Honeycomb Support Business Hours in Europe

Earlier this year, Honeycomb announced the launch of data residency in Europe. To meet the growing needs of our customers in the region, we are...

Software Engineering   Culture  

Establishing and Enabling a Center of Production Excellence

Software is in a crisis. This is nothing new. Complex distributed systems are perpetually in a state far from equilibrium, operating in what Richard Cook...

Observability   Metrics  

The Cost Crisis in Metrics Tooling

In my February 2024 piece The Cost Crisis in Observability Tooling, I explained why the cost of tools built atop the three pillars of metrics,...