Level Up With Derived Columns: Bucketing Events For ComparisonBy Sam Stokes | Last modified on July 21, 2022
When we released derived columns last year, we already knew they were a powerful way to manipulate and explore data in Honeycomb, but we didn’t realize just how many different ways folks could use them. We use them all the time to improve our perspective when looking at data as we use Honeycomb internally, so we decided to share. So, in this series, Honeycombers share their favorite derived column use cases and explain how to achieve them.
Have a favorite derived column use case of your own? Send us a screenshot and description and we’ll send you something in the mail 🙂
A very useful observability technique is to take a graph and break it down by some property of the underlying events. You get a graph for each value of that property, overlaid for easy comparison.
To make that more concrete, let's use an example. If you're looking at a spike in your service's response time, you might try:
- breaking down the response time graph by server hostname to see if the spike is isolated to one host;
- breaking down by the version of the running code, in case you deployed a new version that hurt performance; or even
- breaking down by user ID, to see if the slowdown affects all users or is specific to certain use cases.
If the response time spike only shows up for one server, or just for certain versions, or for some of your users but not others, that gives you valuable information about where to look for the source of the problem.
Too many graphs
Once you get used to using breakdowns for investigation, you start getting more ideas for properties it would be useful to break down by. Unfortunately, it's easy to hit a snag when the property varies too much, and you end up trying to compare hundreds of graphs! But there's a way out: you can use a derived column to partition the values of the property into a manageable number of buckets.
For example, let's say users can upload files to your app, and you know that some files are much bigger than others. You might want to ask whether users upload more small files than large, or whether large uploads fail more often.
HTTP requests include a useful
Content-Length header, which would record the size of the file a user is uploading. If you've already instrumented your app to send along the HTTP request headers as properties of the event, then you might try a break down by
headers.Content-Length. Unfortunately, file sizes vary too much! You'll get a breakdown graph for *every* distinct file size. If one file was 5MB and another was 5.1MB, they'll each show up in a different breakdown group.
The problem is our breakdown is too precise. We don't actually care about comparing 5MB uploads with 5.1MB uploads; we just care about comparing really big uploads with really small uploads. To get a more useful answer, we need to ask a less precise question.
It's raining buckets
IF(LT($headers.Content-Length, 1000), "0:bytes",
IF(LT($headers.Content-Length, 1000000), "1:kbytes",
IF(LT($headers.Content-Length, 1000000000), "2:mbytes",
That's a little hard to read because of the nesting, but we're defining a derived column which evaluates to one of those four strings -
"3:huge" - depending on the value of the
headers.Content-Length field. We can choose any names we want for the buckets, but I've chosen to name each bucket with a numeric prefix so that we can sort them in the results table.
Let's call this derived column
content_length_bucket. Now instead of breaking down by
headers.Content-Length, we can break down by
content_length_bucket, and get a much more manageable four graphs to compare.
How do we choose good bucket sizes? Here I relied on knowledge about what sort of files people might upload - multi-megabyte JPEGs are likely to behave differently from code snippets that might be a few hundred bytes, and anything a gigabyte or bigger is probably going to need special handling (or be rejected). If you're not sure what to expect, a good rule of thumb is either exponentially sized buckets like these for widely disparate values, or identically sized buckets (e.g. 100, 200, 300) for more uniformly distributed data.You can also ask the data! Querying
COUNT BREAK DOWN BY content_length_bucket will show you if you have one bucket with the vast majority of the events, in which case you might want to divide that bucket further.
Other ways to get less precise
Once you start thinking with breakdowns, this pattern comes up quite a lot: your events have a field containing useful information you'd like to break down by, but the field is too specific. Here are some other questions you might want to ask, and derived columns that can help:
- Is this error happening to all our customers, or just to internal staff? The
UserEmailfield is different for every user, so let's break down by
- Do we have more users on Firefox or Chrome? The
headers.User-Agentfield is notoriously messy, but we can use a regex to pull out more useful information: break down by
(Note that parsing user-agent strings is a rather involved topic and the example regex above is far from perfect!)
- What is the behaviour before and after this build rolled out? We wrote earlier in this series about how to partition and compare data!
Software systems are increasingly complex. Applications can no longer simply be understood by examining their source code or relying on traditional monitoring methods. The interplay...
Stop me if you’ve heard this one before: you just pushed and deployed your latest change to production, and it’s rolling out to your Kubernetes...
Kubernetes has been around for nearly 10 years now. In the past five years, we’ve seen a drastic increase in adoption by engineering teams of...