Level Up With Derived Columns: Wibbly-Wobbly Timey-Wimey Manipulation

When we released derived columns last year, we already knew they were a powerful way to manipulate and explore data in Honeycomb, but we didn’t realize just how many different ways folks could use them. We use them all the time internally to get a better perspective on our own data, so we decided to share. In this series, Honeycombers share their favorite derived column use cases and explain how to achieve them.

This installment follows the previous post, “Better Math(s).”

Have a favorite derived column use case of your own? Send us a screenshot and description and we’ll send you something in the mail 🙂

Make things convenient for yourself

Derived columns are the all-purpose data-massaging multitool of my Honeycomb dogfood experience. Sometimes I want to do something simple, like create a helpful boolean to separate the noise of our internal data from our customers’ data.

$is_honeycomber = CONTAINS($UserEmail, "@honeycomb.io")

And sometimes, I am just really, really sick of looking at time in milliseconds, and want to see the y-axis rescaled to days instead.

$duration_days = DIV($duration_ms, 86400000)
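For readers who think more easily in application code, here is a rough Python equivalent of the two expressions above. It is only a sketch: it assumes CONTAINS behaves like a plain substring test and DIV like ordinary division, and the function names and the sample event are made up for illustration.

```python
# Hypothetical Python stand-ins for the two derived columns above.
# Assumption: CONTAINS is a substring test and DIV is plain division.

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def is_honeycomber(user_email: str) -> bool:
    """True when the email address belongs to the honeycomb.io domain."""
    return "@honeycomb.io" in user_email


def duration_days(duration_ms: float) -> float:
    """Rescale a duration from milliseconds to days."""
    return duration_ms / MS_PER_DAY


# A made-up event, shaped like the fields used in the expressions above.
event = {"UserEmail": "someone@honeycomb.io", "duration_ms": 172_800_000}

internal = is_honeycomber(event["UserEmail"])
days = duration_days(event["duration_ms"])
```

The difference with a derived column is that none of this has to live in the instrumented code: the same arithmetic is applied to the fields that are already in the dataset.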

“What’s the big deal? I’m already instrumenting anyways, so it’s easy to go back and add some simple arithmetic in my code.”

The real question here is: how many times have you awkwardly realized days, weeks, or months after starting a project that it would be awfully convenient to emit a new event field that’s merely a function of existing fields?

At the beginning of Honeycomb, we started emitting two very useful event fields to dogfood: query.start_time and query.end_time, both formatted as relative time in seconds.

Time traveling instrumentation

True story: about two years later, I hop into the company and help build Honeycomb’s Fast Query Window. I want to investigate how customers will be affected by the changes, and part of this is seeing how large a time window our customers tend to query over. It would be really useful to graph a more human-readable representation of query windows: for example, the window duration in hours instead of in seconds.

query.window_hours = (query.end_time - query.start_time) / 3600

Or equivalently in Derived Column notation:

$query.window_hours = DIV(SUB($query.end_time, $query.start_time), 3600)

With traditional instrumentation and monitoring, this is what would have happened.

[Figure: drawing of a timeline showing the moment you realize it would have been good to have instrumentation]

This is awkward because there are two years of juicy data where everything you need in order to calculate window_hours is already logged: it’s just a simple function of start_time and end_time. But it was never explicitly instrumented.

Derived columns are sort of like ad-hoc time-traveling instrumentation. If the function inputs exist in your data, you can go back in time and query the function output as if it had always been instrumented.

$query.window_hours = DIV(SUB($query.end_time, $query.start_time), 3600)
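To make the time-travel idea concrete, here is a minimal Python sketch of a field that is computed when events are read rather than when they are written. It is not how Honeycomb implements derived columns; the helper names and the sample events are hypothetical, and the only thing taken from the post is the arithmetic in the expression above.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Event = Dict[str, Optional[float]]


def window_hours(event: Event) -> Optional[float]:
    """Query window in hours, or None when either input field is missing."""
    start = event.get("query.start_time")
    end = event.get("query.end_time")
    if start is None or end is None:
        return None
    return (end - start) / 3600


def with_derived(
    events: Iterable[Event],
    name: str,
    fn: Callable[[Event], Optional[float]],
) -> Iterator[Event]:
    """Yield copies of events with an extra field computed at read time."""
    for event in events:
        yield {**event, name: fn(event)}


# Made-up "old" events that were written before anyone wanted window_hours.
stored_events = [
    {"query.start_time": 0, "query.end_time": 7200},
    {"query.start_time": 3600, "query.end_time": 90000},
]

results = list(with_derived(stored_events, "query.window_hours", window_hours))
```

The stored events are never rewritten. The new field exists only in the query results, which is why it works just as well on two-year-old data as on data that arrived a minute ago.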

 

[Figure: drawing of going back and doing math to make up for missing instrumentation]

Don’t sweat that awkward gap before you realized it would be more useful to instrument something in a different format. Save your dataset the confusion and use a derived column today!