Level Up With Derived Columns: Basic Comparisons


When we released derived columns last year, we already knew they were a powerful way to manipulate and explore data in Honeycomb, but we didn’t realize just how many different ways folks could use them. We use them all the time when we look at our own data in Honeycomb, so we decided to share. In this series, Honeycombers share their favorite derived column use cases and explain how to achieve them. Welcome to the first installment!


Have a favorite derived column use case of your own? Send us a screenshot and description and we’ll send you something in the mail 🙂

Using derived columns to partition and compare data

We love talking about using Honeycomb for software engineering use cases, and my favorite (hidden-in-plain-sight, stealth high-cardinality) attribute to break down by is build_id. Honeycomb makes it fast and painless to see trends broken down per build, and it’s often useful to track behavior in prod as the result of individual deploys:

screenshot showing break down by build_ID

But with a fairly active continuous deployment setup, this query can result in a pretty noisy graph! And, ultimately, I don’t really care about the behavior of every individual build. I just want to know whether my build (7868) caused any changes I should worry about.

Fortunately, I can define a derived column named after_build_7868 that lets me compare events from before my build against just those from after my build was deployed:

# Define new after_build_7868 column as:
GTE($build_id, "7868")

I’m referencing the build_id column (remember: think of derived columns as letting you generate new columns within an existing event!), then using the GTE operator to return a boolean for Honeycomb to break down into groups.

And now, breaking down that noisy graph gets me something much simpler:

screenshot showing breakdown by values before and after a given build_ID
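
If it helps to see the partition outside of Honeycomb, here is a rough Python sketch of what the after_build_7868 column does to a handful of events. The sample events, the status field, and the helper names are all made up for illustration. The sketch also compares build IDs as integers, which is an assumption of this example and not a statement about how Honeycomb's GTE compares values.

```python
from collections import Counter

# Hypothetical events; only build_id comes from the post, status is made up.
events = [
    {"build_id": "7866", "status": 200},
    {"build_id": "7867", "status": 200},
    {"build_id": "7868", "status": 500},
    {"build_id": "7869", "status": 200},
]


def after_build(event, threshold=7868):
    """Boolean "derived column", loosely mirroring GTE($build_id, "7868").

    Assumption of this sketch: build IDs are parsed and compared as integers.
    """
    return int(event["build_id"]) >= threshold


def count_by_partition(events, threshold=7868):
    """Count events on each side of the build, like a break down by the new column."""
    return Counter(after_build(e, threshold) for e in events)


counts = count_by_partition(events)
print(counts)
```

However many distinct builds show up in the data, the break down only ever has two groups: before my build and from my build onward.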

What else can you compare?

This simple partition technique also works great for comparing graphs for, say, an arbitrary canary host against the larger cluster. Simply define an is_canary column as the operation EQ($hostname, "canary-hostname") and break down by your new is_canary column. Pair this technique with our Query Template Links and supercharge your canary deploys today! 🎉
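
The canary comparison can be sketched the same way. This is a minimal illustration, not how Honeycomb computes anything: the duration_ms field, the sample hostnames other than "canary-hostname", and the helper names are assumptions made up for the example.

```python
from statistics import mean

# Hypothetical events; duration_ms and the web-* hostnames are made up.
sample_events = [
    {"hostname": "canary-hostname", "duration_ms": 120},
    {"hostname": "canary-hostname", "duration_ms": 80},
    {"hostname": "web-1", "duration_ms": 40},
    {"hostname": "web-2", "duration_ms": 60},
    {"hostname": "web-3", "duration_ms": 50},
]


def is_canary(event, canary_hostname="canary-hostname"):
    """Boolean "derived column", loosely mirroring EQ($hostname, "canary-hostname")."""
    return event["hostname"] == canary_hostname


def mean_duration_by_group(events, canary_hostname="canary-hostname"):
    """Average duration_ms for the canary versus everything else."""
    groups = {True: [], False: []}
    for e in events:
        groups[is_canary(e, canary_hostname)].append(e["duration_ms"])
    # Skip a group that has no events so mean() never sees an empty list.
    return {flag: mean(values) for flag, values in groups.items() if values}


result = mean_duration_by_group(sample_events)
print(result)
```

The key for True holds the canary's average and the key for False holds the rest of the cluster, which is the same two-line comparison you get from breaking down by is_canary.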

Intrigued? Come back later for more posts in this derived columns series, read more in our documentation, or sign up and give Honeycomb a try for yourself!