Heatmaps Make Ops Better

In this blog miniseries, I’d like to talk about how to think about doing data analysis “the Honeycomb way.”  Welcome to part 1, where I cover what a heatmap is—and how using them can really level up your ability to understand what’s going on with distributed software.

Heatmaps are a vital tool for software owners: if you’re going to look at a lot of data, then you need to be able to summarize it without losing detail. When I’m dealing with data, I find it really helpful to be able to see precisely what data points make up the result of a query, and to zoom down to see individual events. Heatmaps, in their way, best represent what Honeycomb is really about: re-displaying events back at you.

I’m not the first to say that: Chris Toshok wrote a great blog entry on heatmaps a year ago when we first launched them. Heatmaps are no longer the New Hotness. They’re just a critical part of understanding your system with Honeycomb.

Unfortunately, the word hasn’t spread as widely as it could yet. Heatmaps can be difficult to understand. I want to talk about the real power of what makes a heatmap special, and how to use it — and why devops really shouldn’t walk out the door without one.

Why heatmaps? 

We all, I think, know what a scatterplot is. Given a number of multi-dimensional events, pick two continuous fields and place them on the X and Y axes. A reasonable choice might be to pick, say, the time of day on the X axis, and the duration of a given event on Y. With a few hundred points, it’s easy enough to read—but any noise in the data can hide the effect that’s in the data.
screenshot of a scatter plot We can learn more by drawing more and more points. But sometimes a scatterplot gets crowded. Draw a few thousand points, and they start to run into each other. I think this image shows that there’s an upper group and a lower group, but it’s hard to tell how they are different.

screenshot of a crowded scatter plot

There are a few options to address this problem of overplotting. One classic strategy, especially for timelines, is to summarize and aggregate the data. By grouping the the x-axis into bins, the visualization can choose an aggregate across the measure and draw it as a continuous line. In this chart, the average, minimum, and maximum show the general trends of the data — but at great cost. There’s almost no detail visible.