How To Learn Systems Debugging by People-watching


When I first joined this startup that makes an observability platform, I was a front-end JavaScript developer who had never ssh’ed into production. I didn’t even know what tracing, monitoring, or metrics were, let alone what it meant for logs to be structured or how they could be useful to me. But within a couple of months I joined the on-call rotation, and I now share responsibility for our services with the rest of my team.

Learn from your team’s debugging history

Your bash history is a private treasure trove, but when you spend a significant amount of time in other tools, that is where the expertise migrates. Because my team does the bulk of our systems debugging within Honeycomb, our Query History feed is like our shared bash history.

Sometimes even the people on my team are surprised when I remind them I learned to be on-call almost entirely by grazing on their querying history, by following their debugging trails to learn the ins and outs. Much like how you might try out a new programming language, fix a tricky cross-browser CSS bug, or try a wacky git trick by copying a code snippet from Stack Overflow, forking my teammates’ queries is how I learned to get around in our systems. My team unknowingly became co-authors of the most advanced and detailed practitioner’s textbook on systems observability via their collected Query History, and I used it!

Go back in time and see what happened

Honeycomb has always preserved Query History like an ambient Stack Overflow, but what I’m really excited about is that now there is a way to see and filter the overarching story of how your team inspects its code in production. Searchable Query History is like being able to ctrl-r/ctrl-s a visual, annotated representation of our entire team’s bash history through all time.

To access the brains of your colleagues (or rather, Query History Search), click the stopwatch icon in the left nav and type in the search box:

[Screenshot of using Query History Search]

When I see how my teammate watches her feature deploying, or how someone else monitors heatmaps of system stats as they make platform changes, I pick up tricks and new perspectives on how I can check up on my own code deploying and functioning in production. I notice how people monitor traffic flowing through the platform, how they zoom in on unusual fluctuations to determine if they’re normal variance or a sign of a bug, and build a personal understanding of what “normal” looks like without writing a single line of code or doing any drilling in myself.

I can now also filter by time to go back and replay how people solved past incidents, which will be useful for outage retrospectives. Getting a glimpse of people asking questions I couldn’t have imagined seeds ideas that I tuck into my debugging pocket for the moment I need them most during a future incident.

Find the documentation for Query History Search here.

Want to find out how you can share your brain and understand what’s going on in production? Play with Honeycomb, no strings attached!