Honeycomb Blog

How Honeycomb Uses Honeycomb, Part 9: Tracing the Query Path

This post continues our long-running dogfooding series from How Honeycomb Uses Honeycomb, Part 8: A Bee’s Life. To understand how Honeycomb uses Honeycomb at a high level, check out our dogfooding blog posts first — they do a great job of telling the story of problems we’ve solved with Honeycomb. Last week we announced the general availability of tracing in Honeycomb. We’ve been dogfooding this feature extensively as part of an ongoing effort to keep Honeycomb fast and reliable. In this post, we’ll discuss how we use tracing internally, some of the ways it’s helped us improve our service, and some of the…

Read More...

RDS Performance Degradation – Postmortem

Summary Note: all times are UTC unless otherwise noted. On Thursday, May 3, starting at 00:39:08 UTC (Wednesday 17:39 PDT), we experienced a nearly complete outage of the Honeycomb service lasting approximately 24 minutes. Most services came back online at 2018-05-03 01:02:49, and all customer-facing services were fully restored by 01:07:00. Impact During the outage, customers were not able to log in or view graphs on the Honeycomb service. Events sent to the Honeycomb API during the outage were refused at elevated rates: approximately 86% of API submissions were not accepted, and approximately 81% of events…

Read More...

Diving into Kubernetes clusters with Honeycomb

At Honeycomb, we’re excited about Kubernetes. In fact, we’re in the early stages of moving some of our services to k8s. Tools like kops have made getting started with k8s easier than ever. But building clusters is only the beginning – before long you might find yourself with a large number of deployments, pods, and services, and new things coming online every week. Observability is critical to cluster operations. Fortunately, Honeycomb provides multiple Kubernetes integrations to help you get started exploring your cluster’s events and metrics. What can we do with Kubernetes data in Honeycomb? Let’s look at a…

Read More...

Sam Stokes talks about data infrastructure on the Data Engineering Podcast

This past week, Honeycomb engineering manager Sam Stokes was interviewed on the Data Engineering Podcast, and in addition to hearing him talk a little about himself (which, as far as I can tell, he almost never does), I thought you might want to hear all about Honeycomb’s data infrastructure in Sam’s voice (which is extremely soothing) as well: Listen to or download the podcast here. In addition to talking about the characteristics of our event data, Sam describes how we leverage our own use of Honeycomb to support and analyze our customer usage rapidly and at scale, by slicing and…

Read More...

How Honeycomb Uses Honeycomb, Part 8: A Bee’s Life

This post continues our dogfooding series from How Honeycomb Uses Honeycomb, Part 7: Measure twice, cut once: How we made our queries 50% faster…with data. To understand how Honeycomb uses Honeycomb at a high level, check out our dogfooding blog posts first — they do a better job of telling the story of problems we’ve solved with Honeycomb. This blog post peeks under the hood to go into greater detail around the mechanics of what we track, how we track it all, and how we think about the sorts of questions we want to answer. We’ve built up a culture…

Read More...

How Honeycomb Uses Honeycomb, Part 7: Measure twice, cut once: How we made our queries 50% faster…with data

This post continues our dogfooding series from How Honeycomb Uses Honeycomb, Part 6: Instrumenting a Production Service. The entire value proposition of Honeycomb’s columnar store is query speed, for instance: examining over 1.2 million rows and returning 50 time series, aggregating results into 240 buckets per series (1-minute granularity over 4 hours for this query), in 330ms. Nice. This is just the time spent on our storage nodes, though. It doesn’t include the amount of time it takes for the browser to download and render the results, and (more importantly for this post) it doesn’t include the…

Read More...
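
As a rough illustration of the bucket arithmetic in the excerpt above, here is a minimal Go sketch: the bucketCount helper is hypothetical rather than Honeycomb’s actual query code, but it shows how a 4-hour window at 1-minute granularity works out to the 240 buckets per series mentioned in the post.

```go
package main

import (
	"fmt"
	"time"
)

// bucketCount is a hypothetical helper: a query window divided by its
// granularity gives the number of time buckets per series.
func bucketCount(window, granularity time.Duration) int {
	return int(window / granularity)
}

func main() {
	// 4 hours at 1-minute granularity => 240 buckets per series.
	fmt.Println(bucketCount(4*time.Hour, time.Minute)) // prints 240
}
```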

How Honeycomb Uses Honeycomb, Part 6: Instrumenting a Production Service

This post continues our dogfooding series from How Honeycomb Uses Honeycomb, Part 5: The Correlations Are Not What They Seem. In a recent blog post, I talked about what sorts of things should go into an event. That was a lovely and generic list. This post grounds it in our reality – I’m here to share with you how we configure libhoney-go and talk about the instrumentation we add in our own production services. The service I’m using as an example here is our API for receiving events from the wild. It’s an HTTP server, and for the most part,…

Read More...
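
For readers who want a feel for what this kind of setup looks like, here is a minimal sketch of configuring libhoney-go and instrumenting an HTTP handler. It is not Honeycomb’s actual configuration: the write key, dataset name, handler path, and field names are placeholders chosen for illustration.

```go
package main

import (
	"net/http"
	"time"

	libhoney "github.com/honeycombio/libhoney-go"
)

func main() {
	// Placeholder configuration; substitute your own write key and dataset.
	libhoney.Init(libhoney.Config{
		WriteKey: "YOUR_WRITE_KEY",
		Dataset:  "api-server",
	})
	defer libhoney.Close()

	// An illustrative ingest handler; the fields recorded below are examples,
	// not the exact fields Honeycomb's API server tracks.
	http.HandleFunc("/1/events/", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		ev := libhoney.NewEvent()

		// ... accept and process the incoming event here ...
		w.WriteHeader(http.StatusAccepted)

		// Attach per-request metadata and send one instrumentation event.
		ev.AddField("method", r.Method)
		ev.AddField("path", r.URL.Path)
		ev.AddField("user_agent", r.UserAgent())
		ev.AddField("duration_ms", float64(time.Since(start))/float64(time.Millisecond))
		ev.Send()
	})

	http.ListenAndServe(":8080", nil)
}
```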

How Honeycomb Uses Honeycomb, Part 5: The Correlations Are Not What They Seem

This post continues our dogfooding series from How Honeycomb Uses Honeycomb, Part 4: Check Before You Change. Maybe you’ve heard the saying that correlation does not imply causation: just because two things changed in the same way at the same time doesn’t prove one of the changes caused the other. And, as with pretty much everything worthwhile, there’s a relevant xkcd strip. It’s important to separate correlation and causation because there’s a natural human tendency to look for causation everywhere. Causation is a powerful part of the way we understand the world. Causation tells stories, and we’re good at…

Read More...

How Honeycomb Uses Honeycomb, Part 4: Check Before You Change

This post continues our dogfooding series from How Honeycomb Uses Honeycomb, Part 3: End-to-End Failures. As Honeycomb matures, we try to roll out changes as smoothly as possible to minimize surprise on the part of our customers. Part of that relies upon understanding, intimately, the effect of a change and its potential user impact. We made a couple of small changes to our API recently, and were able to use our dogfood cluster to make informed decisions about the planned changes. Episode 1: This change is obviously good, right? Our API accepts flat JSON objects: a single map with keys and…

Read More...
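
To make the “flat JSON objects” constraint from the excerpt above concrete, here is a small Go sketch of a flat event payload: a single map with no nested objects or arrays. The field names are illustrative only and not taken from the post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// A flat event: one map, keys mapping directly to scalar values.
	flat := map[string]interface{}{
		"method":      "POST",
		"status_code": 200,
		"duration_ms": 153.2,
		"hostname":    "api-1",
	}

	body, err := json.Marshal(flat)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	// By contrast, a payload like {"request": {"method": "POST"}} nests a
	// second map inside the first and would not be a flat object.
}
```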

How Honeycomb Uses Honeycomb, Part 3: End-to-End Failures

This post continues our dogfooding series from How Honeycomb Uses Honeycomb, Part 2: Migrating API Versions. At Honeycomb, one of our foremost concerns (in our product as well as our customers’) is reliability. To that end, we have end-to-end (e2e) checks that run each minute, write a single data point to Honeycomb, then expect that specific data point to be available for reads within a certain time threshold. If this fails, the check will retry up to 30 times before giving up. Monday morning, we were notified of some intermittent, then persistent, errors in these automated checks. We quickly verified…

Read More...
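
Sketched below is what such a write-then-read check might look like in Go, under stated assumptions: writePoint and readPoint are hypothetical stand-ins for the real calls against the Honeycomb API and query path, and the retry count mirrors the 30 attempts mentioned in the excerpt.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// writePoint would send a single event tagged with id to Honeycomb; this is a
// placeholder for the real API call.
func writePoint(id string) error { return nil }

// readPoint would query Honeycomb for the event with the given id; also a
// placeholder for the real query call.
func readPoint(id string) (bool, error) { return true, nil }

// runE2ECheck writes one data point, then polls until it becomes readable,
// retrying up to maxRetries times before giving up.
func runE2ECheck(id string, maxRetries int, wait time.Duration) error {
	if err := writePoint(id); err != nil {
		return err
	}
	for attempt := 0; attempt < maxRetries; attempt++ {
		found, err := readPoint(id)
		if err == nil && found {
			return nil // the data point became readable in time
		}
		time.Sleep(wait)
	}
	return errors.New("e2e check failed: data point never became readable")
}

func main() {
	id := fmt.Sprintf("e2e-%d", time.Now().UnixNano())
	if err := runE2ECheck(id, 30, 2*time.Second); err != nil {
		fmt.Println(err)
	}
}
```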