Honeycomb Blog

Honeycombers at LISA 2017

Did you go to LISA this year? I used to go back in the 1998-2003 timeframe (anyone remember playing the original Guitar Hero in that huge arcade in Seattle?) and I hope to make it back again someday soon. A lot of time has passed since those days, but the conference continues to offer attendees a wide range of useful and educational talks to choose from. In particular, the content on operating at scale has evolved upward much like the definition of "Large" since the conference's inception 🙂 A couple of Honeycombers presented at LISA this year; here's what they talked…

Debug Better By Throwing Information Away

The Addiction

Like many developers in today's Brave New Distributed World, I've started to develop an addiction lately: I'm addicted to data. Data, whether it's small or big or consultant big, is a critical make-or-break factor for businesses today. Once you figure out that you can store and analyze every interaction happening on your website or your servers, it seems to be only a matter of collecting all the right details and turning the proper knobs to grow your app and ensure your status among the unicorns. It therefore wouldn't surprise me if the idea of losing some of…

Dynamic Sampling in Honeytail

A while ago I wrote a three-part series on sampling, covering an introduction, some simple, straightforward ways to do it, and some ideas for fancier implementations. I'm happy to say that work has made its way into Honeytail, our log-tailing agent. Dynamic sampling in Honeytail uses a two-phase algorithm: it measures the frequency of values in one or more columns for 30 seconds, computes an appropriate sample rate for each value by fitting a logarithmic curve to the traffic, then applies those rates for the following 30 seconds. While it's…
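
The excerpt is truncated above, but the two-phase shape is easy to sketch. Here is a minimal Go illustration assuming a made-up log-curve heuristic (keep roughly log2(n) events for a value seen n times); the function names and constants are mine, and Honeytail's actual rate computation may differ:

```go
package main

import (
	"fmt"
	"math"
)

// computeRates converts per-value counts observed during one 30-second window
// into "keep 1 in N" sample rates for the next window. The target here is to
// keep roughly log2(n) events for a value seen n times, so rare values survive
// intact while heavy hitters are sampled hard. This is an illustrative
// heuristic, not Honeytail's exact formula.
func computeRates(counts map[string]int) map[string]int {
	rates := make(map[string]int, len(counts))
	for value, n := range counts {
		keep := math.Log2(float64(n)) + 1 // how many events to keep for this value
		rate := int(float64(n) / keep)
		if rate < 1 {
			rate = 1
		}
		rates[value] = rate
	}
	return rates
}

func main() {
	// Phase 1 output: value frequencies counted over the last 30 seconds.
	window := map[string]int{"200": 100000, "404": 500, "500": 3}
	// Phase 2: these rates are applied to matching events for the next 30 seconds.
	for value, rate := range computeRates(window) {
		fmt.Printf("value %s: sample 1 in %d\n", value, rate)
	}
}
```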

Instrumenting High Volume Services: Part 3

This is the last of three posts focusing on sampling as a part of your toolbox for handling services that generate large amounts of instrumentation data. The first one was an introduction to sampling and the second described simple methods to explore dynamic sampling. In Part 2, we explored partitioning events based on HTTP response codes and assigning sample rates to each response code. That worked because of the small key space of HTTP status codes and because it's known that errors are less frequent than successes. What do you do when the key space is too large to easily…
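
For context, here is a minimal Go sketch of the Part 2 technique the paragraph recaps; the specific rate values and the rateForStatus name are my own illustration, not code from the series:

```go
package main

import (
	"fmt"
	"math/rand"
)

// rateForStatus assigns a fixed "keep 1 in N" sample rate per HTTP status
// code: rare, interesting classes (errors) are kept at or near full fidelity,
// while common successes are sampled heavily. The numbers are illustrative.
func rateForStatus(status int) int {
	switch {
	case status >= 500:
		return 1 // keep every server error
	case status >= 400:
		return 5 // keep 1 in 5 client errors
	default:
		return 100 // keep 1 in 100 successes
	}
}

func main() {
	for _, status := range []int{200, 200, 503, 404} {
		rate := rateForStatus(status)
		if rand.Intn(rate) == 0 {
			// A kept event carries its sample rate so totals can be
			// reconstructed later by multiplying counts by the rate.
			fmt.Printf("kept event: status=%d sampleRate=%d\n", status, rate)
		}
	}
}
```

This works only while the key space stays small and the interesting keys are known in advance, which is exactly the limitation the truncated question above is driving at.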

Instrumenting High Volume Services: Part 2

This is the second of three posts focusing on sampling as a part of your toolbox for handling services that generate large amounts of instrumentation data. The first one was an introduction to sampling. Sampling is a simple concept for capturing useful information about a large quantity of data, but it can manifest in many different ways, varying widely in complexity. Here in Part 2, we'll explore techniques to handle simple variations in your data, introduce the concept of dynamic sampling, and begin addressing some of the harder questions raised in Part 1.

Constant Sampling

This code should look familiar from Part…
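
The snippet the post refers to is cut off above; as a stand-in, here is a minimal constant-sampling sketch in Go (the constant and names are mine, not the post's actual code):

```go
package main

import (
	"fmt"
	"math/rand"
)

const sampleRate = 4 // keep roughly 1 in 4 events

// shouldKeep makes an independent keep/drop decision for each event.
// Every kept event is later re-weighted by sampleRate during analysis.
func shouldKeep() bool {
	return rand.Intn(sampleRate) == 0
}

func main() {
	kept := 0
	for i := 0; i < 1000; i++ {
		if shouldKeep() {
			kept++ // in a real service: send the event, tagged with sampleRate
		}
	}
	fmt.Printf("kept %d of 1000 events at sample rate %d\n", kept, sampleRate)
}
```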

Instrumenting High Volume Services: Part 1

This is the first of three posts focusing on sampling as a part of your toolbox for handling services that generate large amounts of instrumentation data. Recording tons of data about every request coming into your service is easy when you have very little traffic. As your service scales, the impact of measuring its performance can cause its own problems. There are three main ways to mitigate this problem:

- measure fewer things
- aggregate your measurements before submitting them (sketched below)
- measure a representative portion of your traffic

Each method has its place; this series of posts focuses on…
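
As a taste of the second option, here is a minimal Go sketch that submits one latency summary per interval instead of one event per request; the aggregator type and its methods are hypothetical, not from the post:

```go
package main

import (
	"fmt"
	"time"
)

// aggregator rolls individual measurements up into one periodic summary,
// trading per-request detail for a small, fixed reporting volume.
type aggregator struct {
	count int
	total time.Duration
	max   time.Duration
}

func (a *aggregator) record(d time.Duration) {
	a.count++
	a.total += d
	if d > a.max {
		a.max = d
	}
}

// flush emits one summary event and resets the window.
func (a *aggregator) flush() {
	if a.count > 0 {
		fmt.Printf("requests=%d avg=%v max=%v\n",
			a.count, a.total/time.Duration(a.count), a.max)
	}
	*a = aggregator{}
}

func main() {
	var a aggregator
	for _, ms := range []int{12, 80, 33, 250, 9} {
		a.record(time.Duration(ms) * time.Millisecond)
	}
	a.flush() // in a real service this would run on a timer, not once
}
```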
