Honeycomb Blog

Testing In Production

Testing in production has gotten a bad rap, despite the fact that we all do it, all the time. There’s a lot of value in testing, to a point. But if you can catch 80-90% of the bugs with 10-20% of the effort (and you can!), the rest is more usefully poured…

Observability: What’s in a Name?

“Is observability just monitoring with another name?” “Observability: we changed the word because developers don’t like monitoring.” There’s been a lot of hilarious snark about this lately. Which is great, who doesn’t love A+ snark? Figured I’d take the time to answer, at least once. Yes, in practice, the tools and practices for monitoring vs…

Our First Outage

Dear Honeycomb users, On Saturday, Aug 19th, we experienced a service outage for all customers. This was our first-ever outage, even though we’ve had users in production for almost exactly one year, and paying customers for about 6 months. We’re pretty proud of that, but also overdue for an outage. We take production reliability very…

Lies My Parents Told Me (About Logs)

Lots of us still believe some pretty silly things about logs. Most of these things used to be true! Some of them never really were. Sometimes they are “true enough” to get you a long ways, until you run into a wall and suddenly they no longer are. Any time there are changes in your…

Instrumenting High Volume Services: Part 3

This is the last of three posts focusing on sampling as a part of your toolbox for handling services that generate large amounts of instrumentation data. The first one was an introduction to sampling and the second described simple methods to explore dynamic sampling. In part 2, we explored partitioning events based on HTTP response…

Is Honeycomb a monitoring tool?

You may notice that we don’t talk about “monitoring” much, and that’s because we don’t really think of monitoring as what we do, even though it kind of is. Traditional monitoring relies heavily on predicting how a system may fail and checking for those failures. Traditional graphing involves generating big grids of dashboards that sit…

Instrumentation: system calls: an amazing interface for instrumentation

When you’re debugging, there are two basic ways you can poke at something. You can create new instrumentation (like “adding print statements”) or use existing instrumentation (“look at print statements you already added”, “use Wireshark”). When your program is already running and already doing some TERRIBLE THING YOU DO NOT WANT, it is very nice to…

Instrumentation: What does ‘uptime’ mean?

This is the second post in our second week on instrumentation. Want more? Check out the other posts in this series. Ping Julia or Charity with feedback! Everybody talks about uptime, and any SLA you have probably guarantees some degree of availability. But what does it really mean, and how do you measure it? If…

Instrumentation: Instrumenting HTTP Services

Welcome to the second week of our blog post series on instrumentation, curated by Julia and Charity. This week will focus more on operational and practical examples; check out previous entries for awesome posts on Finite State Machines, The First Four Things You Measure, and more! I spend most of my time…

Instrumentation: Worst case performance matters

This is the fifth in a series of guest posts about instrumentation. Like it? Check out the other posts in this series. Ping Julia or Charity with feedback! BrightRoll’s Realtime team is responsible for a service, Vulcan, which provides a set of developer-friendly APIs for interacting with the data involved in deciding whether to serve…