Category: Dogfooding

Product Updates   Dogfooding  

Honeycomb SLO Now Generally Available: Success, Defined.

Honeycomb now offers SLOs, aka Service Level Objectives. This is the second in a set of essays on creating SLOs from first principles. Previously,...

Software Engineering   Product Updates   Dogfooding  

From "Secondary Storage" To Just "Storage": A Tale of Lambdas, LZ4, and Garbage Collection

When we introduced Secondary Storage two years ago, it was a deliberate compromise between economy and performance. Compared to Honeycomb’s primary NVMe storage attached to...

Software Engineering   Incident Response   Dogfooding   Debugging  

Incident Report: Running Dry on Memory Without Noticing

On November 6, 2019, we intermittently rejected 1-3% of customer telemetry data at ingest for four periods of 20 minutes each. The trigger of the...

Software Engineering   Product Updates   Dogfooding  

Working Toward Service Level Objectives (SLOs), Part 1

In theory, Honeycomb is always up. Our servers run without hiccups, our user interface loads rapidly and is highly responsive, and our query engine is...

Software Engineering   Operations   Dogfooding  

Never Alone On Call

Does your organization have an on-call rotation? Several members of the Honeycomb engineering team recently hosted a live webcast about why they never feel alone...

Dogfooding   Debugging  

All Together Now: Better Debugging With Multiple Visualizations

"Nines don't matter when users aren't happy" is something you may have heard a time or two from folks here at Honeycomb. We often emphasize...

Operations   Logging   Dogfooding  

Understand Your AWS Cost & Usage with Honeycomb

First published in August 2019. AWS bills are notoriously complicated, and the Amazon Cost Explorer doesn’t always make it easy to understand exactly where your...

Operations   Dogfooding   Debugging  

Treading in Haunted Graveyards

Part 1: CI/CD for Infrastructure as Code At Honeycomb, we've often discussed the value of making software deployments early and often, and being able to...

Operations   Dogfooding  

Incident Review: You Can't Deploy Binaries That Don't Exist

Between 22:50 and 22:54 UTC on July 9, our capacity to accept traffic to api.honeycomb.io gradually diminished until all incoming requests started to fail. 8...

Operations   Dogfooding  

Automating Collection of Troubleshooting Data with Triggers: a How-To Guide

Everyone wants to be more efficient -- to spend less time on the tedious things, and more time on the things that move the needle....

Dogfooding   Debugging   Databases  

Stop Your Database From Hating You With This One Weird Trick

Let's not bury the lede here: we use Observability-Driven Development at Honeycomb to identify and prevent DB load issues. Like every online service, we experience...

Software Engineering   Dogfooding  

Anatomy of a Cascading Failure

In Caches Are Good, Except When They Are Bad, we identified four separate problems that combined together to cause a cascading failure in our API...

Tracing   Software Engineering   Dogfooding  

When In Doubt, Add More Spans: A Tale of Tracing and Testing In Production

Recently, Toshok was telling a story about the kind of thing he talks about a lot—improving the performance of some endpoint or page or other....

Software Engineering   Dogfooding  

Incident Review: Caches are Good, Except When They Are Bad

Between Wednesday, April 17th and Friday, April 26th, Honeycomb had four separate periods of downtime affecting the Honeycomb API, resulting in approximately 38 minutes of...

Software Engineering   Operations   Dogfooding   Debugging  

A New Bee's First Oncall

I'm Honeycomb's newest engineer, now on my eighth week at Honeycomb. Excitingly, I did my first week of oncall two weeks ago! Almost every engineer...
