Results for 'incident review'

Anatomy of a Cascading Failure

In Caches Are Good, Except When They Are Bad, we identified four separate problems that combined to cause a cascading failure in our API servers. This follow-up post goes over them in detail,…

Never Alone On Call

Does your organization have an on-call rotation? Several members of the Honeycomb engineering team recently hosted a live webcast about why they never feel alone when on-call at Honeycomb. Wait, that’s someone else’s job…

Notes from Observability Roundtables

The Velocity conference happened recently, and as part of it we (Honeycomb) hosted a sort of reverse-panel discussion, where you talked, and we listened. You may be aware that we’re in the process of…

A New Bee’s First Oncall

I’m Honeycomb’s newest engineer, now in my eighth week at Honeycomb. Excitingly, I did my first week of oncall two weeks ago! Almost every engineer at Honeycomb participates in oncall, and I chose to…

Notes from On-call Adjacency

I’ve never been on-call, but I’ve been on-call adjacent for a lot of my adult life—my partners, my housemates, my friends…they’ve largely been sysadmins, in Operations, or Dev/Ops, which means I’ve experienced a lot…