BLOG

Never Alone On Call

Does your organization have an on-call rotation? Several members of the Honeycomb engineering team recently hosted a live webcast about why they never feel alone when on-call at Honeycomb. Wait, that’s someone else’s job…

Treading in Haunted Graveyards

Part 1: CI/CD for Infrastructure as Code. At Honeycomb, we’ve often discussed the value of making software deployments early and often, and being able to understand your code as it runs in production. However,…

Toward a Maturity Model for Observability

Access to observability is becoming critical to organizations shipping software, running modern infrastructures in production, and to understanding how users are experiencing their service. To achieve success in delivering a complex service, it’s no…

A New Bee’s First Oncall

I’m Honeycomb’s newest engineer, now on my eighth week at Honeycomb. Excitingly, I did my first week of oncall two weeks ago! Almost every engineer at Honeycomb participates in oncall, and I chose to…

Notes from On-call Adjacency

I’ve never been on-call, but I’ve been on-call adjacent for a lot of my adult life—my partners, my housemates, my friends…they’ve largely been sysadmins, in Operations, or Dev/Ops, which means I’ve experienced a lot…

Postmortem: RDS Clogs & Cache-Refresh Crash Loops

On Thursday, October 4, we experienced a partial API outage from 21:02-21:56 UTC (14:02-14:56 PDT). Despite some remediation work, we saw a similar (though less serious) incident again on Thursday, October 11, from 15:00-16:02 UTC (8:00-9:02 PDT). To implement a more permanent fix, we scheduled an emergency maintenance window, which completely interrupted service on Friday, October 12, for approximately two minutes, from 4:38-4:40 UTC (Thursday, October 11, 21:38-21:40 PDT).