BLOG

Toward a Maturity Model for Observability

Access to observability is becoming critical for organizations that ship software and run modern infrastructure in production, and for understanding how users experience their service. To achieve success in delivering a complex service, it’s no…

A New Bee’s First Oncall

I’m Honeycomb’s newest engineer, now in my eighth week. Excitingly, I did my first week of oncall two weeks ago! Almost every engineer at Honeycomb participates in oncall, and I chose to…

Notes from On-call Adjacency

I’ve never been on-call, but I’ve been on-call-adjacent for much of my adult life. My partners, my housemates, my friends…they’ve largely worked as sysadmins, in Operations, or in Dev/Ops, which means I’ve experienced a lot…

Postmortem: RDS Clogs & Cache-Refresh Crash Loops

On Thursday, October 4, we experienced a partial API outage from 21:02-21:56 UTC (14:02-14:56 PDT). Despite some remediation work, we saw a similar (though less serious) incident again on Thursday, October 11, from 15:00-16:02 UTC (8:00-9:02 PDT). To implement a more permanent fix, we scheduled an emergency maintenance window that completely interrupted service for approximately two minutes on Friday, October 12, from 4:38-4:40 UTC (Thursday, October 11, 21:38-21:40 PDT).

Honeycombers at LISA 2017

Did you go to LISA this year? I used to attend from 1998 to 2003 (anyone remember playing the original Guitar Hero in that huge arcade in Seattle?) and I hope to make…