BLOG

Making Instrumentation Extensible

Observability-driven development requires both rich query capabilities and sufficient instrumentation to capture the nuances of developers’ intentions and useful dimensions of cardinality. When our systems are running in containers, we need an…

Toward a Maturity Model for Observability

Access to observability is becoming critical to organizations that ship software, run modern infrastructure in production, and need to understand how users are experiencing their service. To achieve success in delivering a complex service, it’s no…

Dynamic Sampling by Example

Last week, Rachel published a guide describing the advantages of dynamic sampling. In it, we discussed varying sample rates to achieve a target collection rate overall, and having different sample rates for distinct kinds…

Anatomy of a Cascading Failure

In Caches Are Good, Except When They Are Bad, we identified four separate problems that combined to cause a cascading failure in our API servers. This follow-up post goes over them in detail,…

A New Bee’s First Oncall

I’m Honeycomb’s newest engineer, now in my eighth week here. Excitingly, I did my first week of oncall two weeks ago! Almost every engineer at Honeycomb participates in oncall, and I chose to…

How To Learn Systems Debugging by People-watching

When I first joined this startup that makes an observability platform, I was a front-end JavaScript developer who had never ssh’ed into production. I didn’t even know what tracing or monitoring or metrics were, let…

Notes from On-call Adjacency

I’ve never been on-call, but I’ve been on-call adjacent for a lot of my adult life—my partners, my housemates, my friends…they’ve largely been sysadmins, in Operations, or Dev/Ops, which means I’ve experienced a lot…