Frontend Observability: A Candid Conversation With Emily Nakashima and Charity Majors
Frontend development has evolved rapidly over the past decade, but one challenge remains constant: understanding what’s happening in real time across diverse browsers, environments, and user...
Always. Enable. Keepalives.
As part of our recent failure testing project, we ran into an interesting failure mode involving the OpenTelemetry SDK for Go. In this post, we’ll...
What Is Full-Stack Observability?
Simply put, full-stack observability is monitoring designed for modern, cloud-native architectures. It allows you to understand how your software system interacts at scale, across everything...
Aligning Business and Engineering Goals with Honeycomb SLOs
Setting clear, measurable goals is essential for any successful team. However, aligning those goals with the technical work can be challenging in the fast-paced world...
A CoPE’s Guide to Alert Management
Alerts are a perennial topic, and a CoPE will need to engage with them. The bounds of this problem space are formed by two types...
The CoPE and Other Teams, Part 2: Custom Instrumentation and Telemetry Pipelines
The previous post laid out the basic idea of instrumentation and how OpenTelemetry’s auto-instrumentation can get teams started. However, you can’t rely only on auto-instrumentation....
Deploying the OpenTelemetry Collector to AKS
The Collector is the focal point for telemetry inside your cluster. Instead of your containerized applications sending directly to your OpenTelemetry-capable backend (the place that...
Apdex in Honeycomb
“How is my app performing?” is one of the most common, yet hardest questions to answer. There are myriad ways to measure this, like error...
Making Room for Some Lint
It’s one of my strongly held beliefs that errors are constructed, not discovered. However we frame an incident’s causes, contributing factors, and context ends up...
The CoPE and Other Teams, Part 1: Introduction & Auto-Instrumentation
The CoPE is made to affect, meaning change, how things work. The disruption it produces is a feature, not a bug. That disruption pushes things...
Destroy on Friday: The Big Day 🧨 A Chaos Engineering Experiment - Part 2
In my last blog post, I explained why we decided to destroy one third of our infrastructure in production just to see what would happen....
What Makes for a 'Good' Pair Programming Session?
Software changes so rapidly that developing on the cutting edge of it cannot fall to a single person. When it comes to asynchronously disseminating information...
Deploy on Friday? How About Destroy on Friday! A Chaos Engineering Experiment - Part 1
We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in...