Honeycomb Blog

August 7, 2024

Deploying the OpenTelemetry Collector to AKS

The Collector is the focal point for telemetry inside your cluster. Instead of your containerized applications sending directly to your OpenTelemetry-capable backend (the place that allows you to ask questions of your telemetry), we send that data to an internal location first, then forward the data on.

OpenTelemetry

Tutorials

August 5, 2024

Max Aguirre

Apdex in Honeycomb

“How is my app performing?” is one of the most common, yet hardest questions to answer. There are myriad ways to measure this, like error rate, average response time, and so on. Enter the Application Performance Index (aka Apdex), a single metric that attempts to answer, “Are my application’s users happy?”

Observability

Software Engineering
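As a rough illustration of the idea behind the Apdex excerpt above, here is a minimal sketch of the standard Apdex calculation. It assumes the conventional definition: requests at or under a threshold T count as satisfied, requests between T and 4T count as tolerating at half weight, and anything slower counts as frustrated. The function name, the sample durations, and the 500 ms threshold are illustrative assumptions, not taken from the post, and the post itself may derive the score differently inside Honeycomb.

```python
def apdex(durations_ms, threshold_ms):
    """Return the Apdex score (0.0 to 1.0) for a list of response times.

    Assumes the conventional Apdex buckets:
      satisfied:  duration <= T
      tolerating: T < duration <= 4T (counted at half weight)
      frustrated: duration > 4T (counted as zero)
    """
    if not durations_ms:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for d in durations_ms if d <= threshold_ms)
    tolerating = sum(
        1 for d in durations_ms if threshold_ms < d <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(durations_ms)


# Hypothetical response times in milliseconds, with an assumed T of 500 ms.
samples = [120, 300, 450, 800, 2500]
score = apdex(samples, threshold_ms=500)
print(f"Apdex: {score:.2f}")
```

With these sample values, three requests are satisfied, one is tolerating, and one is frustrated, so the score is (3 + 0.5) / 5 = 0.70.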

July 29, 2024

Fred Hebert

Making Room for Some Lint

It’s one of my strongly held beliefs that errors are constructed, not discovered. However we frame an incident’s causes, contributing factors, and context ends up influencing the shape of the corrective items (if any) that get created. I’ll cover these ideas by using our June 3rd incident where a database migration caused a large outage by locking up a shared database and making it run out of connections.

July 25, 2024

The CoPE and Other Teams, Part 1: Introduction & Auto-Instrumentation

The CoPE is made to affect, meaning change, how things work. The disruption it produces is a feature, not a bug. That disruption pushes things away from a locally optimal, comfortable state that generates diminishing returns. It sets things on a course of exploration to find new terrains which may benefit it more—and for longer.

Culture

Software Engineering

July 23, 2024

Lex Neva

Destroy on Friday: The Big Day 🧨 A Chaos Engineering Experiment – Part 2

In my last blog post, I explained why we decided to destroy one third of our infrastructure in production just to see what would happen. This is part two, where I go over the big day. How did our chaos engineering experiment go? Find out below!

Dogfooding

Software Engineering

July 18, 2024

Ruthie Irvin

What Makes for a ‘Good’ Pair Programming Session?

Software changes so rapidly that developing on the cutting edge of it cannot fall to a single person. When it comes to asynchronously disseminating information about projects, code comments, PR conversations, Slack, RFCs, and other investigatory documents do a wonderful job, but no amount of async communication replaces the magic of two brains bouncing ideas off of each other.

Culture

Software Engineering

Teams & Collaboration

July 16, 2024

Lex Neva

Deploy on Friday? How About Destroy on Friday! A Chaos Engineering Experiment – Part 1

We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in our production environment using AWS’s Fault Injection Service. You might be wondering why the heck we did something so drastic. In this post, we’ll go over why we did it and how we made sure that it wouldn’t impact our service.

Dogfooding

Software Engineering

July 10, 2024

Nick Travaglini

Staffing Up Your CoPE

Getting the right people working in the CoPE is crucial to success because these change agents must limber up the organization and promote the flexibility necessary to perform resilience.

Culture

Software Engineering

July 9, 2024

Liz Fong-Jones

Why Every Engineering Team Should Embrace AWS Graviton4

Two years ago, we shared our experiences with adopting AWS Graviton3 and our enthusiasm for the future of AWS Graviton and Arm. Once again, we’re privileged to share our experiences as a launch customer of the Amazon EC2 R8g instances powered by AWS Graviton4, the newest generation of AWS Graviton processors.

Dogfooding

Featured

July 3, 2024

Tyler Wilson

The Hater’s Guide to Dealing with Generative AI

Generative AI is having a bit of a moment. Well, maybe more than just a bit. It’s an exciting time to be alive for a lot of people. But what if you see stories detailing a six-month-old AI firm with no revenue seeking a $2 billion valuation and feel something other than excitement in the pit of your stomach?

Conferences & Meetups

AI & LLMs

June 27, 2024

Rox Williams

Navigating Software Engineering Complexity With Observability

In the not-too-distant past, building software was relatively straightforward. The simplicity of LAMP stacks, Rails, and other well-defined web frameworks provided a stable foundation. Issues were isolated, systems failed in predictable ways, and engineers had time to innovate on new features for the business. And it was good.

Observability

Software Engineering

June 24, 2024

Martin Thwaites

OpenTelemetry Best Practices #3: Data Prep and Cleansing

Having telemetry is all well and good. Amazing, in fact. It’s easy to do: add some OpenTelemetry auto-instrumentation libraries to your stack and they’ll fill your disks with data pretty quickly. However, having good telemetry data (data that’s curated into being useful) is both cost-effective and good value.

OpenTelemetry

June 14, 2024

Liz Fong-Jones

Framework for an Observability Maturity Model: Using Observability to Advance Your Engineering & Product

Everyone’s talking about “observability,” but many don’t know what it is, what it’s for, or what benefits it offers. With this framing of observability in terms of goals instead of tools, we hope teams will have better language for improving what their organization delivers and how they deliver it.

Observability

Software Engineering

June 7, 2024

Terra Field

Investigating Mysterious Kafka Broker I/O When Using Confluent Tiered Storage

Earlier this year, we upgraded from Confluent Platform 7.0.10 to 7.6.0. While the upgrade went smoothly, there was one thing that was different from previous upgrades: due to changes in the metadata format for Confluent’s Tiered Storage feature, all of our tiered storage metadata files had to be converted to a newer format.

Software Engineering

May 29, 2024

Nick Travaglini

Independent, Involved, Informed, and Informative: The Characteristics of a CoPE

In part one of our CoPE series, we analogized the CoPE with safety departments. David Woods says that those safety departments must be: independent, involved, informed, informative. In this post, we’ll elaborate on what each of those characteristics means, why the CoPE should also match those qualifications, and how to achieve that status.

Culture

Software Engineering

May 28, 2024

Hazel Edmands

Virtualizing Our Storage Engine

Our storage engine, affectionately known as Retriever, has served us faithfully since the earliest days of Honeycomb. It’s a tool that writes data to disk and reads it back in a way that’s optimized for the time series-based queries our UI and API make. Its architecture has remained mostly stable through some major shifts in the surrounding system it supports, notably including our 2021 implementation of a new data model for environments and services. As usage of this feature has grown, however, we’ve noticed Retriever creaking in novel ways, pushing us to reconsider a core architectural choice.

Databases

Dogfooding

May 17, 2024

Jessica Nunn

Announcing Honeycomb Support Business Hours in Europe

Earlier this year, Honeycomb announced the launch of data residency in Europe. To meet the growing needs of our customers in the region, we are delighted to announce new Honeycomb Support business hours.

News & Announcements

Observability

May 15, 2024

Nick Travaglini

Establishing and Enabling a Center of Production Excellence

Software is in a crisis. This is nothing new. Complex distributed systems are perpetually in a state far from equilibrium, operating in what Richard Cook has called a “degraded mode.” It’s through a combination of technical artifacts, organizational practices and policies, and pure gumption that they manage to maintain themselves through time.

Culture

Software Engineering

The Cost Crisis in Metrics Tooling: Whitepaper Excerpt

May 13, 2024

Charity Majors

The Cost Crisis in Metrics Tooling

In my February 2024 piece The Cost Crisis in Observability Tooling, I explained why the cost of tools built atop the three pillars of metrics, logs, and traces—observability 1.0 tooling—is not only soaring at a rate many times higher than your traffic increases, but has also become radically disconnected from the value those tools can deliver. Too often, as costs go up, the value you derive from these tools declines.

Metrics

Observability

May 8, 2024

Rox Williams

Observability, Telemetry, and Monitoring: Learn About the Differences

Over the past five years, software and systems have become increasingly complex and challenging for teams to understand. A challenging macroeconomic environment, the rise of generative AI, and further advancements in cloud computing compound the problems faced by many organizations. Simply understanding what’s broken is difficult enough, but trying to do so while balancing the need to constantly innovate and ship makes the problem worse. Your end users have options, and if your software systems are unreliable, they’ll choose a different one.

Observability

OpenTelemetry
