
The "Meh-trics" Reloaded: Why I Was 100% Wrong About Metrics (and Also 100% Right)

November 19, 2025

Somewhere in here is the number that tells me why my checkout flow keeps breaking.

Okay, I'm going to say something that would make 2016 Charity want to throw her laptop across the room: we're making a major investment in metrics at Honeycomb.

I know, I know. "But Charity, you literally called them ‘shit salad!’" I did. Also "nerfed dimensions." I said they would "fucking kneecap you." For most of the past decade, I've been social media’s most reliable anti-metrics evangelist. Have I repented? No.

When it comes to the ways people are using and abusing metrics in their pursuit of observability today, I stand by every inflammatory thing I’ve ever said. But when it comes to the full spectrum of what engineering teams need? I wasn’t seeing the whole picture.

What the hell are metrics, anyway?

The word “metrics” is itself a huge source of confusion. Imagine yourself eating lunch in the busy cafeteria at a tech company, hearing scraps of chatter like these:

  • "...so then the PM asked me what metrics we're tracking and I just blanked..."
  • "...engagement metrics are tanking but leadership doesn't seem to care..."
  • "...yeah but vanity metrics don't actually tell us if users are happy..."
  • "...spent all morning in a meeting about which metrics to dashboard..."
  • "...our metrics look great but the product still feels broken to me..."
  • "...turns out we were measuring the wrong metrics this whole time..."
  • “...someone added ‘user_id’ as a metric, which cost us $80k overnight…”
  • "...honestly I don't trust any of these metrics after the data pipeline broke..."
  • "...classic case of optimizing for metrics instead of outcomes..."

Most of these people are using “metrics” as a generic synonym for data, or results, or “really important numbers that executives look at”. Let’s call this definition “small-m metrics.”

But there is also a data type called “metric,” which has a specific technical definition. This type (let’s call it the “big-M Metric”) consists of a number, a timestamp, and optionally some associated tags or dimensions, and it is designed for efficient storage in a time-series database.
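To make that concrete, here is roughly what a single big-M Metric data point looks like, sketched as a Python dict. The field names are illustrative only, not any particular vendor’s wire format:

```python
# One big-M Metric data point: a number, a timestamp, and a few low-cardinality tags.
# Field names are illustrative only, not any particular vendor's wire format.
metric_point = {
    "name": "http.server.request.count",
    "value": 1,
    "timestamp": 1763510400,  # unix seconds
    "tags": {"service": "checkout", "status_class": "5xx", "region": "us-east-1"},
}
```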

Small-m metrics vs big-M Metrics

How and why did we start using “metrics” to refer to any and all types of telemetry data? I’m guessing it’s because big-M Metrics have been the dominant data type for telemetry across the past twenty, thirty, forty years of computing, and they’ve stayed dominant because they are so small and efficient. Just a number and a timestamp, optionally some tags; it’s hard to get much smaller than that.

For decades now, big-M Metrics have been the workhorse of application telemetry, and every other signal type—logs, traces, exceptions, errors, profiling—has been relegated to the margins.

Infrastructure metrics vs application metrics

The definition of infrastructure I like best is that infrastructure is the code you have to run in order to run the code you want to run, i.e., your own application code. Infrastructure metrics are continuously emitted by your operating system, your databases, your networking devices, your Kubernetes pods, and so on. They tell you, "Is this thing healthy?" in aggregate, from the perspective of the device: CPU usage, memory consumption, network throughput. They're standardized, they come from drop-in Collectors, and they're not directly tied to any specific customer's experience. Infra can be in an unhealthy state while customer experience is fine, and customer experience can be terrible while the infra lights are glowing green. To some extent, these are separate concerns.

Application metrics, on the other hand (small-m metrics!), include things like checkout latency, time-to-first-token, or login success rate. They're directly connected to something a real person is trying to do right now. They don't happen every minute on the minute. They happen when they happen, because they're tied to actual transactions. And critically, they preserve all the rich connective tissue that big-M Metrics have to discard: traces, spans, all the dimensional data that lets you understand why something is slow or broken.
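By contrast, here is a sketch of an application measurement riding along on a wide event. Every field below is hypothetical; the point is that the number (checkout latency) arrives with its connective tissue still attached:

```python
# An application measurement carried as attributes on a wide event / span.
# All fields are hypothetical; the point is that the latency number keeps its context.
checkout_event = {
    "name": "POST /checkout",
    "trace.trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "duration_ms": 2314.7,  # the "metric": checkout latency
    "app.cart_value_usd": 182.50,
    "app.customer_tier": "enterprise",
    "app.feature_flag.new_payment_flow": True,
    "device.os": "Android 12",
    "geo.country": "CA",
    "build.id": "2025-11-18.3",
}
```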

The part where I was 100% right

When I wrote about "meh-trics," I was talking about something very specific: using big-M Metrics as your primary mode of understanding complex distributed systems. On that point I was right, and yes I will die on this hill.

Metrics are sparse signals stripped of context

By design, they aggregate away the details. That P99 latency number on your dashboard? It can't tell you if the problem is affecting your biggest customer or a test account. It can't show you whether it's related to a specific feature flag, a particular build, or users in a specific region. You've taken something as rich and complex as a transaction flowing through your system, and reduced it to a bucket in a time-series.

This is fine for, "Is my disk about to fill up?" but it's catastrophic for, "Why is checkout broken for users in Canada on Android 12 who signed up after Tuesday?"

You can't ask questions you didn't predict

Traditional metrics require you to know in advance what you want to track. If you didn't instrument for it, too bad. The unknown-unknowns that kill you in production? Metrics won't help you there. You need to be able to slice and dice your data by dimensions you never thought to pre-aggregate. For example, you may have added individual metrics counting total HTTP 5xx errors, HTTP 5xx errors for Android devices, HTTP 5xx errors for <url>, and HTTP 5xx errors for <specific destination>; but if you never created a metric combining those dimensions, you can't ask the combined question.
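Here’s a toy sketch of the difference (this is not how Honeycomb’s query engine works under the hood, just the underlying idea): if you keep the raw events with their dimensions, any combination can be computed at query time, including the ones nobody thought to pre-declare.

```python
from collections import Counter

# Toy example: raw 5xx error events, each keeping all of its dimensions.
# (Synthetic data, purely illustrative.)
errors = [
    {"status": 503, "device": "android", "url": "/checkout", "dest": "payments-svc"},
    {"status": 500, "device": "ios",     "url": "/checkout", "dest": "payments-svc"},
    {"status": 503, "device": "android", "url": "/cart",     "dest": "inventory-svc"},
]

# Any combination of dimensions can be grouped at query time --
# no need to have declared a counter for that combination in advance.
by_device_and_dest = Counter((e["device"], e["dest"]) for e in errors)
for (device, dest), count in by_device_and_dest.items():
    print(device, dest, count)
```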

Metrics tools suck at high cardinality

While traditional metrics tools are not optimized for true observability, they’re often the primary tools we have access to, so they get used anyway. When people try to customize metrics to make them actually useful for application-level debugging, they end up doing hilariously expensive things. Like instrumenting user IDs or session IDs as tags, creating hundreds of thousands of unique time-series, and watching their bills explode. I've heard horror stories of individual metrics costing $30,000 per month. That's not a tool working as intended; that's a tool being tortured into doing something it was never designed for.
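The billing blowup is just multiplication: a time-series backend stores one series per unique combination of tag values, so one high-cardinality tag multiplies everything else. A back-of-the-envelope sketch, with all numbers made up:

```python
# Back-of-the-envelope: a time-series backend keeps one series per unique
# combination of tag values on a metric. All numbers below are made up.
endpoints = 50
status_codes = 10
regions = 12
series_before = endpoints * status_codes * regions
print(f"{series_before:,}")  # 6,000 series: manageable

# Tag the same metric with user_id for 200,000 active users and it explodes:
active_users = 200_000
series_after = series_before * active_users
print(f"{series_after:,}")  # 1,200,000,000 series: a billing incident
```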

The highest-value metrics are the ones that tell you about actual customer experience. And they need to be linked to traces and events. They need context. They need to answer why. Our customers understand this. For example, the Intercom teams responsible for Fin aren’t just watching aggregate latency metrics. They use SLOs that connect back to the underlying trace data. When something looks off, they can immediately understand what's causing it and which customers are affected. And when they experiment, they can link specific A/B tests to their impact on SLOs!

This is also the reason why Honeycomb’s Canvas and MCP have proven to be so much more capable than others out of the box. I’m not going to name names here, but we’ve all seen the product announcements in the last 12 months and the flashy demos that look incredible until you realize that the demo scenarios are powered by data that real customers just aren’t capturing. When AI needs to investigate a problem, just like a human, it needs context. It needs the full story, not just a number on a dashboard. That full story is dependent on support for high cardinality and the ability to connect metrics to transactions.

The part where I was 100% wrong

So far, so good. I was right that your data can do more for you than traditional approaches to Metrics have offered. Here’s the thing though: even as we’ve grown as a company, and seen our philosophy of observability gain traction, we find that customers still want metrics. And they want them to work how traditional metrics work.

While our focus on wide events enables us to deliver actionable insights that you simply can’t get from any other platform, it’s also true that our lack of support for metrics has cost us opportunities and that not being able to meet customers where they were has made it more difficult to get them to where they could be.

In that spirit, I’ll own up to something I’ve had to rethink.

Infrastructure metrics are still vital

Not interesting, not sexy, not what I want to spend my time thinking about, but vital. CPU usage, memory consumption, disk I/O: these are the table stakes for understanding whether your platforms are healthy. While monitoring is not observability, monitoring is still a legitimate operational need. And pre-aggregated time-series data is still the most cost-effective way to capture infrastructure metrics at scale.

Metrics offer a standardized lingua franca

When you spin up a new database or deploy a new service, there are OOTB metrics that Just Work™. They're the common language of infrastructure health. Every tool knows how to collect them. That maturity and ubiquity? It's valuable. Really valuable.

Learning from our customers

I’ll concede that big-M Metrics do have value, and that our choice to focus on the wide event has cost us opportunities. But that isn’t a bad thing in itself. Any decision to build anything always comes at the cost of not building something else. We never wanted Honeycomb to be all things to all people. So why are we revisiting metrics now?

Recently, I’ve seen an interesting usage pattern from our customers that tipped me off to the possibilities offered by beefing up our metrics offering. On one particular call, a senior site reliability engineer was showing off a set of metrics dashboards he’d made in another tool. “I need to know all of this, and I need to see it all at once. But,” he said, “as soon as I see red anywhere here, I jump over to Honeycomb to figure out what’s going on.”

Think about that workflow. They're using other tools for initial monitoring and alerting. But the moment they find something unusual, something that needs actual investigation, they come to Honeycomb. They jump from their metrics dashboard to our trace-based investigation tool because that's where they can actually understand what's happening.

This is simultaneously validating (Yes! You need rich, contextual data to debug!) and humbling (But you're still using multiple tools when you shouldn't have to).

The opportunity

Tool consolidation

What if we could collapse that workflow? What if the alert and the investigation happened in the same place? Imagine getting an alert about high error rates, and in that alert, you already have AI-generated analysis showing you it's specifically affecting API calls from a particular customer segment, with recommendations for what to check next. Not "here's a graph that's red," but "here's what's wrong, here's who's affected, here's where to look."

That’s the power of unified observability: combining infrastructure metrics for broad platform health monitoring with Honeycomb's investigation capabilities, all in one place. Add in Honeycomb Intelligence for automated analysis, and you have something powerful.

Tool consolidation isn't just about saving money. It's about avoiding momentum-killing context switching and keeping the cognitive load manageable when you're trying to debug a production incident at 2 a.m.

Better instrumentation for better outcomes

Here's where it gets interesting. With Honeycomb, you can instrument infrastructure metrics using OpenTelemetry Collectors and you can capture application metrics as attributes of the wide event. The same instrumentation that gives you rich traces also gives you the measurements that matter. You save money because you're not paying twice for similar data. And you get better quality data because application metrics remain connected to their full context.
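On the application side, that can be as simple as attaching your business measurements to the spans you already emit. A minimal sketch using the OpenTelemetry Python SDK follows; the attribute names and the checkout objects are my own inventions, not an official semantic convention, and the infrastructure half is just drop-in Collector configuration, so it isn’t shown here:

```python
from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")

def handle_checkout(cart, user):
    # One wide span per checkout: latency is simply the span's duration, and
    # the dimensions you care about ride along as attributes on the same span.
    with tracer.start_as_current_span("checkout") as span:
        span.set_attribute("app.cart_value_usd", cart.total_usd)
        span.set_attribute("app.item_count", len(cart.items))
        span.set_attribute("app.customer_tier", user.tier)
        span.set_attribute("app.payment_provider", cart.payment_provider)
        # ... existing checkout logic (charge the card, reserve inventory) ...
```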

What should you actually do?

Stop trying to use big-M Metrics tools for custom application metrics. Those metrics that measure checkout latency by user segment? They should be span attributes in your traces, not custom metrics that cost $63,000/month.

Use metrics for what they're good at

Mostly, that’s infrastructure and platform health. Use events and traces for everything related to understanding your application and debugging customer impact.

Move your alerting workflows and investigations to the same platform

The mental overhead of jumping between tools is real. The time spent correlating data by eyeballing timestamps is wasted time. If you're getting alerted in one tool and then opening Honeycomb to investigate, you're creating unnecessary friction.

Think about your telemetry in layers. Infrastructure metrics at the bottom: standardized, cheap, focused on platform health. Application data in the middle and top: rich, contextual, connected to actual user transactions. Match the right tool to the right layer.

Why this matters now

The complexity of distributed systems keeps growing, and traditional observability tools multiply already-exploding costs by charging you for data three times—once for each “pillar” of observability. On top of that, AI is changing how we interact with our telemetry data, but only if that data has the context AI needs to be useful.

Honeycomb Metrics isn't about abandoning our principles. It's about acknowledging that we were solving for one use case—deep, contextual investigation of unknown-unknowns—while ignoring another legitimate use case: basic operational hygiene.

The industry is consolidating. Teams are tired of stitching together five different tools. They want infrastructure monitoring and deep observability in one place—and honestly? They should have that.

I spent years telling people metrics suck. And for observability use cases, they do. But for their intended purpose—monitoring the health of infrastructure components—they're the right tool. I was so focused on the revolution that I dismissed the boring operational stuff that keeps the lights on.

I'm not apologizing for pushing the industry toward better observability practices. But I do acknowledge that "better" doesn't mean "only." It means having the right tool for each job, and ideally, having all those tools work together seamlessly.

Infrastructure metrics aren't going away, but they shouldn't be your primary interface for understanding what's happening in your application code. Use metrics where they shine. Use events and traces where they shine. And for the love of god, stop trying to make metrics do things they were never designed to do.

I was right about observability needing more than metrics. I was wrong about metrics having no place in our toolkit. Both things can be true. Now let's go build something that actually works for real engineering teams dealing with real production systems.