Honeycomb Metrics Is Now Generally Available

It’s Black Friday. Checkout latency is spiking. Your on-call engineer pulls up the dashboard and starts working through the list.

Is it a regional issue? No, all regions look fine. A payment provider? Stripe, PayPal, Apple Pay all nominal. A bad deployment? Nothing shipped in the last six hours. All your infrastructure dashboards are showing green.

But customers are complaining. Checkout is slow, carts are being abandoned, and revenue is draining away. Something is wrong, but the tools can’t see it.

Hours later, you figure out the problem by sifting through logs: a single discount code, one out of hundreds, is triggering a failing database lookup. Because discount codes have hundreds of unique values, making them a high-cardinality dimension, your observability team never implemented them as a tag in the monitoring stack. The detail that would have solved the problem in two minutes was discarded before it was even recorded.

The tradeoff between data completeness and cost has been a feature of metrics platforms for as long as they’ve been around. And as AI becomes central to how engineering teams build and operate software, it is about to become a much more expensive problem.

Two models for metrics

The time series model has some critical advantages when it comes to collecting and storing infrastructure metrics: data is stored as pre-aggregated series, one per unique combination of metric name and tags, so queries are fast and retention is cheap. For a handful of dimensions like region, environment, and host type, the math works fine.

The problem is the assumption baked into that architecture: that you know in advance what questions you’ll need to ask. Every dimension you might ever want to query has to be declared before the data is collected, and every unique value of every tag multiplies the number of time series. Add a high-cardinality dimension like user_id across a million users to a metric you were previously tracking by three regions and four environments, and you’ve gone from twelve time series to twelve million. That explosion will blow up your bill, and it will also slow down or even crash your queries.
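
The arithmetic behind that explosion is easy to sketch. The dimension names and sizes below are illustrative, matching the example above:

```python
# Back-of-the-envelope sketch of how tag cardinality multiplies the
# number of stored time series: each unique combination of tag values
# becomes its own series.

def series_count(cardinalities):
    """Multiply the cardinality of every declared dimension."""
    total = 1
    for c in cardinalities.values():
        total *= c
    return total

base = {"region": 3, "environment": 4}
print(series_count(base))  # 12

with_users = dict(base, user_id=1_000_000)
print(series_count(with_users))  # 12000000 (twelve million)
```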

The events model takes a different approach. Instead of pre-aggregating before storage, you store each request as a complete structured document, every field intact, and run the aggregation at query time. You can break down by any dimension, filter on any field, and ask questions you never anticipated when you first instrumented the code. The cost is flat per event, not per unique tag combination. Cardinality does not affect cost.
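
A minimal sketch of query-time aggregation makes the difference concrete. The field names here are illustrative:

```python
# Events model: each request is stored as a complete structured
# document, and aggregation happens at query time, not at ingest.
from collections import defaultdict

events = [
    {"endpoint": "/checkout", "duration_ms": 120, "promo_code": "SAVE10"},
    {"endpoint": "/checkout", "duration_ms": 4800, "promo_code": "DISC50BFRI"},
    {"endpoint": "/checkout", "duration_ms": 95, "promo_code": "SAVE10"},
]

def avg_by(events, dimension, field):
    """Group by any dimension -- chosen at query time, never declared up front."""
    sums, counts = defaultdict(float), defaultdict(int)
    for e in events:
        key = e[dimension]
        sums[key] += e[field]
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

print(avg_by(events, "promo_code", "duration_ms"))
# {'SAVE10': 107.5, 'DISC50BFRI': 4800.0}
```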

Neither model is universally better. Time series genuinely excels at long-term retention of low-cardinality infrastructure metrics: CPU, memory, disk, network. For that use case, it is fast, cheap, and mature. The mistake is expecting the time series model on its own to solve all of your metrics problems.

AI is increasing the pressure. Big surprise.

Engineering teams have been living with this tradeoff for years, supplementing and compensating with human judgment. When the metrics were ambiguous, experienced engineers filled in the gaps: they knew about the deployment last Tuesday, the upstream dependency that was flaky, the traffic pattern that only showed up during promotional events. Metrics gave a low-resolution picture, but engineers supplied the context.

AI agents can’t do that. They have no company history, no intuition, no memory of the deployment last Tuesday. They need the full context to be present in the data. With low granularity pre-aggregated metrics as their fuel, even the smartest and most powerful agentic models will be limited to giving you better alerts. The data required to tell you why things are happening just isn’t there.

At the same time, agents do something humans cannot: they can programmatically query thousands of dimensions in seconds. An on-call engineer checks five or ten dimensions before giving up and trying something else. An AI agent can fan out across thousands of dimensions in parallel, identifying the signal that no human would have had the patience to find. But it can only do this if those dimensions are captured and available to query at speed.
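
That fan-out pattern can be sketched in a few lines. Here, query_latency_by() is a stand-in for a hypothetical backend query call, and the p99 values are made up for illustration:

```python
# Sketch of an agent fanning out group-by queries across many
# dimensions in parallel and surfacing the most anomalous one.
from concurrent.futures import ThreadPoolExecutor

DIMENSIONS = ["region", "promo_code", "user_agent", "payment_provider"]

def query_latency_by(dimension):
    # Placeholder: a real implementation would query the telemetry
    # backend for the worst-case p99 across that dimension's values.
    fake_p99_ms = {"region": 130, "promo_code": 4800,
                   "user_agent": 125, "payment_provider": 140}
    return dimension, fake_p99_ms[dimension]

with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(query_latency_by, DIMENSIONS))

# Rank dimensions by how anomalous their latency looks.
results.sort(key=lambda pair: pair[1], reverse=True)
print(results[0])  # ('promo_code', 4800)
```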

Another source of pressure on traditional metrics tools is the new category of telemetry that AI systems generate. Model ID, tokens per request, input length, output length, prompt template ID, agent session ID. None of these existed three years ago and many are high cardinality. All of them matter, both for debugging and for cost analysis.
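
Captured as a single wide event, one LLM request might look like the following. Every attribute name and value here is illustrative, and several of them are high cardinality by nature:

```python
# One request to an LLM-backed feature, stored as a structured event.
import json

llm_event = {
    "name": "llm.request",
    "model_id": "acme-chat-v3",            # hypothetical model name
    "input_tokens": 412,
    "output_tokens": 187,
    "prompt_template_id": "support-triage-07",
    "agent_session_id": "9f2c1a44-session",  # unique per session
    "duration_ms": 2310,
    "cost_usd": 0.0042,
}
print(json.dumps(llm_event, indent=2))
```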

Finally, AI systems introduce a new complication that makes high-fidelity event data non-negotiable: they are non-deterministic. The same input does not produce the same failure twice. When an LLM-powered feature misbehaves, you cannot reliably reproduce the bug in a test environment and work backward from a stack trace. You can only reconstruct what happened from the original event, with all its context, exactly as it occurred. If that data is stripped of attributes and pre-aggregated, the information is gone.

Introducing Honeycomb Metrics, now generally available

Today, we are announcing the general availability of Honeycomb Metrics: native time series storage for infrastructure metrics, fully integrated with Honeycomb's existing events platform, with both models queryable through a single interface.

In practice, this means that engineering teams can now point their existing OpenTelemetry metrics pipeline at Honeycomb with no re-instrumentation required. Low-cardinality, high-volume infrastructure metrics are stored efficiently as time series, while traces, logs, and high-cardinality telemetry can be instrumented as raw events. Both are queryable as time series in the same UI and correlated in the same timeline.
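
Pointing an existing pipeline at Honeycomb can be as small as an exporter entry in the OpenTelemetry Collector config. The endpoint and header names below follow Honeycomb's documented OTLP ingest; treat this as a sketch and confirm against the current docs:

```yaml
# Minimal Collector sketch: receive OTLP metrics, export to Honeycomb.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp:
    endpoint: "api.honeycomb.io:443"
    headers:
      "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]
```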

The result is an investigation experience that was not previously possible.

"We used to have to jump between three different tools just to investigate a single alert," said Tanner Johnson, Engineering Manager at Notion. "Having metrics and traces in a single platform helps me get to the right answer fast, and when I need to discuss with the team, I just share the Honeycomb link. In just a few months, we have grown to over 50 engineers querying metrics in Honeycomb."

A cost structure for the AI era

For teams currently running on a time series-heavy stack, the cardinality problem is a constant threat. The billing model that charges per custom metric means that adding meaningful context to your data, the kind of context that makes debugging possible, can dramatically increase your monthly bill.

Notion experienced this firsthand. Before moving metrics to Honeycomb, their engineers faced a constant tradeoff between data quality and cost. "Before Honeycomb, our engineers kept having to choose between getting the data they needed and keeping costs under control," Johnson said. "Moving metrics to Honeycomb means we can continue collecting time series data for standard metrics, now at over half a trillion datapoints each month, covering more than 100 million users. But we can also capture the custom dimensions we need, like host IDs and container metadata, without worrying about cardinality-based billing blowing up."

The underlying economics are straightforward. In a time series model, tracking more activity is cheap but increasing complexity or detail is expensive. In an events model, tracking more activity increases costs linearly, but adding complexity is free. Honeycomb customers get the best of both models by capturing simple but large-scale infrastructure data as time series metrics, and capturing complex activity and custom metrics in their applications as events. No need to ration context. A product manager building a dashboard, an SRE on call, or an agentic AI running a SUM query doesn’t need to know whether the data was instrumented as events or time series: the queries just work.
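
With made-up unit prices, the shape of those two cost curves looks like this. The numbers are hypothetical; only the scaling behavior matters:

```python
# Time series cost scales with unique tag combinations; events cost
# scales with event volume, regardless of how many attributes each
# event carries. Prices below are invented for illustration.

TS_PRICE_PER_SERIES = 0.05      # per time series per month (hypothetical)
EVENT_PRICE_PER_MILLION = 1.00  # per million events (hypothetical)

def timeseries_cost(unique_tag_combinations):
    return unique_tag_combinations * TS_PRICE_PER_SERIES

def events_cost(events_per_month):
    return (events_per_month / 1_000_000) * EVENT_PRICE_PER_MILLION

# Adding a million-value dimension multiplies time series cost...
print(timeseries_cost(12), timeseries_cost(12 * 1_000_000))
# ...while the same traffic as events costs the same no matter how
# many dimensions each event includes.
print(events_cost(500_000_000))
```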

From customer escalation to root cause

Return to that Black Friday checkout spike. In a world where time series and events live in the same platform, the investigation looks different.

The on-call engineer checks region: fine. Payment provider: fine. Deployment: nothing recent. Then they break down latency by promo_code. The p99 for one code, DISC50BFRI, is running at forty times that of every other value. They click through to the traces behind that metric, see the specific database query the code is triggering, and have the information they need to page the right team with a specific diagnosis rather than a vague escalation.
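
That breakdown step can be expressed as a query specification. The shape below follows the general structure of Honeycomb's Query API (breakdowns, calculations, orders); the column names are invented for this scenario:

```python
# A "p99 latency broken down by promo_code" query, as a spec document.
import json

query = {
    "time_range": 7200,  # the last two hours, in seconds
    "breakdowns": ["promo_code"],
    "calculations": [{"op": "P99", "column": "duration_ms"}],
    "orders": [{"op": "P99", "column": "duration_ms", "order": "descending"}],
    "limit": 10,
}
print(json.dumps(query))
```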

Alternatively, an investigation may not even be necessary! AI agents responding to the alert may have followed the same trail within seconds and simply presented their analysis to the on-call engineer with the root cause already identified.

Both of those outcomes are made possible by retaining context without fear of spiraling costs and unpredictable overages.

Observability for the age of AI

Engineering teams in 2026 are operating systems that are more complex, more dynamic, and less deterministic than anything the observability tooling of the last decade was designed for. The number of unique identifiers flowing through a modern production system grows every year. The questions that matter are increasingly ones you did not think to ask in advance, and the systems doing the asking are increasingly autonomous agents with no human intuition to fall back on.

Honeycomb was built on the premise that you should be able to ask any question of your production data, including questions you did not anticipate. Honeycomb Metrics extends that premise to the full surface area of what engineering teams need to observe: infrastructure and application, time series and events, human engineers and AI agents, all working from the same data, in the same place.

To learn more about Honeycomb Metrics, reach out to your account representative or schedule a demo today.