Escaping the Cost/Visibility Tradeoff in Observability Platforms

By Fahim Zaman  |   Last modified on January 5, 2024

For developers, understanding the performance of shipped code is crucial. Over the last decade, saving and tracking app metrics has been a table-stakes function of software monitoring and observability solutions. Engineers love tools that get out of their way and just work, and the appeal of today’s best-in-class application performance monitoring (APM) suites lies in a seamless day-zero experience with drop-in agent installs, button-click integrations, and immediate metrics collection. However, the success of no-hassle metrics comes with a caveat. The internet is replete with examples of premier application monitoring costs spiraling beyond expectations:

[Image: APM Costs - Shocked Customers]

Is there a pattern behind these worries? It turns out, yes. There is a level of modern system complexity where relying on metric collection spirals into cost inefficiency. Read on for an explanation of the challenge with the traditional APM cost model for modern software teams, and how to begin solving it.

The problem: escalating costs with system complexity

With metrics-driven APMs, costs escalate as systems grow more complex through modern practices and service compartmentalization. Each new host, pod, node, or service adds to the bill. Most solutions store and index detailed custom metrics for fast analysis, and also store data in additional forms such as traces and logs. The result is a cost/visibility tradeoff: companies find themselves either over-investing in observability or sacrificing visibility to control costs.
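To make the escalation concrete, here is a minimal sketch of why tagged custom metrics can grow so quickly. It assumes the common model in which every unique combination of tag values is stored as its own time series. The tag names and counts are hypothetical, and actual billing rules vary by vendor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: how many distinct time series one custom metric can fan out into.
// All tag names and counts below are hypothetical, chosen only for illustration.
class MetricCardinality {

    // Upper bound on series for one metric: the product of distinct values per tag.
    static long maxSeries(Map<String, Long> distinctValuesPerTag) {
        long series = 1;
        for (long distinct : distinctValuesPerTag.values()) {
            series = Math.multiplyExact(series, distinct); // throws on overflow instead of wrapping
        }
        return series;
    }

    public static void main(String[] args) {
        Map<String, Long> tags = new LinkedHashMap<>();
        tags.put("service", 20L);
        tags.put("pod", 50L);
        tags.put("endpoint", 10L);
        System.out.println(maxSeries(tags)); // 20 * 50 * 10 = 10000

        // One more tag multiplies the bound rather than adding to it.
        tags.put("customer_tier", 3L);
        System.out.println(maxSeries(tags)); // 10000 * 3 = 30000
    }
}
```

The point of the sketch is the multiplication: each new dimension of complexity scales the number of stored series, which is why adding infrastructure or context tends to show up directly on a metrics bill.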

Limiting observability due to cost constraints hinders understanding user interactions and system performance. This leads to compromised user experiences, as issues in newly shipped code may go undetected. Either your developers slow down on shipping features your business needs, or you understand and solve less about production code, frustrating users with a backlog of unresolved bugs and incidents. Ultimately, your end users have a worse experience.

Solving for the cost/visibility tradeoff with modern observability

Modern observability solutions like Honeycomb don't require storing the same data multiple times in different formats. The solution takes advantage of wide, attribute-rich events and a parallelized query engine that can provide real-time data retrieval, metrics graphs, alerts, and relationship analysis without the need to pre-store custom metrics or rely on patching together separate data types.

With efficient and attribute-rich event handling, Honeycomb focuses on making observability easy to adopt and scale. Our simple pricing model ensures that any system complexity is addressed by unlimited event attributes, avoiding additional costs that come with other vendors. In a quick comparison to a market-leading APM suite such as Datadog, you’ll see Honeycomb’s focus on pricing by event rather than system complexity (pods, nodes, services, hosts, etc.) keeps costs down:

| | Datadog | Honeycomb |
|---|---|---|
| Services | $ per tracked service | Unlimited |
| Hosts | $ per tracked host | Unlimited |
| User/Audit/API call IDs | $ per tracked ID | Unlimited |
| Containers | $ per tracked container | Unlimited |
| Pods | $ per tracked pod | Unlimited |
| Nodes | $ per tracked node | Unlimited |
| Ingested Records | Separate bill for ingested vs. indexed records | Price per event, subject to volume discount |
| Overage Control | High watermark overage billing | Burst protection and throttling until contract adjustment |
| Volume Controls | Sampling creates gaps in trace visibility; custom metric billing limits attributes from analysis | Rule-based, trace-aware sampling without dropping any system attribute, keeping all events with incident-relevant info |

The result: a truly cost-effective and flexible platform that takes your observability program beyond the constraints of traditional APM platforms. Honeycomb emphasizes attribute-rich data with no limitation on complexity, using a custom datastore and parallelized query engine for efficient processing. This approach aligns observability costs with actual application events, so you can scale without crushing your budget.
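The rule-based, trace-aware sampling mentioned in the comparison can be sketched roughly as follows. This is an illustration of the general technique, not Honeycomb's implementation: the `Trace` class, the rules, the 1000 ms threshold, and the 1-in-10 default rate are all hypothetical.

```java
// Sketch of rule-based, trace-aware sampling.
// The Trace class, rules, and rates are hypothetical stand-ins for illustration.
class TraceSampler {

    // Minimal stand-in for a finished trace; real traces carry many more attributes.
    static final class Trace {
        final String traceId;
        final boolean hasError;
        final long durationMs;

        Trace(String traceId, boolean hasError, long durationMs) {
            this.traceId = traceId;
            this.hasError = hasError;
            this.durationMs = durationMs;
        }
    }

    // Sample rate for a trace: 1 means "keep every one", N means "keep 1 in N".
    static int sampleRateFor(Trace t) {
        if (t.hasError) return 1;            // rule 1: always keep traces with errors
        if (t.durationMs >= 1000) return 1;  // rule 2: always keep slow requests
        return 10;                           // default: keep 1 in 10 healthy traces
    }

    // The decision is derived from the trace ID, so every span belonging to the
    // same trace is kept or dropped together and the result is repeatable.
    static boolean keep(Trace t) {
        int rate = sampleRateFor(t);
        return Math.floorMod(t.traceId.hashCode(), rate) == 0;
    }

    public static void main(String[] args) {
        System.out.println(keep(new Trace("checkout-1", true, 40)));    // error: always kept
        System.out.println(keep(new Trace("checkout-2", false, 2500))); // slow: always kept
    }
}
```

Because the rules look at the whole trace rather than individual spans, interesting traces stay complete. Recording the chosen rate alongside each kept trace is a common companion step, since it lets counts be re-weighted at query time.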

Implementing cost efficient observability: practical examples

Step 1: Finding target areas for efficiency gains with modern observability

Getting true control of your observability spend begins with evaluating your system's fit for a new approach and where a change will have the most impact. Key indicators that call for a modern approach to observability include:

  • Multiple intercommunicating system components cooperating to serve a single request.
  • Components with heavy adoption of cloud-native technologies and containerization.
  • High consumption of custom app metrics for production software insights.

For example, a new data platform customer of ours recently reviewed these factors and found multiple boxes checked, along with a substantial APM custom metric bill. In the 60 days since they began talking to us, they’ve found the right starting points for their journey toward modern observability and have already reduced their spend.

Step 2: Embedding OpenTelemetry and cultivating developer buy-in

Next is engaging your developer teams in adopting a new, more efficient instrumentation protocol. This involves transitioning from traditional APM metric storage to more scalable solutions like OpenTelemetry with Honeycomb, and it takes a structured approach.

The philosophy of throwing a proprietary agent at your services, saving what you think you’ll need, adding a bucket to grab logs, and piecing everything together in post during a production incident is out of date, time-consuming, and cost-inefficient.

What developer buy-in looks like:

  • Software teams adopt an open and sustainable instrumentation package like OpenTelemetry, and build the muscle memory of adding custom attributes in-line with the code they’re shipping whether in staging or prod.
  • Instead of planning to tag and store custom metrics as incidents occur, developers collect code and infra attributes as free context on trace events.

Custom instrumentation is muscle memory: Transition from tagging new custom metrics or printing additional log records for your APM suite, to embedding custom attribute tracking directly into OpenTelemetry. This practice becomes standard for both production and staging code.

What used to involve custom tagging and metric creation in Datadog, like this:

[Image: Custom tagging and metrics creation in Datadog]

…now becomes a code-level practice like this:

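  import io.opentelemetry.api.trace.Span;

  // Get the span that is active in the current context
  Span span = Span.current();
  // Add a custom attribute to the span (orgID is a value from your own request handling)
  span.setAttribute("OrgID", orgID);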

For example, a premier online commerce platform that moved from centralized APM agents to OpenTelemetry in 2023 is recognizing the need for more adoption training and is engaging Honeycomb as a partner. Over 40 development teams that have switched to OpenTelemetry are training with observability experts to learn how to use attribute querying to get system insights they never could before.

Step 3: Revamping production investigation to decrease reliance on pre-stored metrics

This step marks a pivotal shift in how developer and ops teams view observability data. Instead of pulling up saved metrics and the dashboards associated with them, developer teams operating production code can view any attribute on their trace events as a metric count, heatmap, group by, or much more complex analysis processed at query time.
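As a rough illustration of what "processed at query time" means, the sketch below derives a grouped count and heatmap-style buckets from raw events, with nothing pre-aggregated when the events were written. It is an in-memory toy, not a description of Honeycomb's query engine; the attribute names (`OrgID`, `duration_ms`, `status`) and the sample events are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch: deriving "metrics" from raw events at query time.
// Attribute names and sample events are hypothetical.
class QueryTimeAggregation {

    // COUNT grouped by any attribute present on the events.
    static Map<String, Long> countBy(List<Map<String, Object>> events, String attribute) {
        return events.stream().collect(Collectors.groupingBy(
                e -> String.valueOf(e.get(attribute)), TreeMap::new, Collectors.counting()));
    }

    // Heatmap-style view: bucket a numeric attribute into fixed-width bins,
    // keyed by the lower bound of each bin. Events lacking the attribute are skipped.
    static Map<Long, Long> histogram(List<Map<String, Object>> events, String attribute, long bucketWidth) {
        return events.stream()
                .filter(e -> e.get(attribute) instanceof Number)
                .collect(Collectors.groupingBy(
                        e -> (((Number) e.get(attribute)).longValue() / bucketWidth) * bucketWidth,
                        TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> events = List.of(
                Map.of("OrgID", "acme", "duration_ms", 120, "status", 200),
                Map.of("OrgID", "acme", "duration_ms", 950, "status", 500),
                Map.of("OrgID", "globex", "duration_ms", 80, "status", 200));

        System.out.println(countBy(events, "OrgID"));              // {acme=2, globex=1}
        System.out.println(histogram(events, "duration_ms", 100)); // {0=1, 100=1, 900=1}
    }
}
```

The same events answer both questions, and a new question (say, grouping by `status`) needs no new metric to be defined or stored in advance.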

The focus changes from monitoring numerous preconfigured dashboards to prioritizing a few high-value business SLAs tied to user satisfaction. If something that matters goes wrong, the data you need will be a click away, constructed at query time. With a modern observability practice, you focus on the key results, where they’re failing, and instantly see real-time anomaly detection and analysis within context-rich events.
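One way to picture tracking a single high-value objective instead of many dashboards is the sketch below, which scores raw events against a latency-and-success target. It is a simplified illustration under stated assumptions: the `Event` class, the 500 ms threshold, and the target value are hypothetical, and real service-level tooling usually evaluates this over rolling time windows.

```java
import java.util.List;

// Sketch: checking a user-facing objective directly from events.
// The Event class, threshold, and target are hypothetical.
class SloCheck {

    static final class Event {
        final long durationMs;
        final boolean error;

        Event(long durationMs, boolean error) {
            this.durationMs = durationMs;
            this.error = error;
        }
    }

    // Fraction of events that succeeded and finished within the latency threshold.
    static double compliance(List<Event> events, long thresholdMs) {
        if (events.isEmpty()) return 1.0; // no traffic means nothing violated the objective
        long good = events.stream()
                .filter(e -> !e.error && e.durationMs <= thresholdMs)
                .count();
        return (double) good / events.size();
    }

    // Share of the error budget consumed; 1.0 means the budget is exactly used up.
    // Assumes target is strictly less than 1.0.
    static double budgetUsed(double compliance, double target) {
        return (1.0 - compliance) / (1.0 - target);
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(120, false), new Event(90, false),
                new Event(80, true), new Event(700, false));
        double c = compliance(events, 500);
        System.out.println(c);                   // 2 good out of 4 events
        System.out.println(budgetUsed(c, 0.75)); // budget overspent when above 1.0
    }
}
```

Because the failing events are still the original attribute-rich events, the investigation can continue from the objective straight into the attributes that distinguish the failures.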

[Image: Honeycomb customer quotes]

This shift makes developer teams faster and frees them from worrying about more overhead while shipping new code. Notable examples include CCP Games, Intercom, HelloFresh, Vanguard, and Slack, where hundreds of engineers have experienced a positive transformation in both the richness of their instrumentation and their problem-solving efficiency.

End result: leveraging financial gains for a better user experience

Finally, the savings from this shift in your APM bill can be reinvested into a virtuous cycle of improving customer experiences. Rather than allocating a significant portion of any incremental cloud budget to observability, your resources can be redirected towards product enhancements. As a result, observability expenditures align more closely with the value your application produces.

[Image: Honeycomb is the clear winner when it comes to costs.]

A striking real-world example is a leading compliance technology provider that reduced observability costs from 5% of total revenue to less than 1% by adopting modern observability practices with Honeycomb and attribute-rich events.

By cutting out billions of custom metric stores tied to high-cardinality infra and user context, switching them to OpenTelemetry attributes, and relying on Honeycomb to surface this data at the right time and place, the customer is cutting seven figures off their observability bill, relieving budget pressure and freeing funds to put toward a better product experience.

Conclusion

The transition from traditional metrics-collecting APMs to modern solutions like Honeycomb represents a strategic shift towards more scalable, cost-effective observability. This shift ensures that growing investments in observability tie directly to business growth and innovation, rather than just managing escalating costs of complexity.

How can you proactively and practically overcome the cost/visibility tradeoff today? Getting started with Honeycomb and OpenTelemetry is free, of course, but you can also get an assessment of your APM bill to see whether a switch would be worth it for your organization. Reach out through our Slack community or our website and begin your journey towards efficient and scalable observability today!

 
