At Honeycomb, we are frequently asked how we compare to what else is out there. Do these other tools offer observability? Do I need them all? What’s important? Metrics? Logs? What’s the best way to monitor application performance?
We’ve found that Honeycomb complements other tools in some cases and, in others, can help you reduce your usage of them or replace them entirely. Here’s the rundown, followed by some deeper discussion:
Metrics
(Datadog, SignalFX, Prometheus, etc.)
- Application Context For Solving Problems – Honeycomb fixes the “Dimensions of Doom” problem (also known as high cardinality), which is described in detail in the discussion of metrics below. There is no explosive cost increase or limitation due to contextual information (usually called “tags” or “labels” in metrics products).
- Freedom to Explore – Using Honeycomb, you can ask questions and solve problems using data without needing to know exactly what to measure ahead of time.
- Raw Data Access – Honeycomb offers access to the raw data behind every visualization in the app – unlike metrics, which aggregate these details away.
Logs
(Splunk, Loggly, Elasticsearch/Kibana, etc.)
- Efficient Storage – Because Honeycomb deals with structured data instead of unstructured text logs, we can store and compress the data more efficiently than text-based log aggregators.
- Wicked Fast Querying – No indexes need to be defined to ask questions and get high resolution results from your data – Honeycomb’s storage engine makes everything ultra fast by default. Logging storage is notoriously slow to query, whereas Honeycomb can perform sophisticated queries on many millions of datapoints in under a second.
- Distributed Tracing – With Honeycomb Tracing, you can see the lifecycle of a request as it flows through your entire system, and visualize both high and low level details that logs alone can’t provide, right in the same tool, with no context switching required.
Application Performance Monitoring (APM)
(New Relic, Dynatrace, AppDynamics, etc.)
- An Improved Day 2+ Experience – Many APM tools offer a quick on-boarding experience that includes some canned dashboards. This can be useful for getting started, but doesn’t offer the rich depth of exploration that Honeycomb brings to the table for the long haul. With Honeycomb Beelines, you get automatically-generated “starter” graphs alongside access to all the detail you need to solve the problems you don’t know you have yet.
- More Details Than Just Latency – Some APM tools only keep the slowest transactions. In Honeycomb, you can choose how you want to configure data retention and tune it according to your unique needs, whether latency is the issue or not.
- Insight Into Every User – Using our support for high cardinality, know when and where specific users or groups are having a bad experience, and be proactive in identifying and fixing issues.
More than just metrics
In standard metrics tools, you are typically working with counters that track numbers as they change over time. Some common things to measure are the number of HTTP requests your app is serving, average latency, and rate of errors. Each one of these counters or gauges will generate one unique time series on disk.
That’s good for a basic start at detecting when problems are probably happening, but we need much more context to find out what is going wrong. So we try to add more context using tags or labels, which generate new time series. For instance, a metric measuring number_http_requests might get tagged with something like HTTP status code, creating a new time series on disk for each unique value of status code – a new time series is created for number_http_requests:503, and so on.
This multiplicative effect gets greater and greater as we add more tags and labels. If we want to know which host a metric came from, or which container, or which user was associated with a given request, and so on, the number of time series is multiplied by the number of distinct values of each new tag, and the cost of storing all that data explodes with it. Likewise, if we want to track percentiles instead of averages, this will create multiples of the metric(s) in question.
We call this the “Dimensions of Doom” problem. The number of time series quickly becomes overwhelming, and impossible to store for tools that aren’t designed to handle it, much less read it back quickly enough to help you figure out where issues lie.
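To make the arithmetic concrete, here is a minimal sketch of how the series count grows. The tag names and cardinalities are hypothetical, chosen only for illustration, and the function computes an upper bound (every combination of tag values actually occurring), not the behavior of any particular metrics product.

```python
from math import prod
from typing import Mapping


def series_count(tag_cardinalities: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values each tag can take."""
    return prod(tag_cardinalities.values())


# Hypothetical cardinalities, chosen only for illustration.
tags = {"status_code": 5}
print(series_count(tags))   # 5

tags["host"] = 100
print(series_count(tags))   # 500

tags["user_id"] = 10_000
print(series_count(tags))   # 5000000
```

One metric with three tags is already millions of potential series, which is why metrics tools usually discourage tags like user ID.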
To solve this problem your tool must be able to handle high cardinality data – data with a lot of distinct possible values. Honeycomb’s storage model solves this completely by shifting to an event-based model, where raw values are stored and queried instead of aggregating everything up front.
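As a rough illustration of the event-based idea (not Honeycomb’s actual storage engine), the sketch below keeps raw events as plain dictionaries and aggregates only at query time. The field names and values are made up for the example.

```python
from collections import Counter

# Hypothetical raw events: one per request, stored as-is.
events = [
    {"status": 200, "user_id": "u1", "duration_ms": 12.0},
    {"status": 503, "user_id": "u2", "duration_ms": 950.0},
    {"status": 200, "user_id": "u2", "duration_ms": 30.0},
    {"status": 200, "user_id": "u1", "duration_ms": 18.0},
]


def count_by(events, field):
    """Group raw events by any field at query time and count them."""
    return dict(Counter(e[field] for e in events))


print(count_by(events, "status"))   # {200: 3, 503: 1}
print(count_by(events, "user_id"))  # {'u1': 2, 'u2': 2}
```

Because nothing was aggregated up front, grouping by a high-cardinality field such as user_id is just another query, not a new set of time series to pay for.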
Metrics tools do something else that’s problematic for fast, interactive DevOps problem solving: they aggregate away raw data to save space over time. For some use cases this makes sense, since it’s rarely necessary to, for example, know exactly which measurement of CPU usage or RAM a server had in a given time window. But this loss of resolution can make it very difficult to ask more questions of the data and get usable answers when you’re trying to debug a production problem quickly.
With Honeycomb, you always have:
- No Limit on Dimensions – We encourage the exact behavior that metrics systems forbid. Add fields to your events like user or team ID! Tag them with specifics about client versions! Add timers and fields that measure the behavior of 3rd parties! Honeycomb will happily handle it all.
- Freedom to Explore – You don’t need to know exactly what to measure ahead of time, just that it might be of interest to solve your issues. Since the cost of adding new dimensions and details to your data is so low relative to metrics, you can augment with all kinds of detail and dive into it later.
- Access to the Raw Data – Sometimes you want to filter down to only a few events, and go look at their exact contents. Honeycomb always allows access to the exact and complete details of the events you have retained.
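Here is a small sketch of what one of these wide events might look like when built inside a request handler. The handler, its parameters, and the field names are all hypothetical, and the event is a plain dictionary rather than a call into any Honeycomb SDK.

```python
import time


def handle_request(user_id, team_id, client_version, call_third_party):
    """Build one wide event describing a single request."""
    event = {
        "user_id": user_id,
        "team_id": team_id,
        "client_version": client_version,
    }
    start = time.perf_counter()

    # Time the third-party call separately so it can be queried later.
    tp_start = time.perf_counter()
    call_third_party()
    event["third_party_ms"] = (time.perf_counter() - tp_start) * 1000

    event["duration_ms"] = (time.perf_counter() - start) * 1000
    return event


events = [
    handle_request("u1", "t-42", "1.2.3", lambda: None),
    handle_request("u2", "t-7", "1.3.0", lambda: None),
]

# Filter down to a few events and inspect their exact contents.
for event in (e for e in events if e["team_id"] == "t-42"):
    print(event)
```

Adding another field is one more dictionary key, so there is little reason to decide ahead of time which dimensions will matter.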
Metrics excel at simple counts, such as the number of jobs queued, host-level resource usage, the number of requests served by the system, and so on. Their aggregation also makes bringing up data across long time intervals fast. But for resolving and preventing issues in production, you absolutely must have the rich details and context that Honeycomb can provide.
Beyond the spam of logs
Everyone’s first “word” in programming is usually a log line: Hello, World! emitted to the console. It’s no surprise that logs are such a popular way to gain insight into what is happening in your systems. Unfortunately, they very often end up full of massive amounts of noise that is little help for problem solving in production.
On top of any logs generated by your apps, servers themselves constantly spit out plain text logs (mostly uninteresting) about what’s happening. Finding what you’re looking for in these logs requires knowing, ahead of time, exactly what you are digging for. By contrast, with Honeycomb’s high level views you can zoom up and down through layers of detail to identify potential problem areas and ask more questions.
In our years of operational experience, we’ve found that it doesn’t take long for sending unstructured text logs to a centralized location to become slow and far from cost-effective. Getting practical results out of logging frequently requires very high volume, with applications often hiding the most useful data behind one or more “debug mode” settings that would produce far too many messages if enabled full time in production.
As a result, logging best practices involve both setting log levels (DEBUG, WARN, INFO, etc.) and adding structure to your logs to pull out key fields. Sometimes this involves using programs such as Logstash to parse this structure out using complicated and fragile regular expressions. By the time you’ve implemented this structure, you’re halfway to structured events anyway – but you’re still stuck with all the baggage of legacy logging systems, and no futuristic troubleshooting features like distributed tracing and automatic surfacing of outliers.
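The fragility is easy to see in a small sketch. The log line format and the regular expression below are hypothetical, written only for this example, and the structured version is plain JSON rather than any specific tool’s format.

```python
import json
import re

# Hypothetical text log line and a pattern written to match it.
LINE = "WARN request failed status=503 user=u2 took 950ms"
PATTERN = re.compile(
    r"status=(?P<status>\d+) user=(?P<user>\S+) took (?P<ms>\d+)ms"
)


def parse_text(line):
    """Recover fields from a text log line, or None if the shape changed."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "status": int(m["status"]),
        "user": m["user"],
        "duration_ms": int(m["ms"]),
    }


print(parse_text(LINE))
# {'status': 503, 'user': 'u2', 'duration_ms': 950}

# A harmless-looking wording change silently breaks the parser.
print(parse_text(LINE.replace("took", "elapsed")))  # None

# A structured event carries its fields directly, with nothing to parse.
structured = json.dumps({"status": 503, "user": "u2", "duration_ms": 950})
print(json.loads(structured)["status"])  # 503
```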
The signal to noise ratio with Honeycomb tends to be much better than trying to blast every unstructured log you have into a centralized store:
- Efficient Storage and Querying – Since data in Honeycomb has structure by default, we store and query it much more efficiently than logging systems. In most logging systems, you have to declare indexes on fields of interest, and you are inherently limited by needing to know what you want ahead of time. Honeycomb doesn’t require you to declare any schemas or indexes up front – all queries run blazing fast by default.
- Futuristic Troubleshooting – Unlike logging systems, we offer the ability to use Honeycomb Tracing to see the lifecycle of a request as it flows through your entire system. Using Honeycomb BubbleUp, you can rapidly identify areas that are likely related to an issue you are seeing.
- Team Features – Everyone on your team can be elevated to the level of your best debugger using features like History, Query Builder, and Boards.
Logs are very useful for development and for some use cases such as long term auditing – but for rapid exploration and problem resolution, Honeycomb has a lot more to offer.
Further than APM
APM tools offer a very specific view into the world, but modern problems are widely varied and general. We find that APM tools usually offer a rapid getting started experience and canned dashboards that are great on Day 1. But isn’t most of your time spent on Day 2 and beyond? APM tools also tend to be obsessed with latency as a guiding signal – but while latency matters, your apps and users are far more complex and sophisticated than something that can be described (and debugged) in terms of just latency.
At Honeycomb, we work hard to make getting started fast, but we’re here to support you for the long haul too. Using our Beelines for automatic instrumentation lets you hit the ground running and even gives you distributed tracing out of the box – yet also puts the full power of the Honeycomb query model in your hands, allowing you to ask sophisticated questions of your data across many dimensions. Not only can you ask these questions in the first place, but you can alert on them too. Using Triggers, the wildest edge cases can fire off notifications, expressed as a query across your structured events.
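Conceptually, an alert of this kind is a filter plus a threshold evaluated over events. The sketch below illustrates that idea only: should_fire, the event fields, and the customer names are hypothetical and are not the Honeycomb Triggers API.

```python
def should_fire(events, where, threshold):
    """Fire when at least `threshold` events match the `where` predicate."""
    matching = [e for e in events if where(e)]
    return len(matching) >= threshold


# Hypothetical structured events from a recent time window.
events = [
    {"customer": "acme", "status": 429},
    {"customer": "acme", "status": 200},
    {"customer": "acme", "status": 429},
    {"customer": "globex", "status": 429},
]


def acme_rate_limited(event):
    """An edge case: one specific customer is being rate limited."""
    return event["customer"] == "acme" and event["status"] == 429


print(should_fire(events, acme_rate_limited, threshold=2))  # True
print(should_fire(events, acme_rate_limited, threshold=3))  # False
```

The condition combines a high-cardinality field (customer) with a status code, which is hard to express as an alert on a pre-aggregated metric.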
Consider, as well, the question of sampling – many APM tools sample your data, keeping only the slowest transactions. That’s great for keeping costs under control, but don’t you care about many indicators other than just latency? What if you want to know when an important customer is seeing errors, or getting rate limited, or simply hasn’t shown up for a while? In Honeycomb, you have full control over the sample rates applied and can take advantage of our expertise at Smart Sampling – keeping more of the data you care about, while keeping less of the unimportant stuff.
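One simple way to keep more of the interesting data is to sample by outcome: keep every error, keep a fraction of the successes, and record the rate on each kept event so counts can be re-weighted later. This is a minimal sketch of that general technique under assumed field names (status, sample_rate), not Honeycomb’s Smart Sampling implementation.

```python
import random


def sample(events, keep_one_in, rng):
    """Keep all server errors; keep roughly 1 in `keep_one_in` other events."""
    kept = []
    for event in events:
        if event["status"] >= 500:
            kept.append({**event, "sample_rate": 1})
        elif rng.randrange(keep_one_in) == 0:
            kept.append({**event, "sample_rate": keep_one_in})
    return kept


def estimated_total(kept):
    """Each kept event stands in for `sample_rate` original events."""
    return sum(event["sample_rate"] for event in kept)


# Hypothetical traffic: mostly successes, a handful of errors.
events = [{"status": 200} for _ in range(1000)] + [{"status": 503}] * 5

kept = sample(events, keep_one_in=10, rng=random.Random(0))
errors_kept = sum(1 for e in kept if e["status"] >= 500)

print(len(kept), "events kept out of", len(events))
print(errors_kept, "errors kept")  # 5 errors kept
print("estimated original volume:", estimated_total(kept))
```

Every error survives sampling, while the stored sample rate lets queries approximate the original request volume.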