Paul Osman [Lead Telemetry Engineer | Honeycomb]:
Hi, my name is Paul Osman, and today I’m going to talk about OpenTelemetry and Honeycomb.
At Honeycomb, we’re big fans of the OpenTelemetry project. I’m a lead engineer on the telemetry team. Today, I will talk a little bit about OpenTelemetry, why we think it’s really important, and cover a little bit of Honeycomb’s journey: where we’ve been, where we are, and where we’re going.
To start off, what is OpenTelemetry? The way I have heard it best described: OpenTelemetry is a collection of tools, APIs, and SDKs that allow you to generate and export telemetry data.
What’s important here is that OpenTelemetry is really two things: it is an open specification that describes how telemetry data should be represented, and it is software that implements that specification. So it both defines a standard and gives you the tools you need to use that standard to instrument your code and to generate and export telemetry data.
OpenTelemetry has an interesting history. It’s one of these success stories in open source, I think. It came out of the recognition that there were two unintentionally competing standards in this space. In 2016, the OpenTracing project was founded: an effort to create a specification describing an API for instrumenting your code with tracing data.
Around the same time, OpenCensus started, a project that, I believe, grew out of Google. A few years later, the two groups realized they largely served the same sets of needs, and so OpenTelemetry was formed as a way to combine the best of OpenTracing and OpenCensus.
OpenTelemetry hit alpha in 2019, beta in 2020, and is on track to be generally available this year, at which point OpenTracing and OpenCensus will be sunset. OpenTelemetry is the future of how we envision people instrumenting their systems and adding telemetry to them.
The promise of OpenTelemetry is really this future of vendor-neutral instrumentation, and this is what I’m frankly really, really excited about.
I REALLY believe that instrumentation should be boring, and this is an essential part of making that happen. When I say “boring” I don’t mean it should put you to sleep. What I mean is it shouldn’t be something you spend an awful lot of time thinking about.
Anyone who has worked on a microservices architecture or a large monolith knows that instrumenting your code isn’t actually valuable in itself. It’s what you get out of that instrumentation that creates value. It’s the observability that you add to your systems that creates value, so if you have to re-implement instrumentation every time you choose a new tool, that’s a bad deal for you. It doesn’t create value. It’s a huge time suck, and frankly, a lot of organizations just won’t be able to do it.
OTel replaces the need for vendor-specific SDKs. At the moment, it supports tracing and context propagation, so in a distributed system where services call each other, you can connect those requests from the customer’s perspective. OpenTelemetry also supports metrics and is working on a log specification.
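To make context propagation concrete: by default, OpenTelemetry propagates trace context between services using the W3C Trace Context `traceparent` HTTP header. This is my own minimal sketch of parsing that header, not code from the talk, but it shows the pieces of context that actually travel over the wire:

```python
# Minimal sketch: parsing a W3C Trace Context "traceparent" header,
# the format OpenTelemetry's default propagator sends between services.
# Layout: version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)

def parse_traceparent(header: str) -> dict:
    version, trace_id, parent_id, flags = header.split("-")
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "version": version,
        "trace_id": trace_id,    # identifies the whole distributed trace
        "parent_id": parent_id,  # identifies the calling span
        "sampled": bool(int(flags, 16) & 0x01),
    }

header = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
ctx = parse_traceparent(header)
```

Because every instrumented service reads and writes this same header, spans emitted by different services can be stitched back into one trace by the backend.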
The components of OpenTelemetry — I mentioned already that there is a cross-language specification, the governing spec. This is how different language communities and different tool vendors should go about creating tools that conform to the OpenTelemetry specification.
The spec is really important in guaranteeing that people have a consistent experience with OpenTelemetry, and there’s interoperability between various OpenTelemetry components. There’s also a tool called the OpenTelemetry Collector, which I absolutely love and a lot of our customers have gotten good use out of.
The OpenTelemetry Collector is an agent that can be run as a sidecar or as a central piece of your infrastructure. It’s basically a proxy that can do processing: it lets you take trace data, maybe in one or more formats, process it (maybe scrub data from the spans in flight, or transform them in some other way), and send it to one or more back-end sinks. That could be a vendor, an in-house tool, etc. I think of the Collector as the Swiss Army knife for your telemetry data.
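As a rough sketch of that Swiss Army knife role (the scrubbed attribute name and the API key are illustrative placeholders, not from the talk), a Collector configuration might receive OTLP, strip a sensitive attribute from spans in flight, and export to a backend:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  attributes:
    actions:
      # Illustrative: scrub a sensitive field from every span in flight.
      - key: user.email
        action: delete

exporters:
  otlp:
    endpoint: "api.honeycomb.io:443"
    headers:
      "x-honeycomb-team": "YOUR_API_KEY"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlp]
```

Swapping backends, or adding a second one, is a config change rather than a code change, which is what makes the Collector such a flexible piece of a telemetry pipeline.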
OpenTelemetry is a collection of per-language APIs and SDK libraries. This is really important. OpenTelemetry has a community of people from different language backgrounds working on making sure it works consistently with Java, C#, Ruby, Python, etc. It also includes auto-instrumentation libraries that you can just plug into your code; they are the lowest-effort way to start getting useful data about your systems right away.
I’ll walk through a few examples of how that works, but basically, the hope is you can just drop something into your application and start using an observability tool right away.
So that’s what OpenTelemetry is. I want to talk a little bit about why I think this matters and why I think it’s really important that there is a solid community building an open specification and tools for telemetry data. A few things really stand out to me, both when I’m working on this stuff at Honeycomb and when talking to customers who are either evaluating us or actively using us.
The first is that OpenTelemetry is a large and active community. There are hundreds of people who are members of the actual OpenTelemetry GitHub organization, and that is just a subset of the people contributing to OpenTelemetry; those are the ones who have gone through the process to become official members. There are far more people who submit pull requests, do code reviews, etc.
This community is gonna outpace any single vendor. So if you are working with a language-specific SDK and there’s a bug in it, the chances are that somebody else in the community has encountered it and may be actively working on it, or you yourself can become a contributor. The power of this community means, as I said, they’re gonna outpace any single vendor, and I think that’s a win for everybody in the space.
The other benefit to having an open standard govern how you instrument your code and generate telemetry is that it adds support for custom telemetry pipelines. This is something I have seen a lot. When you’re starting out with observability and with instrumenting your code, you might just sprinkle some instrumentation into your application, or you might add an auto-instrumentation library, and send the data off to a single back end like Honeycomb, and that’s a great way to start.
But as organizations mature their telemetry pipelines and have more complicated needs, you may start seeing the need to, say, fork off your telemetry data. I have certainly seen scenarios where someone wants to send their trace data to a vendor like Honeycomb, but they also have an in-house tool they’re using to collect that trace data.
Maybe it’s because different teams use different tools and there’s a rollout process, and not everybody is ready to get onboard with a single vendor. Or it could be that you have different retention requirements. I’ve also seen people take their event data, their trace data, and send it to a vendor like Honeycomb while also archiving it on S3, so if you have to do any kind of batch processing or ETL, you’re free to do that.
I have seen people dual-stream their metrics and traces through something like Kafka, which allows them to do stream processing and whatnot. So again, as an organization matures and grows larger, you’re going to see more of these needs for custom telemetry pipelines, and having an open standard means you can mix and match tools from various parts of the community, which I think is really powerful.
The other funny use case that crops up all the time — and I absolutely love this, both as a person who works for Honeycomb and as somebody who comes from the SRE space and is used to evaluating solutions like this — is that having an open standard for telemetry gives you choices. I mentioned instrumenting your code is not where the value is; getting observability into your systems is where the value is. So if I’m in a position where I want to evaluate multiple vendors for storage solutions or whatever, I REALLY don’t want to instrument my code more than once. That, to me, would be a non-starter if there’s an alternative.
What OpenTelemetry allows you to do is instrument your system once, set up your pipeline, and use multiple OTLP exporters to send data to two vendors at once. As a vendor, this is a GREAT thing. It allows us to compete on whether or not the tool is the right fit, not on whether you had to go through the pain of instrumenting your code. And that’s where I think we ALL want to be. It’s worth mentioning that OpenTelemetry is built in such a way that you can do this with an OpenTelemetry Collector, that component that can run as a sidecar to your application, or you can just do it in code by specifying multiple exporters.
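To make the dual-export idea concrete, a Collector pipeline can simply list two OTLP exporters side by side. In this sketch the second endpoint is a placeholder for whatever other backend you happen to be evaluating:

```yaml
exporters:
  otlp/honeycomb:
    endpoint: "api.honeycomb.io:443"
    headers:
      "x-honeycomb-team": "YOUR_API_KEY"
  otlp/other:
    endpoint: "other-vendor.example.com:443"  # placeholder second backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/honeycomb, otlp/other]
```

Both backends receive the same spans, so you can compare tools on their merits without re-instrumenting anything.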
The point here is that all of these challenges exist (and those are only a subset of the challenges I’ve heard from users of telemetry and people implementing observability solutions), but OpenTelemetry makes all of this boring, and that’s the place I think we all want to be. It makes a lot of this stuff table stakes. I think about where this idea that instrumentation is boring can go further into the future, and we’re starting to see adoption in the OpenTelemetry community, where the authors of popular libraries, for instance, are instrumenting their code.
If you have a MySQL or Redis client library, for example, the author can drop OpenTelemetry code into their project so that anyone who uses the library automatically gets telemetry emitted. This is really, really cool to me and really, really exciting. I’m excited to see a future where framework authors do the same thing, and I know that some are actively working on this: they bake observability into their framework, so you use the framework and automatically get this stuff out. These are all things that OpenTelemetry makes possible, and I think there is a very near future where instrumentation becomes really boring as a result. We just won’t have to think about it, which I think is wonderful.
So that’s a lot about OpenTelemetry and why I think it’s exciting. Let’s talk a little bit about Honeycomb’s journey into the OpenTelemetry project. We started this about two years ago. At the time, the OpenTelemetry specification had not yet hit 1.0, but we were already seeing a lot of interest, and we knew this was going to be an exciting space. Our first job was to give our users, especially the early adopters, some tools to start using OpenTelemetry and see how it went. So we released Honeycomb exporters for Go, Java, and Python. You can think of these exporters as bolt-on plug-ins to the OpenTelemetry ecosystem: you use an OpenTelemetry SDK plus a Honeycomb-specific exporter to send that data to Honeycomb.
Around the same time, we released the Honeycomb exporter for the OpenTelemetry Collector. I’ve talked a bit about the Collector. That meant you could run the Collector with a Honeycomb-specific exporter and translate your OpenTelemetry data into Honeycomb data. This was a great way for us to test the waters, and I’m really happy to say we saw a lot of positive feedback from our users. This is what instrumenting your code with the Honeycomb exporter looked like: you set up your exporter with your Honeycomb credentials and then pass it in when you’re creating an OpenTelemetry trace provider.
As I mentioned, we saw a lot of positive input from our customers and some adoption in phase 1 of our OpenTelemetry work, and we realized we really didn’t want a future where we had to maintain all these extra plug-ins and exporters. It wasn’t good for us or for customers, because they had to keep track of more language-specific libraries.
We worked on adding ingest support for OTLP, the OpenTelemetry protocol. We released support for ingesting OTLP over gRPC in December 2020. From that point, customers could use any OpenTelemetry project that supports gRPC, without specific Honeycomb exporters, and start sending us telemetry data. We added support for ingesting trace data to our events API, as well as to our two proxy products: Refinery, a sampling proxy, and our Secure Tenancy proxy. However you were using Honeycomb, you could use OTLP over gRPC, without any Honeycomb-specific adapters, and start sending data to us.
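In practice, pointing a stock OpenTelemetry SDK at an OTLP/gRPC endpoint like Honeycomb’s comes down to the standard OTLP exporter settings. With most SDKs you can do it purely through the standard environment variables; the key and dataset values below are placeholders:

```shell
# Standard OTLP exporter environment variables; no Honeycomb-specific
# library is involved. API key and dataset values are placeholders.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.honeycomb.io:443"
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY,x-honeycomb-dataset=my-service"
```

Because these are generic OTLP settings, switching backends later means changing an endpoint and a header, not touching any instrumentation code.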
This actually saw really, really strong adoption. From a code perspective, this didn’t really simplify the boilerplate, because you still had to set up an exporter, but now it’s an OTLP exporter, with no Honeycomb-specific libraries involved; it’s all stuff packaged with the OpenTelemetry SDKs. So we started seeing a lot of adoption right away. A lot of people were really excited about using OTLP.
Now we have moved on to phase 3, which I’m really excited about. This is what we’re actively working on now. We recognized that our customers are as excited about OpenTelemetry as we are. We’ve seen adoption of OTLP, and we’ve seen strong support across the different language-specific libraries.
Now what we’re focusing on is making that journey easier for our customers. We already know you want to use OpenTelemetry. We already know you want to use Honeycomb. Let’s make it easier to use those things together. So we’re working on Honeycomb Distributions for OpenTelemetry. We’re working on the Java distribution right now, with .NET up next.
We’re going to roll these out for the languages we support, and our goals are three-fold. The first is to simplify configuration for Honeycomb: you shouldn’t have to worry about how your credentials are sent or what header names you need to specify. You should be able to set an API key, set a dataset, and be on your way. We also want to enable certain Honeycomb features, like adding trace fields to your spans. These things are all technically possible with plain OpenTelemetry, but we want to make them a little bit easier and then get out of your way.
From then on, it is just OpenTelemetry. We definitely don’t want to violate vendor neutrality. If you ever use a Honeycomb OpenTelemetry distribution and later, for whatever reason, decide you don’t want to, all you’ll have to do is replace the configuration code, the code that actually gets you up and running, with the vanilla code from the OpenTelemetry SDK, and you’re off to the races. At that point, it’s very much focused on getting out of your way and giving you access to the great OpenTelemetry API.
Here are samples of what it looks like now. I mentioned we were working on Java: we just released that and uploaded it to Maven Central, so if you’re a Java and OpenTelemetry user, you can start using this today. Note how we have tried to structure the experience around the idea that you shouldn’t need to know the internals of how certain information is transmitted to the back end. When you instantiate a builder, for instance, we don’t want you to worry about the fact that the service name is a resource attribute. What we really want you to know is that you can set a service name on your trace data, and any trace generated from this tracer will have that service name attached when you look at it in the Honeycomb back end.
Similarly, when you’re adding credentials, just set API key, set data set.
The other thing we have added support for — and I mentioned certain Honeycomb-specific features — is samplers that are compatible with Honeycomb. Sampling is an area where OpenTelemetry has defined an interface, but OpenTelemetry is not in a position to really offer a lot of sampling solutions because those will be vendor-specific.
At Honeycomb, we’ve added a deterministic head sampler. When you pass 5 into the constructor, for instance, what you’re saying is, “I want one out of every five of my traces to arrive at Honeycomb.” We set the sample rate on those traces so that when we’re showing you data in the Honeycomb UI, we can scale the numbers up, knowing that a specific trace represents five traces.
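The deterministic part matters: every service that sees a given trace must make the same keep-or-drop decision, or you end up with broken traces. A common way to achieve that, sketched here in plain Python (the hashing scheme is illustrative, not Honeycomb’s exact implementation), is to hash the trace ID and bucket the result:

```python
import hashlib

def should_sample(trace_id: str, sample_rate: int) -> bool:
    """Deterministic head sampling sketch: keep roughly 1 out of every
    `sample_rate` traces, with every process that sees the same trace ID
    reaching the same keep-or-drop decision."""
    if sample_rate <= 1:
        return True  # keep everything
    digest = hashlib.sha1(trace_id.encode()).digest()
    # Interpret the first 4 bytes of the hash as an integer and bucket it.
    bucket = int.from_bytes(digest[:4], "big") % sample_rate
    return bucket == 0

# Any two services sampling at the same rate agree on this trace.
decision = should_sample("0af7651916cd43dd8448eb211c80319c", 5)
```

Because the decision is a pure function of the trace ID, no coordination between services is needed, and the backend can multiply counts by the sample rate to reconstruct totals.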
The Java OpenTelemetry distribution also includes an auto-instrumentation agent. Again, we wanted to make this simpler. When you know you’re using Honeycomb, you just have to specify your Honeycomb API key and dataset, and then optionally a sample rate and a service name. These are all sent as system properties when invoking the JVM with your Java application. So if you have a JAR file that bundles your Java application, running this command will start sending auto-instrumented trace data to Honeycomb. The most straightforward way to start using Honeycomb with Java is to run your application with our distribution’s Java agent.
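The invocation looks roughly like this (the agent JAR name and property names here are illustrative and should be checked against the distribution’s current docs; the key and dataset are placeholders):

```shell
# Illustrative: run a bundled Java application with the Honeycomb
# OpenTelemetry Java agent attached. Key and dataset are placeholders.
java \
  -javaagent:honeycomb-opentelemetry-javaagent.jar \
  -Dhoneycomb.api.key=YOUR_API_KEY \
  -Dhoneycomb.dataset=my-service \
  -Dsample.rate=5 \
  -Dservice.name=my-service \
  -jar my-application.jar
```

No code changes are required; the agent instruments supported libraries at class-load time.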
In that sample, I just ran my application with the Java agent and Honeycomb credentials; it’s a quick sample from a Spring Boot application I threw together. You’ll see that I automatically get tracing of customer requests: I get all of the fields generated automatically by the agent, and then I get spans representing things like the request, the mapping of the request to a method in a controller, and even some activity at the network layer.
I mentioned auto-instrumentation; we also want you to use our SDK to add manual instrumentation to your spans. This is something we encourage people to do. Auto-instrumentation, we always say, only gets you so far. To truly understand your code, you have to do some manual instrumentation at some point, and we want to make that as easy as possible.
Using the Java Honeycomb OpenTelemetry distribution, for instance, all you need to do to add data to a span in flight is call setAttribute on the current span. It’s so boring you might not see it: it’s these two lines right here, adding a couple of attributes to the current span in a regular Spring Boot controller method. Here I’m adding a simple hard-coded attribute, and then an attribute with the number of records returned from a query. If I run that code and make that request, I’ll see my fields show up in the Honeycomb UI.
Just to recap: we think this is really exciting. I’m really excited to see the OpenTelemetry project mature and evolve, and to see our customers start to adopt it. I’d encourage everybody to visit https://docs.honeycomb.io — you’ll start to see OpenTelemetry featured a lot more prominently there. Also visit the OpenTelemetry project at https://opentelemetry.io. There’s tons of great documentation there, plus information on getting involved in the community.
A special call to action: if you’re a library author, consider instrumenting your library with OpenTelemetry, and the same goes for framework authors. I would LOVE to see more frameworks adopt OpenTelemetry so that when users use your application framework, they get telemetry data for free, and we achieve that dream of instrumentation being boring. Happy to take any questions at this point. Thank you so much for listening, and I hope you enjoy the rest of the conference.
Yeesheen Yang [Senior Product Manager | Honeycomb]:
Wow, Paul, I think what we’ve learned is instrumentation is NOT boring. Loved the talk.
Thanks so much. Yeah, I really loved talking about this. It’s a topic I think a lot about and I think a lot of our customers care about.
Absolutely. I wanted to ask you: I loved what you said about OpenTelemetry being a large community that outpaces any single vendor, and it seems like it is, or can be, the leading edge of instrumentation and telemetry. I’m curious what possibilities you see that opening up. How does that change the landscape for us?
Yeah, that’s a great question. I do think that OpenTelemetry will outpace any vendor, and the reason is really the same as for any open-source project: when you have a large community of people crossing organizational boundaries and working towards a similar goal, you’re gonna see these kinds of network effects, and I think those will have cool implications for people working on systems.
There are some obvious examples, you know, like I mentioned during the talk, library authors starting to instrument their code with OpenTelemetry. That’s fantastic. You know, it means if I’m using an HTTP client in my code, every time I send an HTTP request, I can have a span automatically emitted by that HTTP library and I don’t have to think about it. There’s the boring part.
Now, of course, we all know that auto-instrumentation only gets you so far, but having OpenTelemetry as a standard API means I should also be able to grab that span — I think of it like grabbing it out of the air — and say, “I want to add an attribute that’s specific to my code,” or “I want to start a child span from this point in my code on.” All of that should be possible.
An HTTP library is the most obvious example, but there are some non-obvious things I’d love to see happen. I’m thinking of pieces of your architecture or infrastructure starting to adopt OpenTelemetry. Databases, for instance: if you’re making a database query, being able to grab that span out of the air and have information from the internals of your database engine attached to it as attributes would be hugely helpful to people doing debugging.
Oftentimes we do things like wrap the database call, or instrument the database’s logs to send telemetry to a backend system, but what if that was done with OpenTelemetry? I think about things like load balancers, and being able to truly trace a request from end to end. All of that becomes possible, I think, because OpenTelemetry is the de facto standard for doing that stuff. And that applies to software systems, but also to language communities.
You know, looking at Honeycomb and at a lot of our competitors, or at the open-source landscape, you’re always going to see libraries implemented in Java, and probably .NET, Go, Ruby, and Python, but a lot of languages are often lacking. If you’re in a niche category, or using a language that isn’t as popularly adopted, like Scala, or even Rust, or an older language that’s still widely used, like PHP, OpenTelemetry is gonna have more support for you, because there are more people interested across all these different language sub-communities.
I think we’ll see that adoption speed up and become ubiquitous. I can even imagine cloud vendors implementing this stuff, so you just have systems that are sending telemetry data, and your job actually becomes deciding what telemetry data you might want to suppress. Maybe you don’t want all of it, but it’s really nice to have it and be able to make use of it if you need it.
There’s this idea of ease and plug-and-play and customization — it’s really magical. Cool! You mentioned custom telemetry pipelines and how the open standard enables sending metrics to multiple vendors, fitting different teams’ needs, especially in larger enterprises. I’m curious to hear how you think that evolves and grows over time, from a small shop to a bigger one.
Yeah, absolutely. I think about the archetypal small shop: it’s you and I building an app, using hosting providers for everything, buying instead of building, like we should. That means you and I just want to instrument our code. We’re gonna choose a vendor, probably, send telemetry data to the vendor, and call it a day.
Over time, our organization grows, and let’s say we get a machine learning or data science team that wants to use machine learning to add features to our application. Well, we’ve already got events in the form of trace spans, so that’s fantastic. But that’s not a use case Honeycomb is really gonna solve, right? We’re not going to implement machine learning models that are applicable to our business. Maybe instead you pipe that data off and send those events to some team that ingests them and uses them to make product decisions based on user behavior. That’s one hypothetical.
Another is analytics: you want to know every time something happens in your system. Well, you’ve already instrumented your code and you’re already collecting the data, so stand up something like an OpenTelemetry Collector with a filter and say, “send this to some analytics backend.” This is where I see a lot of organizations hit a size where they’ll probably use a centralized broker like Kafka to tee off the data streams to different teams and for different purposes. One of them will be the primary reason we started with: debugging production systems to maintain availability and reliability.
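That tee-off pattern maps directly onto Collector configuration: the Collector’s contrib distribution ships a Kafka exporter, so one pipeline can feed both your observability vendor and a stream-processing path. In this sketch the broker address, topic, and API key are placeholders:

```yaml
exporters:
  otlp:
    endpoint: "api.honeycomb.io:443"
    headers:
      "x-honeycomb-team": "YOUR_API_KEY"
  kafka:
    brokers: ["kafka-1:9092"]  # placeholder broker
    topic: otlp_spans

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp, kafka]
```

Downstream consumers on the Kafka topic then get the same span stream the vendor sees, without any extra instrumentation in the applications.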
But other things come to mind. I could definitely see different teams having slightly different needs as an organization grows, right? No one product is going to be usable by everybody for every purpose. Being able to have one product in use in one part of the organization and another product in another part, but with essentially the same data, is, I think, really powerful.
I think as an organization evolves, you just see more and more use like that, and, you know, having a unified protocol for both the data format and for the API, I think, just opens up a world of opportunities. And I’d actually like to see vendors that are in adjacent spaces, like some of these business intelligence or analytics companies, start to adopt OpenTelemetry as a way of mining information about what is happening in your systems.
That’s so, so rad. Cool! Paul, thank you so much!
No problem! Thank you!