Ask Miss O11y: Baggage in OTel
By Martin Thwaites | Last modified on April 25, 2022
Miss O11y is delighted to welcome our newest band member: Martin Thwaites! Martin has been a member of the Honeycomb user community practically since its inception. He is a UK-based consultant who specializes in helping teams scale up and tackle challenging business problems, and a long-time contributor to the Azure and .NET communities. We think he looks ✨amazing✨ in a tiara.
Dear Miss O11y,
What on earth is “Baggage” in OpenTelemetry? Why does it exist, and what would I use it for? Please demystify it for me.
Thanks for the question; it’s a common one. Honestly, OpenTelemetry (OTel) Baggage is the footgun you never wanted, but we’ll get to that in a bit.
So what is this OTel Baggage thing?
Imagine you wanted to have the CustomerId appear on all your spans, but it’s only available on the initial API request because your Stock Check API doesn’t need a Customer context. This is where OpenTelemetry Baggage comes to the rescue.
In OpenTelemetry, “Baggage” is a fancy term for contextual information that’s passed between spans. More specifically, it’s about passing that context across service boundaries: pushing it over an HTTP request, a gRPC call, or a message so the receiving service can use it to add context to its own spans. In the Honeycomb distros for OpenTelemetry, we take this a step further and let you add the Baggage data to all spans as attributes (more on this footgun later).
OpenTelemetry uses a concept called “Propagation” to pass this context around, and each language’s library implementation has “propagators” that read Baggage from incoming requests (and write it onto outgoing ones), making it available without you needing to implement any of that yourself.
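To make propagation a little less magical, here is a minimal hand-rolled sketch, in C# to match the .NET snippet later in this post, of roughly what a Baggage propagator does on the wire. It assumes the W3C Baggage header format (a `baggage` header of comma-separated `key=value` pairs with percent-encoded values). `Inject` and `Extract` are hypothetical helper names and a plain dictionary stands in for HTTP headers; in a real app the OpenTelemetry SDK’s propagators do this for you and handle edge cases this sketch ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: serialize baggage into a W3C-style "baggage" header on an outgoing carrier.
void Inject(IReadOnlyDictionary<string, string> baggage, IDictionary<string, string> headers)
{
    // Keys are sent as-is; values are percent-encoded so spaces, commas, etc. survive the trip.
    headers["baggage"] = string.Join(",",
        baggage.Select(kv => $"{kv.Key}={Uri.EscapeDataString(kv.Value)}"));
}

// Hypothetical helper: parse the "baggage" header from an incoming carrier back into key/value pairs.
Dictionary<string, string> Extract(IReadOnlyDictionary<string, string> headers)
{
    var result = new Dictionary<string, string>();
    if (!headers.TryGetValue("baggage", out var header)) return result;

    foreach (var member in header.Split(',', StringSplitOptions.RemoveEmptyEntries))
    {
        // Drop any ";property" metadata, then split on the first '='.
        var pair = member.Split(';')[0];
        var eq = pair.IndexOf('=');
        if (eq <= 0) continue; // skip malformed entries rather than failing the request
        result[pair[..eq].Trim()] = Uri.UnescapeDataString(pair[(eq + 1)..].Trim());
    }
    return result;
}

// Upstream API service: the AccountId is known here...
var outgoing = new Dictionary<string, string>();
Inject(new Dictionary<string, string> { ["AccountId"] = "acct 42" }, outgoing);
Console.WriteLine(outgoing["baggage"]); // AccountId=acct%2042

// ...downstream Stock Check service: it can read AccountId without being handed it explicitly.
var incoming = Extract(outgoing);
Console.WriteLine(incoming["AccountId"]); // acct 42
```

The point is less the parsing than the agreement: every service, in every language, looks in the same header and expects the same format.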
But why the hell does OTel Baggage exist?
That’s a really good question. We have HTTP and message headers, right? They’re a key-value list, right? Why reinvent something that already exists? Is it just “not invented here”?
All valid questions, but there is something special about Baggage that makes it different. Let’s talk about standardization! The brilliance of OpenTelemetry is that it’s cross-platform and cross-framework. What Baggage gives you is a requirement that context values live in the same place, have the same format, and follow the same pattern. That means all your applications, no matter the language, will be able to read them, parse them, and use them. This is important when you’re building a massively distributed system and want to give teams the autonomy to work in whatever language or framework they choose.
You could absolutely use something else for this; for example, your organization could standardize on its own set of headers. However, you’ll soon find yourself building helpers for every framework and language, and those helpers never get maintained.
What should I use OTel Baggage for?
This is where the footgun comes in. The best answer is “nothing sensitive, and nothing you don’t want third parties to see.” Also, don’t blindly trust what you receive: there are no built-in integrity checks to ensure the Baggage items were actually set by you.
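Because nothing stops a caller (or a third party) from sending whatever Baggage it likes, it’s worth treating incoming values as untrusted input. Here is a minimal sketch, assuming a hypothetical rule that an AccountId is 1 to 32 ASCII digits; `ReadTrustedAccountId` is an illustrative name, not a library API. Validation like this can’t prove the value came from you, but it limits what a bad value can do once it lands on your spans or in your logs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical validation rule for this sketch: an AccountId is 1 to 32 ASCII digits.
string? ReadTrustedAccountId(IReadOnlyDictionary<string, string> baggage)
{
    if (!baggage.TryGetValue("AccountId", out var value)) return null;

    // Baggage carries no proof of origin, so validate it like any other untrusted input.
    bool wellFormed = value.Length > 0 && value.Length <= 32 && value.All(c => c >= '0' && c <= '9');
    return wellFormed ? value : null;
}

var fromOurApi = new Dictionary<string, string> { ["AccountId"] = "12345" };
var tampered = new Dictionary<string, string> { ["AccountId"] = "12345'; DROP TABLE accounts; --" };

Console.WriteLine(ReadTrustedAccountId(fromOurApi) ?? "(ignored)"); // 12345
Console.WriteLine(ReadTrustedAccountId(tampered) ?? "(ignored)");   // (ignored)
```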
Common use cases we’ve seen involve information that’s only accessible further up the stack: things like account IDs, user IDs, product IDs, maybe even origin IPs. Passing these down the stack lets you add them to spans in descendant services, which makes filtering easier when you’re searching in the UI.
As we can see in the diagram, unless the AccountId is passed via Baggage, the Stock API cannot add the AccountId to its spans. This becomes really important when we’re debugging live, high-volume systems, as we may want to know whether the load on our Stock API is being caused by a particular account or even a particular IP address.
So tell me about this footgun thing?
Baggage is prolific … it goes EVERYWHERE. Because OTel passes it around in the background without you doing anything, it’s easy not to notice it’s happening.
Imagine all your secrets being shared with your neighbors, and that you were the one doing it: talking in your sleep, or having them flash up on your T-shirt while you’re chatting with them. That would be … not good, right?
That’s what can happen if you’re not careful with how you use Baggage Propagation and what you use Baggage for.
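To see why it’s so easy to miss, consider how the context travels inside a single process. As far as I know, OpenTelemetry .NET’s runtime context is built on `AsyncLocal<T>` by default; the sketch below uses a bare `AsyncLocal` as a stand-in (not the OTel API) to show a value set by the caller showing up in code that was never handed it. `CallStockApiAsync` is a hypothetical method.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for OTel's ambient context: an AsyncLocal value flows down the async call chain by itself.
var ambientBaggage = new AsyncLocal<Dictionary<string, string>?>();

// A call deep in the stack that is never handed the baggage as a parameter.
async Task<string> CallStockApiAsync()
{
    await Task.Yield(); // resume as a continuation (possibly on another thread); the value still flows
    var bag = ambientBaggage.Value;
    return bag != null && bag.TryGetValue("AccountId", out var id) ? id : "(none)";
}

ambientBaggage.Value = new Dictionary<string, string> { ["AccountId"] = "12345" };
var seen = await CallStockApiAsync();
Console.WriteLine(seen); // 12345
```

Nothing in `CallStockApiAsync` asked for the AccountId, yet it has it; add an instrumented HTTP client to that method and the propagator would happily send it onward too.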
I’ve also seen this kind of shared context abused. If you think of Baggage as similar to the “Session” data stored for a user, you can start to see where I’m going with this. If you’ve ever worked in .NET or Java, you’ll have seen people push entire object trees into session state because they might need some of the properties. Now imagine that, on top of storing that object tree, you’re also passing it between all of your services.
If you see someone add an extension method to the Baggage functions in your language that serializes an object into a string for use in Baggage, please just put them out of their misery and save yourself a 5 a.m. alert about the system running slow.
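If you want a guard rail against that, one option is to check the serialized size before adding an entry. This is a minimal sketch: `TrySetBaggage` is a hypothetical helper, and the 8,192-byte budget is an assumption chosen to echo the size limit in the W3C Baggage specification (as I read it, propagators are only required to carry baggage up to roughly that size). Pick whatever budget suits your system.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

const int MaxBaggageBytes = 8192; // assumed budget for the whole serialized "baggage" header

// Hypothetical helper: add the entry only if the serialized header would stay within budget.
bool TrySetBaggage(Dictionary<string, string> bag, string key, string value)
{
    var candidate = new Dictionary<string, string>(bag) { [key] = value };
    var header = string.Join(",", candidate.Select(kv => $"{kv.Key}={Uri.EscapeDataString(kv.Value)}"));
    if (Encoding.UTF8.GetByteCount(header) > MaxBaggageBytes)
    {
        return false; // refuse rather than ship an oversized header on every downstream call
    }

    bag[key] = value;
    return true;
}

var current = new Dictionary<string, string>();
bool addedSmall = TrySetBaggage(current, "AccountId", "12345");

// Simulates someone serializing an entire object tree into a single Baggage value.
bool addedHuge = TrySetBaggage(current, "Customer", new string('x', 20_000));

Console.WriteLine($"{addedSmall} {addedHuge}"); // True False
```

Refusing loudly at the source is usually kinder than letting every downstream hop pay for the extra bytes.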
Baggage != Span attributes
One final thing, and the biggest misconception we’ve come across: Baggage entries are not automatically added to your span attributes when you set them.
It’s not unreasonable to assume that when you add something to Baggage, you’re doing it so it ends up as an attribute on the child system’s spans. However, it doesn’t, at least not automatically. You must explicitly read the value out of Baggage and add it as an attribute, as in this .NET snippet:
using System.Diagnostics;
using OpenTelemetry;
var accountId = Baggage.GetBaggage("AccountId");
Activity.Current?.SetTag("AccountId", accountId); // Baggage is never copied onto spans automatically
To make this easier, we’ve added a BaggageSpanProcessor to our Honeycomb .NET and Java libraries for OpenTelemetry that does this automatically. Before deciding whether to use these or build your own, remember the footgun: they copy everything in Baggage onto your spans.
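Conceptually, a processor like that runs when each span starts and copies the current Baggage entries onto it as attributes. The sketch below is not Honeycomb’s implementation: plain dictionaries stand in for the real `Baggage` and `Activity` types, and `CopyBaggageToAttributes` and its allow-list parameter are hypothetical. The allow-list is one way to “build your own” safely, so only keys you’ve vetted reach your spans.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a baggage span processor's "span started" hook.
void CopyBaggageToAttributes(
    IReadOnlyDictionary<string, string> baggage,
    IDictionary<string, object> spanAttributes,
    ISet<string>? allowedKeys = null)
{
    foreach (var (key, value) in baggage)
    {
        // No allow-list: copy everything (the footgun). With one: copy only keys you've vetted.
        if (allowedKeys == null || allowedKeys.Contains(key))
        {
            spanAttributes[key] = value;
        }
    }
}

var incomingBaggage = new Dictionary<string, string>
{
    ["AccountId"] = "12345",
    ["session_token"] = "not-for-your-spans", // hypothetical entry someone upstream added
};

var copyAll = new Dictionary<string, object>();
CopyBaggageToAttributes(incomingBaggage, copyAll);

var vetted = new Dictionary<string, object>();
CopyBaggageToAttributes(incomingBaggage, vetted, new HashSet<string> { "AccountId" });

Console.WriteLine(copyAll.Count); // 2
Console.WriteLine(vetted.Count);  // 1
```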
Let me tell you a (true) (funny) story…
As you’re likely aware, we dogfood Honeycomb and OpenTelemetry here, so every service in our telemetry ingest pipeline is instrumented with OpenTelemetry.
Now, imagine you’re a Honeycomb customer using OpenTelemetry, and you start adding some context to Baggage in your applications. Because you’re using OpenTelemetry, the Baggage you intended for internal use also gets pushed to your telemetry provider, aka us…
The above scenario did happen, and when it did, the Go libraries we were using at the edge had no way to say “I’m an external endpoint, ignore everything about tracing context.” We ended up with the customer’s Baggage information in our spans. Most importantly, though, because names like team_id are pretty standard, they were overriding our own attributes of the same name.
What makes this worse is that the parent_id properties suffered the same fate, with even more serious implications. That’s a story for another day, though!
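What we wanted at the edge amounts to “don’t extract context from requests you don’t trust.” Here is a minimal sketch, assuming W3C propagation (the `traceparent` and `tracestate` headers from W3C Trace Context, plus the `baggage` header). `StripInboundContext` is a hypothetical helper that drops those headers before any propagator sees them, so the service starts a fresh trace with empty Baggage; whether your SDK offers a built-in switch for this varies by language and version.

```csharp
using System;
using System.Collections.Generic;

// Header names used by W3C Trace Context (traceparent, tracestate) and W3C Baggage (baggage).
string[] contextHeaders = { "traceparent", "tracestate", "baggage" };

// Hypothetical edge helper: discard caller-supplied trace context and Baggage before extraction,
// so this service starts a fresh trace with empty Baggage.
void StripInboundContext(IDictionary<string, string> headers)
{
    foreach (var name in contextHeaders)
    {
        headers.Remove(name);
    }
}

// HTTP header names are case-insensitive, so the carrier compares them that way.
var inbound = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["TraceParent"] = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    ["Baggage"] = "team_id=customer-team",
    ["Content-Type"] = "application/json",
};

StripInboundContext(inbound);
Console.WriteLine(string.Join(", ", inbound.Keys)); // Content-Type
```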