Data Sovereignty and OpenTelemetry
By Austin Parker | Last modified on February 1, 2024
In today’s economic and regulatory environment, data sovereignty is increasingly top of mind for observability teams. The rules and regulations surrounding telemetry data can often be challenging to interpret, leaving many teams in the dark about what kind of data they can capture, how long it can be stored, and where it has to reside.
In the past, addressing these issues at scale was a costly endeavor. Proprietary tooling could aid you by allowing routing of telemetry to different places based on its geographic origin, but at the cost of lock-in. Trying to build pipelines to manage this yourself incurred a significant operational overhead, not to mention tech debt.
Today, OpenTelemetry offers you a variety of options to manage where your telemetry data should be routed, as well as ways to filter and redact it for compliance and regulatory purposes. Let’s talk about your options!
Routing telemetry data using OpenTelemetry
There are many reasons you might want to route telemetry signals to different sinks. To illustrate these, let’s imagine a hypothetical eCommerce web application with customers in the EU, the US, and LATAM. Certain services are shared between all customers, but many are not: for example, stock keeping, order processing, and payment services.
While these services may share code, they are probably owned by different teams. Some of their service level requirements may differ in order to comply with local regulations. Best practices for observability teach that you should ensure consistency in your telemetry data, but due to inconsistent regulations around auditability, right to be forgotten, or other privacy measures, some telemetry attributes may be inappropriate to collect for customers in certain regions.
Your first step, then, is to ensure that telemetry data is routed to the appropriate location in order to apply appropriate rules. The OpenTelemetry Collector’s routing processor allows you to do exactly that.
The routing processor works on any incoming signal type and can make routing decisions based on many factors. For example, you could look at the region of a client request (US, EU, or LATAM in our example) and route telemetry that matches that region either to another pipeline in the same Collector or to a different Collector entirely. The routing processor can also inspect the resource attributes of telemetry using the OpenTelemetry Transformation Language (OTTL) and can perform complex in-place transformations while routing. For example, a multi-tenant payment processing service could add the appropriate tenant ID as a resource attribute to all of its telemetry. The routing processor could look for that tenant ID and route the signals somewhere else, while also removing the tenant ID if it is no longer needed in further processing.
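To make the routing decision concrete, here is a minimal plain-Python sketch of the logic described above. It is not Collector configuration and does not use any OpenTelemetry API; the attribute names (`client.region`, `tenant.id`) and the pipeline names are assumptions chosen for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical pipeline names; a real Collector defines pipelines in its config.
REGION_PIPELINES = {
    "eu": "traces/eu",
    "us": "traces/us",
    "latam": "traces/latam",
}
DEFAULT_PIPELINE = "traces/default"


def route(resource: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Pick a pipeline from the region attribute and drop the tenant ID."""
    region = str(resource.get("client.region", "")).lower()
    pipeline = REGION_PIPELINES.get(region, DEFAULT_PIPELINE)
    # Forward a copy without the attribute that was only needed upstream.
    forwarded = {k: v for k, v in resource.items() if k != "tenant.id"}
    return pipeline, forwarded


pipeline, forwarded = route(
    {"client.region": "EU", "tenant.id": "t-42", "service.name": "payments"}
)
```

Telemetry with an unknown or missing region falls through to a default pipeline, which is one reasonable place to apply the strictest rules.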
The routing processor can also be used to split signals based on desired storage destination. For example, if you’re practicing local data sovereignty and need to ensure that all EU-originated telemetry data ends up in Honeycomb EU rather than Honeycomb US, the routing processor can accomplish this for you. Offloading this work to an OpenTelemetry Collector reduces the maintenance burden of configuring these complex rules in your service code, and ensures that your service spends most of its time doing work rather than evaluating rules about where to send what data.
Filtering telemetry data using OpenTelemetry
Personally identifiable information, or PII, is any kind of information that can be linked to an individual. This includes obvious identifiers, such as telephone numbers, government ID numbers, or names, but also hardware or device identifiers, behavioral data, and IP addresses. Depending on prevailing laws and regulations (about which you should consult a lawyer), handling this kind of telemetry may be subject to restrictions or audit requirements. Often, the best way to ensure you’re in compliance is to simply remove any and all PII from your telemetry.
With OpenTelemetry, there are a few options to handle PII. In keeping with the practice of defense in depth (having multiple layers of controls on sensitive systems), it’s best to combine these.
The first line of defense is at the service itself. The OpenTelemetry SDK allows you to register processors, which can be used to filter trace, metric, and log metadata like resources and attributes. These processors can apply allow or deny lists to attribute keys, or run regular expression-based filters over all attribute values to catch data that matches a pattern. The latter is a good way to look for consistently formatted strings, like government IDs or phone numbers, that may appear as part of a longer string.
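As a rough sketch of both techniques, the plain-Python function below drops denied keys and redacts pattern matches inside string values. It stands in for the filtering logic only; it is not an OpenTelemetry SDK processor, and the key names and patterns are assumptions for illustration.

```python
import re

# Hypothetical deny list of attribute keys.
DENIED_KEYS = {"user.email", "user.phone"}

# Illustrative patterns only: a US-style SSN and a simple phone number.
PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
]


def scrub(attributes):
    """Return a copy with denied keys removed and pattern matches redacted."""
    cleaned = {}
    for key, value in attributes.items():
        if key in DENIED_KEYS:
            continue  # drop the attribute entirely
        if isinstance(value, str):
            for pattern in PATTERNS:
                value = pattern.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned


scrubbed = scrub({
    "user.email": "someone@example.com",
    "note": "SSN 123-45-6789 on file",
    "http.status_code": 200,
})
```

Note that the regular expressions run over every string value, which is where the per-event cost of this approach comes from.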
One consideration here is that the more processing you do at the service level, the longer it takes to emit telemetry. Using an allowlist of keys and values seems like a good option, but it adds friction for developers who want to add new custom attributes. This is why, irrespective of how you process telemetry at the service level, it’s a good idea to also inspect and filter telemetry at the Collector.
Inside the Collector, you can use processors, such as the transform processor, to perform this filtering. The transform processor works on trace, metric, and log data and is capable not only of redacting PII, but also of transforming it. In other words, you can do more than just delete PII: you can replace it with an acceptable value.
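One way to picture "replace it with an acceptable value" is pseudonymization: swapping the raw value for a salted hash. The sketch below is plain Python rather than transform processor configuration, and the function name, key names, and salt handling are all assumptions for illustration.

```python
import hashlib


def pseudonymize(attributes, keys, salt):
    """Return a copy where the given keys hold a truncated salted SHA-256 digest."""
    out = dict(attributes)
    for key in keys:
        if key in out:
            raw = (salt + str(out[key])).encode("utf-8")
            out[key] = hashlib.sha256(raw).hexdigest()[:16]
    return out


event = {"user.id": "alice@example.com", "http.route": "/checkout"}
safe_event = pseudonymize(event, ["user.id"], salt="rotate-me")
```

Because the same input and salt always produce the same digest, you can still group events by user without storing the original identifier. Whether a hashed value is acceptable for your use case is a compliance question, not a technical one.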
Another option is to use the schema processor (or a custom schema processor) in order to compare all telemetry against a pre-published schema of acceptable attribute keys. This also has the advantage of normalizing telemetry data as it flows through your pipeline, allowing you to commingle telemetry from multiple inconsistent sources.
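The idea of checking telemetry against a published schema can be sketched in a few lines of plain Python. This is not how the Collector's schema processor is implemented or configured; the schema layout, key names, and renames below are hypothetical.

```python
# Hypothetical schema: canonical keys that are allowed, plus known aliases.
SCHEMA = {
    "allowed": {"http.method", "http.status_code", "service.name"},
    "renames": {"httpMethod": "http.method", "status": "http.status_code"},
}


def apply_schema(attributes, schema=SCHEMA):
    """Normalize known aliases and drop keys the schema does not allow."""
    normalized = {}
    dropped = []
    for key, value in attributes.items():
        canonical = schema["renames"].get(key, key)
        if canonical in schema["allowed"]:
            normalized[canonical] = value
        else:
            dropped.append(key)  # keep a record for auditing
    return normalized, dropped


normalized, dropped = apply_schema(
    {"httpMethod": "GET", "status": 200, "user.email": "someone@example.com"}
)
```

Returning the dropped keys alongside the normalized attributes makes it easier to spot teams that are emitting attributes the schema does not yet cover.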
The final stage of filtering comes down to your sampling decisions. Sampling is a method to reduce the volume of telemetry data ingested by your observability backend. Honeycomb’s Refinery offers a powerful dynamic sampler for trace telemetry, and you can configure Refinery rules to drop traces that contain PII, so that they never transit the network. One interesting application of this is to use a combination of transformations, routing, and Refinery to save potentially interesting telemetry that contains PII to alternative storage (for example, an internal-only S3 bucket) while replacing the values with pointers to the unredacted telemetry. Refinery can then decide whether the trace is interesting (perhaps it contains an error, or is needed for evaluating an SLO) and emit it to Honeycomb. This gives you the best of all worlds: you’re not leaking PII to sub-processors or third parties, but you can keep it if it’s relevant to debugging.
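The pointer-replacement idea can be sketched as follows. This is a plain-Python illustration, not Refinery or Collector configuration: the `store` dictionary stands in for internal-only storage such as an S3 bucket, and the `pii://` pointer format and function name are assumptions.

```python
import uuid


def externalize_pii(span_attributes, pii_keys, store):
    """Move PII values into `store` and leave opaque pointers in their place."""
    redacted = dict(span_attributes)
    for key in pii_keys:
        if key in redacted:
            pointer = f"pii://{uuid.uuid4()}"
            store[pointer] = redacted[key]  # unredacted value stays internal
            redacted[key] = pointer  # only the pointer is exported
    return redacted


internal_store = {}  # stand-in for an internal-only bucket
exported = externalize_pii(
    {"user.email": "someone@example.com", "error": True},
    ["user.email"],
    internal_store,
)
```

The exported span keeps everything a sampler needs to judge whether it is interesting (here, the `error` attribute), while anyone with access to the internal store can follow the pointer back to the original value during debugging.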
How Honeycomb makes OpenTelemetry more valuable
OpenTelemetry, as you can see, is an extremely powerful tool. Its modular design makes it possible to build complex and intricate observability pipelines in order to route and filter data based on your needs.
This power comes at a cost, though—it can be hard to understand! At Honeycomb, we’ve helped dozens of teams and organizations with their OpenTelemetry journey, moving them towards achieving a more sustainable and scalable observability practice. Don’t just take my word for it, though—book a demo to learn more.