OpenTelemetry Best Practices #2: Agents, Sidecars, Collectors, Coded Instrumentation

By Martin Thwaites | Last modified on May 15, 2024

For years, we’ve been installing what vendors call “agents” that reach into our applications and pull out useful telemetry. From monitoring agents to full-blown APM tools, this has been the standard approach for decades. With OpenTelemetry, though, the term “agent” isn’t used as much, and in most scenarios it means something slightly different. In this post, we’ll look at how you can achieve the same “hands off” process with OpenTelemetry, and at when you should and shouldn’t use the more automatic approach to telemetry collection.

The best-practice advice here is to remain idiomatic to the language and framework where possible, and to align with what the engineers on your teams feel comfortable doing. In some languages, side-loading an agent can feel wrong (.NET, for instance), whereas in others, instrumenting in code may feel like an anti-pattern (Java, for instance), and techniques like annotations or monkey patching are more idiomatic. Keeping instrumentation as close to your native development flow as possible is a major advantage for adoption.

Getting started with automatic instrumentation

In OpenTelemetry, automatic instrumentation (or auto-instrumentation) refers to instrumenting code without making any code changes. It’s a useful way to get value for very little effort, and it’s similar to the legacy APM/agent-style approach of installing some service on the machines that your application runs on, so it may feel familiar to you if you’re used to doing that.
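To make “without making any code changes” a little more concrete, here is a minimal, dependency-free sketch of monkey patching, one of the techniques an agent can use in a dynamic language. It does not use the OpenTelemetry API. The names `http_client`, `instrument`, `recorded_spans`, and `load_homepage` are hypothetical and exist only for this illustration.

```python
import functools
import time
import types

# Stand-in for a third-party library module the application already calls.
http_client = types.SimpleNamespace()


def _get(url):
    """Pretend to make a network call."""
    time.sleep(0.01)
    return f"response from {url}"


http_client.get = _get

# A real agent would export this data to a backend; here we keep it in a list.
recorded_spans = []


def instrument(module, func_name):
    """Replace module.func_name with a wrapper that records name and duration."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            recorded_spans.append(
                {"name": func_name, "duration_s": time.perf_counter() - start}
            )

    setattr(module, func_name, wrapper)


# The "agent" runs this once at startup...
instrument(http_client, "get")


# ...and the application code stays exactly as it was written.
def load_homepage():
    return http_client.get("https://example.com/")


result = load_homepage()
```

The application function never mentions telemetry, yet every call through the patched library is timed. That is also why this data lacks your own context: the wrapper only knows what the library knows.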

Using auto-instrumentation in OpenTelemetry is slightly better than the legacy APM approach, in that the data that you’ll see is not tied to the backend observability platform—rather, it was deemed useful by the people who wrote the libraries themselves. It’s a great starting point—you can get lots of value from this data if you don’t have anything set up, or if you’re relying on a more “observability 1.0” style approach of tailing logs and relying purely on infrastructure metrics. 

However, it is just that: a great start. It’s not the end goal of a robust telemetry approach, one that lets you understand what your application is doing when it’s not running on your machine. You should be able to answer a large number of questions from this out-of-the-box data, but it will lack your own context.

Questions you will be able to answer:

  • Which API endpoints are returning slowly and what is normal?
  • Which API endpoints get the most errors?
  • Which third parties are responding slowly?
  • What are the slowest database calls?

Questions you won’t be able to answer:

  • Are endpoints slower for baskets with more items?
  • Does the number of content blocks on the page affect load time?
  • Which clients are experiencing the most latency?

Coded instrumentation

In OpenTelemetry, coded (or manual) instrumentation refers to adding the setup (or boilerplate) for telemetry in code inside your project, rather than attaching it from outside the application. For some languages, like .NET, this is more idiomatic for developers and fits more easily into their mental model. But for others, like Java, this may feel like an anti-pattern.

Coded instrumentation doesn’t require you to add large amounts of code. In most scenarios, it’s only a few lines of boilerplate that can be copied and pasted. Those few lines give you parity with the auto-instrumentation/agent approach, because both use the same instrumentation libraries under the hood.

Custom instrumentation: the best option and best practice

If you need answers to questions that auto-instrumentation can’t give you, custom instrumentation is what you’re looking for. It’s the next step once you’ve started to get basic information, either through auto-instrumentation via agents or through coded (manual) instrumentation.

This is a step that requires involvement from the team of people who understand the code. In an ideal world, this would be a union of the SRE team and application engineers, working together to establish what context is important. That said, it’s just as valid that application engineers alone add more context as they go.

Custom instrumentation, though it sounds daunting, doesn’t require you to dedicate large amounts of engineering hours to augment your code. Once you’ve started getting some data (either through auto or manual instrumentation), as an application engineer you can start to add new context to your instrumentation that makes sense. New features and bug fixes should include telemetry as a core part of the work, adding context to existing traces and metrics, adding new spans or new meters as required.

You’re on your way!

Once you have this process embedded with your engineers, you’re on your way to building a true observability culture within your organization, one where engineers feel empowered to add the information they need to confidently support the production environment. You can’t go wrong with that!

If you enjoyed part two of my best practices series and want to learn more, you can find part one here: OpenTelemetry Best Practices #1: Naming.

