Options for Managing Telemetry Volume At Scale & How Slack Does It



Welcome to The Authors’ Cut Series
In this session, you’ll learn the pros and cons of different sampling techniques and how to retain granular visibility into your system state while sampling. We discuss why sampled events are better than the traditional approach of pre-aggregated metrics, and bring the discussion to life with a demo of Refinery, Honeycomb’s sampling solution.

Specific concepts include:
- Cheap and Accurate Enough: Sampling. Code-based examples of various sampling techniques and how you can make informed decisions about which events can help you surface unusual system behavior. (Chapter 17)
- Slack example: Telemetry Management with Pipelines. Slack routes millions of events per second to multiple backend systems. See how they’ve developed a strong telemetry management practice that makes services sufficiently observable while minimizing the burden on developers. (Chapter 18)
- A Live Honeycomb Demo. Slack engineering leaders, Suman Karumuri and Ryan Katkov, will showcase their use of telemetry pipelines to route data to many backend systems in order to isolate workloads, meet security and compliance needs, satisfy different retention requirements, and more.
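
To give a flavor of the sampling techniques covered in the session, here is a minimal sketch in Python. It is not code from the book or from Refinery. The event fields (`status`, `sample_rate`), the function names, and the 1-in-100 rate are assumptions chosen for illustration. The idea it shows is that each kept event records the rate at which it was sampled, so counts can still be estimated afterward, and that unusual events (here, errors) can be kept at a higher rate than routine ones.

```python
import random

def rate_for(event):
    """Hypothetical policy: keep every error, keep about 1 in 100 of everything else."""
    return 1 if event.get("status") == "error" else 100

def sample_events(events, rng=None):
    """Keep each event with probability 1/rate and tag it with that rate."""
    rng = rng or random.Random()
    kept = []
    for event in events:
        rate = rate_for(event)
        # rng.random() is in [0, 1), so a rate of 1 always keeps the event.
        if rng.random() < 1.0 / rate:
            kept.append({**event, "sample_rate": rate})
    return kept

def estimated_count(sampled_events):
    """Each kept event stands in for `sample_rate` original events."""
    return sum(e["sample_rate"] for e in sampled_events)

if __name__ == "__main__":
    events = [{"status": "ok"}] * 9900 + [{"status": "error"}] * 100
    kept = sample_events(events, rng=random.Random(7))
    errors = [e for e in kept if e["status"] == "error"]
    print("events kept:", len(kept))
    print("errors kept:", len(errors))
    print("estimated original total:", estimated_count(kept))
```

Because the sample rate travels with each event, a backend can weight kept events by that rate to approximate the original traffic, while every error remains available for inspection.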

About This Series
Welcome to The Authors’ Cut series. Our goal in writing the O’Reilly Observability Engineering book is to help you achieve production excellence, drawing on our experience building and operating commercial SaaS products at scale and creating observability tooling for high-performance engineering teams. These are interactive sessions led by authors Charity Majors, Liz Fong-Jones, and George Miranda, where you’ll discuss concepts in the book, see how to apply them in Honeycomb, and get advice on strategy and implementation in your world.