Adopting Observability: Lessons Learned On How to Reduce Cognitive Load
Introduction
At this year’s hnycon, we hosted a roundtable discussion with a few of our guest speakers about the lessons they’ve learned while implementing observability and Honeycomb at their organizations. The speakers included:
- Frank Chen, Senior Staff Software Engineer at Slack
- Glen Mailer, Senior Staff Software Engineer at CircleCI
- John Casey, Principal Software Engineer at Red Hat
- Michael Ericksen, Staff Site Reliability Engineer at Intelligent Medical Objects (IMO)
- Pierre Vincent, Head of SRE at Glofox
- Renato Todorov, Global VP of Engineering at HelloFresh
The discussion touched on quite a few things, but one of the main points everyone agreed on was the importance of reducing the cognitive load placed on teams when introducing observability. Renato Todorov shared his experience at HelloFresh:
“We weren’t expecting 300 people to immediately jump in when we said ‘observability is the thing,’ but we also didn’t expect such a low engagement. We didn’t consider the cognitive load that people were already dealing with. When we pushed for adoption, people were busy working on other stuff and we were just dumping them a Jira task.”
To smooth out the process of adopting observability, you need to reduce the cognitive load you place on your stakeholders by starting small, engaging people’s curiosity, and leveraging storytelling.
Lesson One: Start small and use a compelling hook
Start small with what Michael Ericksen (IMO) called a “hook.” If you think back to your high school English class, you’ll remember that a hook is the first two to three sentences of an essay, meant to grab your readers’ interest and give them a hint of what’s in store.
Applied to observability, your “readers” are your stakeholders, and you should pick something small: something that’s easily achievable, compelling, and educational for everyone involved.
A good way to ensure your hook achieves all three is to tie it directly to business objectives. Frank Chen (Slack) recommended that you “motivate folks with their specific business problem.” For instance, what goal are they trying to achieve, and can observability remove obstacles in the way?
If you don’t start small, your goal of implementing observability will likely be lost in the backlog of everything else the team is working on. Michael described the typical result of not starting small at IMO: “The engineering team [would say], ‘We’ve added the story in our backlog: Make application observable.’ You can start much smaller than, ‘The whole thing needs to be observable.’ You just need a hook into the system.”
An example of a great hook is one Rich Anakor shared in his keynote speech about a team at Vanguard performing a migration from on-prem to a cloud repository. They were struggling for months trying to figure out all the dependencies. Once Rich’s team used observability to help pinpoint the issues, they found their answers in minutes.
This is a great hook because it gave Rich’s team a small test of observability that had an immediate impact on stakeholders. That kind of excitement, and the amount of time saved, matters: according to John Casey (Red Hat), it’s key to inspiring curiosity among your stakeholders. “You don’t have to have the entire thing done to get the benefit. What we’re doing [at Red Hat] is starting in one place and trying to build our way out [from there]. You have to give people space and time to have curiosity. Time pressure kills curiosity.”
Starting small reduces the cognitive load from “implement observability” to “implement observability in this one, small instance.”
Perhaps most importantly, starting small has the potential to get people excited and curious about what’s truly possible with a full-fledged culture of observability.
Lesson Two: Engage your stakeholders’ curiosity
Curiosity makes work fun: it literally engages the dopamine pathways in your brain. Site reliability engineers (SREs) and people in other support roles know this all too well because it’s their responsibility to be curious when it comes to solving problems in production.
Unfortunately, this curiosity isn’t always shared among other engineers in the organization who might see implementing observability as just another Jira ticket adding to their cognitive load. In light of this, Pierre Vincent (Glofox) suggested that you should try to get a curiosity mindset going for everybody. “It’s actually a little bit of a treasure hunt. It’s kind of a game, right?”
The “treasure hunt” terminology struck a chord with the whole panel. Hunting down issues with observability is kind of like a game or, as Michael (IMO) described it in his talk, The Curious Case of the Latency Spike, a Knives Out-style sleuth.
To spread this curiosity mindset, there are a couple of practical things you can do beyond writing up a report or scheduling a meeting. Pierre recommended short videos. “A five-minute Loom video explaining some weird thing in production and how we figured it out with Honeycomb is useful. [Show] them how [you] found that needle in that haystack.”
Meanwhile, Glen Mailer mentioned that at CircleCI, chat logs were very effective. “I think [it’s powerful to be able] to see a flow of chat with the Honeycomb queries in it and then referring back to that later.”
However you share the story, make sure it’s referenceable later and that it engages your stakeholders’ curiosity. Treat observability like the treasure hunt it is and you’ll reduce their cognitive load.
Lesson Three: Combat complexity with storytelling
One way to make observability adoption easier on others is to use your storytelling skills to illustrate complex topics and educate stakeholders.
Because you’re starting small, you can build up examples of how observability benefits the business, not to mention the daily lives of engineers. Spinning these out into simple stories makes the technical aspects of observability easy for everyone to digest.
Frank gave a great example during the panel (which he explained in more detail during his talk) where his team implemented their first cross-service trace on the second day of a multi-day cascading failure incident. With that trace, they were able to solve the incident quickly when other efforts had failed. Afterward, he said this situation helped build a lot of interest in how other teams inside Slack could adopt tracing and use this tooling.
This single, focused application of tracing gave Frank something specific to point to when explaining the importance of observability without having to go into details about how it works.
A side effect of these stories is that you’ll start to build a library of anecdotes and metaphors, like Pierre’s “treasure hunt” and Michael’s “mystery.” In Frank’s case, he came up with this explanation for the value of Honeycomb:
These stories, anecdotes, and metaphors make it easier to internally champion the benefits of Honeycomb and observability over time, even among people who aren’t directly involved.
With storytelling, you can reduce the cognitive load on your stakeholders by making it easier to understand the