Hypothesis: Adopting observability tools, site reliability engineering (SRE) practices and a culture of shared ownership translates to efficiencies across the software engineering cycle, better end-user experiences and ultimately Production Excellence.
- Observability enables production excellence.
Advanced observability enables better outcomes for engineering and DevOps / SRE teams. Survey respondents with advanced observability are more aware of where their tech debt lies, are more proactive about paying it down, have more confidence in their ability to detect bugs in production, and have more satisfied end-users.
- Three in four teams have yet to begin or are early in their observability journeys.
Around a quarter of survey respondents exhibit advanced observability practices. A plurality of teams are actively on their journeys but still early, with some observability tooling or processes—but not both. The remainder of respondents are generally aware of observability and its benefits—though some conflate observability with tangential practices like metrics monitoring and log management tooling—but have not developed plans for or prioritized observability that encompasses practices and culture in addition to tooling.
- There is momentum behind the shift toward achieving more observable systems.
As the benefits of observability become clearer, most teams that are not already practicing at an advanced level have near-term plans to move in that direction. Of the 47% of community respondents whose teams are not currently practicing observability, three in four have a plan to do so within the next two years.
- Advanced observability practitioners focus on outcomes.
By framing observability objectives around desired outcomes, including higher-quality instrumented code, a predictable release cadence, confidence in detecting bugs, the ability to resolve incidents, and resilient systems, teams can better position themselves to achieve production excellence.
Observability is having the tooling and processes in place to ask arbitrary questions about your environment without, and this is the key component, having to know ahead of time what you want to ask. But what does that look like in practice? What sets apart organizations with the most advanced observability? Our research¹ verifies that teams that have invested in making their systems observable are more likely to:
- Have code that is well-understood, well-maintained, and has a low level of bugs.
- Have the ability to follow predictable release cycles because they confidently address issues that arise.
- Understand the end-to-end performance of their systems, including their technical debt.
- Have the ability to visualize context-rich events that allow efficient, focused, and actionable on-call processes.
- Have the ability to prioritize responsiveness to user behavior and feedback.
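To make the idea of asking arbitrary questions more concrete, here is a minimal Python sketch of an ad hoc query over structured events. The event fields (`endpoint`, `build_id`, `region`, `status`, `duration_ms`), the sample values, and the `ask` helper are all hypothetical and invented for illustration; they are not taken from the survey or from any particular tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical "wide" events: one record per request, many fields attached.
EVENTS = [
    {"endpoint": "/cart", "build_id": "a1", "region": "us", "status": 200, "duration_ms": 40},
    {"endpoint": "/cart", "build_id": "b2", "region": "us", "status": 500, "duration_ms": 900},
    {"endpoint": "/cart", "build_id": "b2", "region": "eu", "status": 200, "duration_ms": 700},
    {"endpoint": "/home", "build_id": "a1", "region": "eu", "status": 200, "duration_ms": 30},
]


def ask(events, where, group_by, measure):
    """Keep events matching `where`, group them by any field, and average `measure`."""
    groups = defaultdict(list)
    for event in events:
        if where(event):
            groups[event[group_by]].append(event[measure])
    return {key: mean(values) for key, values in groups.items()}


# A question nobody planned for in advance: is /cart slower on one build than another?
by_build = ask(EVENTS, lambda e: e["endpoint"] == "/cart", "build_id", "duration_ms")
print(by_build)
```

The point of the sketch is that the filter and the grouping field are chosen at question time rather than baked into a predefined dashboard. The same `ask` call can slice by `region` or `status` without any change to how the events were recorded.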
Why Practice Observability?
Observability is a new and rapidly evolving discipline for software engineering teams. Most companies have a sense they would benefit from better observability practices but have more questions than answers. What outcomes can teams realize? What practices do we need to implement? Where do we even start? Is it worth it?
The survey was fielded to quantify the outcomes and benefits of implementing the five core observability capabilities outlined in the observability maturity model framework, and their impact on the software engineering cycle. We also sought to better understand the challenges teams face and the behaviors they exhibit on their observability journeys, and to provide relevant guidance on how to adopt observability across teams and organization-wide.
We wanted to test this hypothesis: adopting observability tools, site reliability engineering (SRE) practices, and a culture of shared ownership translates to efficiencies across the software engineering cycle. We believe that because observable systems are both easier to manage and maintain, teams have more time to innovate. This, in turn, leads to better end-user experiences and ultimately Production Excellence.
Shared ownership: The trick is to ensure that, regardless of your organization’s different operating models or toolchains, there is shared visibility, communication, and collaboration across teams. This will allow your disparate teams to stay aligned while using the best practices from ITIL, DevOps, and SRE.
Production Excellence goes beyond ownership, which means that engineers and DevOps/SRE teams require cultural change. Teams are more sustainable if they have well-defined measurements of reliability, the capability to debug new problems, a culture that fosters spreading knowledge, and a proactive approach to mitigating risk. While tools play an important part in supporting a reliable system, culture and people are the most important investment. Learn more about what production excellence means and steps you can take to achieve it.
At Honeycomb, we often work with teams that believe they are practicing observability, but are not yet realizing the benefits of production excellence. Our evaluations typically reveal two blockers: 1) use of tangential tools masquerading as observability tooling, 2) observability tooling without the cultural processes.
From our work in the field, we learned that there is no one-size-fits-all approach to implementing and practicing observability. It is neither a linear organizational shift nor a point-solution tool that solves one particular problem set. Team focus often depends on organizational structure as well as system maturity, and different challenges arise at different growth phases.
For teams interested in modifying processes and adjusting resource investments, we recommend this Developing a Culture of Observability eGuide. Observability is a journey, and it requires a combination of cultural practices, tooling, and processes (outlined in the Maturity Model Framework and depicted in the diagram below), which teams can and do adopt incrementally.
The Observability Spectrum
How Teams are Practicing & Realizing Observability Outcomes
Observability is an iterative process, or a journey. That process, however, is never truly finished and is often a fragmented journey. Most companies have adopted some, but not all, observability practices. Which practices they have adopted, however, is not consistent from one company to the next. Companies do not simply move from A to B to C and so forth. Rather, some start at A and go to C, others begin at B and move to A, and still others adopt C and believe that is sufficient for their purposes at that point in time.
At the outset, we sought to segment companies not by the starting point of their practice, but by the breadth and depth of their observability adoption. What emerged was a model that evaluates the extent to which teams practice and realize observability outcomes: The Observability Spectrum. We began by assessing self-reported observability practices: organization-wide practices, team-specific practices, and observability tooling. From there, weighted inputs from 35 survey questions measuring tooling, capabilities, and processes shifted participants up and down the spectrum. In the end, The Observability Spectrum identifies five distinct groups:
The first group, Enlightened (Highly Advanced), comprises those members of our community who are practicing and realizing the most core observability outcomes. Those who are practicing and beginning to realize observability outcomes fall in our Sees the Light (Advanced) category.
More than one in three respondents fall in our Working Toward Clarity (Evolving) category. These respondents report practicing observability, but their practices and processes suggest they are in the earliest stages of adoption, or perhaps have confused observability with the use of tangential tooling such as monitoring or logging.
Many survey participants have yet to begin to practice observability. Those who indicate an intention to practice observability in the next 12-24 months, and whose behaviors and capabilities suggest their organization is well-positioned to embrace observability, fall in our Searching for the Path (Planning) category. Finally, the fifth group on the spectrum is made up of those who do not have plans to adopt observability or whose behaviors and capabilities suggest barriers to adoption. These fall in our In the Dark (Laggard) category.
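The segmentation described above can be sketched roughly as follows. This is an illustrative Python sketch only: the report does not publish its questions, weights, or cut-offs, so the three sample questions, their weights, the 0.0 to 1.0 answer scale, and the thresholds (0.8, 0.6, 0.3) are all assumptions made up for the example. Only the five group names come from the text.

```python
# Group names are taken from the report; everything numeric below is hypothetical.
ENLIGHTENED = "Enlightened (Highly Advanced)"
SEES_THE_LIGHT = "Sees the Light (Advanced)"
WORKING_TOWARD_CLARITY = "Working Toward Clarity (Evolving)"
SEARCHING_FOR_THE_PATH = "Searching for the Path (Planning)"
IN_THE_DARK = "In the Dark (Laggard)"


def weighted_score(answers, weights):
    """Weighted average of answers on an assumed 0.0-1.0 scale."""
    total_weight = sum(weights[question] for question in answers)
    weighted_sum = sum(answers[question] * weights[question] for question in answers)
    return weighted_sum / total_weight


def place_on_spectrum(practices_observability, plans_to_adopt, answers, weights):
    """Start from self-reported practice, then let weighted answers shift the placement."""
    score = weighted_score(answers, weights)
    if practices_observability:
        if score >= 0.8:  # hypothetical cut-off
            return ENLIGHTENED
        if score >= 0.6:  # hypothetical cut-off
            return SEES_THE_LIGHT
        return WORKING_TOWARD_CLARITY
    # Not yet practicing: a plan plus supporting capabilities suggests readiness.
    if plans_to_adopt and score >= 0.3:  # hypothetical cut-off
        return SEARCHING_FOR_THE_PATH
    return IN_THE_DARK


# Hypothetical stand-ins for the 35 weighted survey questions.
WEIGHTS = {"has_tooling": 2.0, "can_debug_production": 3.0, "predictable_releases": 1.0}
answers = {"has_tooling": 1.0, "can_debug_production": 0.5, "predictable_releases": 0.0}

print(place_on_spectrum(True, False, answers, WEIGHTS))
```

In this sketch, a respondent who reports practicing observability can never land below the Evolving group, and one who does not can never land above Planning. That mirrors the group descriptions in the text, but it is one possible reading rather than the survey's actual scoring rule.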
The Observability Spectrum (and where respondents fall)
Enlightened (Highly Advanced) & Sees the Light (Advanced) Groups
Just under 10% of those surveyed (Highly Advanced) report a combination of practices and tooling that reflects a highly advanced observable system. Another 17% of participants (Advanced) report practices, tooling, and outcomes consistent with relatively sophisticated observability practices. These teams are leveraging observability tooling and/or processes on a team or organization-wide basis and also prioritize key practices to achieve performance excellence.
Defining characteristics: Observability is a high priority for their teams. Cohort members come from a mix of company sizes. Companies represented by this cohort tend to have a high number of developers (42% have 100+ developers).
Working Toward Clarity Group (Evolving)
The largest share of our community falls in the evolving middle. Participants report observability processes or tooling, but rarely both, and report some, but not most, key capabilities.
Defining characteristics: Mostly represent small companies with fewer than 100 employees. A majority within this group (64%) report that observability is mostly practiced on a team-by-team basis, and roughly half (46%) do not report using observability tooling.
Searching for the Path (Planning)
Approximately one in five respondents do not currently practice observability or use observability tooling, but have plans to practice observability within the next year. Among these teams, reported practices and tooling suggest production excellence is a priority.
Defining characteristics: High observability awareness (92%); more DevOps and Managers; the largest share (45%) from smaller companies with fewer than 100 employees.
In the Dark (Laggard)
17% of respondents are not practicing observability, do not currently have tools, and have no plans in the near
Cultivating Production Excellence
Taming complex distributed systems is not just a matter of changing tools and techniques. It also requires changing who is involved in production, how they collaborate, and how they measure success. Liz Fong-Jones walks through the thinking behind a team that strives for production excellence.
Framework for an Observability Maturity Model
Everyone is talking about "observability," but many don’t know what it is, what it’s for, or what benefits it offers. The framework we describe here is a starting point. With it, we aim to give organizations the structure and tools to begin asking questions of themselves, and the context to interpret and describe their own situation, both where they are now and where they could be.