Intro to o11ycast: A Human Perspective on the Role of Observability
By Emily Nakashima | Last modified on January 31, 2021
Hi! I’m new here. And I don’t just mean at Honeycomb.io.
I’m Emily Ashley, a community-taught software engineer in New Orleans who made my way into software engineering through the wonderful world of web cartography. (yes, I love maps)
And yes, I listened to all 24 episodes of o11ycast in a month.
“But Emily, why would you do that?” I don’t know, you’d have to ask my team? Somewhere in the list of onboarding tasks was a checklist item reading: “Listen to the whole back catalog of o11ycast.” (After doing it, I can say it was 100% a good idea.)
Given my limited attention span, I jumped at a task I could do on a dog walk. I grabbed the leash and a few treats, scrolled down to Episode 1, and thought “I’m gonna be so lost.” Instrumentation, observability, operations, monitoring? I wouldn’t list any of these things as my expertise. Visualizations? Yeah you bet! Infrastructure? To me, that word relates more closely to urban planning than to software. When I started here, my expertise level could be described as somewhere between “bookmarked a dashboard page” and “never been the engineer on call.”
I pressed play expecting to be in completely new territory. Within 20 minutes I realized I was totally wrong — I DO know what they’re talking about. To my surprise, the whole series is set up around collecting stories and intentionally digging in and learning through collective pain points. It’s about socio-technical systems thinking and the people working in them. MY JAM.
And I gotta say, “meeting” guests from various backgrounds and stages in their careers and hearing them talk about their human and technical problems: what a great way to learn. You see, for me, it’s the human element that makes system complexity delightfully approachable. Listening to o11ycast, I bounced very quickly between “YASS I KNOW RIGHT?” and “wait what? that’s a thing? let me look that up!” or “oh that’s the problem this or that tool was trying to solve?!?”
As someone fairly new to the industry, I often only get taught or sold solution tools without being introduced to the problems they were trying to solve. (Which, speaking of complexity: how do you know when to stop using a solution if you don’t know what it’s solving?) o11ycast does this wonderfully. I especially appreciated Episode 4 with Adam Jacob of Chef, one of the founders of the DevOps movement. I entered the software industry in 2015, and to be honest, my previous understanding of “devops” was just as a buzzword or trend that companies were trying to follow. I had no idea what devops was trying to solve whatsoever. So thank you, Charity, Liz, and Adam.
After a few episodes, I wasn’t surprised anymore that o11ycast (unlike other podcasts I’ve listened to) wasn’t necessarily about showcasing the features of a solution, or a technical implementation of a solution. Through their conversations I learned that this is because observability is not some isolated solution to a problem only a few teams are having (of a certain size or budget or complexity or throughput). It’s a paradigm shift. Throughout the episodes they work on establishing shared experiences and a shared vocabulary, and they explore mechanisms for dealing with both known-unknowns and unknown-unknowns. So no, they’re not necessarily gonna tell you about some technology or implementation. They talk about the problems software teams are facing, what the lens of “observability” can do, and the hopes the hosts have for effective & happy software teams now and in the future.
This is so approachable. I get this. About half of my software career has been working closely with nonprofits and I’m all too familiar with that “we’re not google or facebook - we don’t get nice things” sentiment. But working with smaller teams and projects I also get the “just because your site is up doesn’t mean it isn’t failing” sentiment. A LOT.
It's systems where you just have to embrace failure. It's constant, and that's okay because most of them aren't catastrophic. But they're there, and people delude themselves into thinking that they're not failing just because their site's up. There's so many cockroaches living under that rug.
-- Charity Majors, o11ycast Episode 2 with Christina Noren
I appreciated how Charity emphasized that “source code is not the source of truth - your code running in PROD is.”
It angers me when I see people trying to understand their code by reading it because fuck you, you can't.
You know that is not the source of truth for your code.
The source of truth for your software is in production. And the sooner you make that mental shift, and start seeing your interpretation, not as an afterthought or later or a no eventually or just for monitoring as how you can see if your code actually works or not, and it doesn't work until you go and verify like that.
-- Charity Majors, o11ycast Episode 20 with Marco Rogers
This is a lesson I had started to learn as a developer and it’s so true. On one of my first software projects I removed over 100,000 lines of code from version control trying to sort out “the truth.” It was so hard to figure out what was happening from reading the code, with so many things named similarly. Even after I cleaned up that much code, it wasn’t clear what was overwriting what, or where anything was coming from or going. I spent countless hours stepping up and down stack traces. (Interactive Python debugging even made its way into my dreams.) I can’t help but think we all would’ve had a much better understanding of that legacy application with an observability lens and a tool like Honeycomb.
Another thing that struck a chord from my experience on developer teams: When the rate of change is so slow, whatever is broke stays broke. We don’t know how the system works, we just know that if you don’t mess with it, it’s cool. But being scared to make changes means your systems are brittle, and when you do finally need to change something (ahem, security patches anyone?) it’s way more dangerous than it needs to be.
It's not the architecture. It's not that you've factored the application differently, that's not it. It's the rate of change and when and how that change is triggered and the rate at which we can understand it. That's what drives it.
So, it's not because you're a microservice or a monolith. It's because the velocity of that thing gets bigger.
-- Adam Jacob, o11ycast Episode 4
Point being, observability isn’t some obscure, elite software engineering thing. It’s something we already kinda know, but haven’t really used as a lens through which to view our software problems. After listening to o11ycast, I can say it was 100% a good idea. I have a much better idea about how observability can help developers like me. More things are broken than we can see, so let’s collect runtime data on everything (at the right level of abstraction) and continuously interrogate our systems. And honestly, I don’t think I’d be as scared now if someone told me I was “on call.”