Ep. #59, Learning From Incidents with Laura Maguire of Jeli
In episode 59 of o11ycast, Jess and Martin speak with Laura Maguire of Jeli and Nick Travaglini of Honeycomb. They unpack Learning From Incidents (LFI), resilience engineering, process tracing, safety science, and the human side of observability.
Ep. #36, Resilience Engineering with Jacob Scott of Stripe
In episode 36 of o11ycast, Charity and Liz speak with Jacob Scott of Stripe about the need for SRE teams, prioritizing customer happiness, and the limitations of distributed tracing tools.
Intercom: Building a More Resilient Ecosystem Through Observability
Intercom and Honeycomb discuss how Intercom uses distributed traces to streamline its observability workflows, allowing its product engineers to learn about and from production and increase Intercom’s resilience.
Stepping Our Way Into Resilient Services
Is it possible to discover unknown-unknowns proactively with Chaos Engineering? Where exactly is the intersection between intentionally breaking production services and discovering the multitude of ways they could be broken with observability? This short presentation leaves plenty of time for a real-time discussion with George Miranda, audience questions, and practical steps you can take with your teams as you step your way toward improving service resilience.