Lex Neva | Jul 16, 2024
We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in our production environment using AWS's Fault Injection Service. You might be wondering why the heck we did something so drastic. In this post, we'll go over why we did it and how we made sure that it wouldn't impact our service.
Rox Williams | Jul 15, 2024
Transitioning from a monolithic system to a cloud-native microservices environment, Ritchie Bros. sought to modernize their observability infrastructure to support the transition and fuel future growth.
Nick Travaglini | Jul 10, 2024
Getting the right people working in the CoPE is crucial to success because these change agents must limber up the organization and promote the flexibility necessary to perform resilience.
Liz Fong-Jones | Jul 09, 2024
Two years ago, we shared our experiences with adopting AWS Graviton3 and our enthusiasm for the future of AWS Graviton and Arm. Once again, we're privileged to share our experiences as a launch customer of the Amazon EC2 R8g instances powered by AWS Graviton4, the newest generation of AWS Graviton processors.
Rox Williams | Jul 08, 2024
The Bennett Institute for Applied Data Science at the University of Oxford is pioneering the better use of data, evidence, and digital tools in healthcare, policy, and beyond. The institute employs an open-source approach with its OpenSAFELY analytics platform, enabling high-impact research that yields actionable insights, drives innovation, and enhances lives globally.
Tyler Wilson | Jul 03, 2024
Generative AI is having a bit of a moment. Well, maybe more than just a bit. It's an exciting time to be alive for a lot of people. But what if you see stories detailing a six-month-old AI firm with no revenue seeking a $2 billion valuation and feel something other than excitement in the pit of your stomach?
Rox Williams | Jul 02, 2024
With a diverse range of applications, HappyCo sought to advance their system investigations with a modern observability solution while embarking on an application refactor project.
Rox Williams | Jun 27, 2024
In the not-too-distant past, building software was relatively straightforward. The simplicity of LAMP stacks, Rails, and other well-defined web frameworks provided a stable foundation. Issues were isolated, systems failed in predictable ways, and engineers had time to innovate on new features for the business. And it was good.
Martin Thwaites | Jun 24, 2024
Having telemetry is all well and good (amazing, in fact). It's easy to do: add some OpenTelemetry auto-instrumentation libraries to your stack and they'll fill your disks with data pretty quickly. However, having good telemetry data, data that's curated into being useful, is both cost-effective and good value.
Liz Fong-Jones | Jun 14, 2024
Everyone's talking about "observability," but many don't know what it is, what it's for, or what benefits it offers. With this framing of observability in terms of goals instead of tools, we hope teams will have better language for improving what their organization delivers and how they deliver it.
Terra Field | Jun 07, 2024
Earlier this year, we upgraded from Confluent Platform 7.0.10 to 7.6.0. While the upgrade went smoothly, there was one thing that was different from previous upgrades: due to changes in the metadata format for Confluent's Tiered Storage feature, all of our tiered storage metadata files had to be converted to a newer format.
Nick Travaglini | May 29, 2024
In part one of our CoPE series, we analogized the CoPE with safety departments. David Woods says that those safety departments must be: independent, involved, informed, informative. In this post, we'll elaborate on what each of those characteristics means, why the CoPE should also match those qualifications, and how to achieve that status.