Modern Observability in Action at the University of Oxford 

By Rox Williams | Last modified on July 9, 2024

Challenge

The Bennett Institute for Applied Data Science at the University of Oxford is pioneering the better use of data, evidence, and digital tools in healthcare, policy, and beyond. The institute employs an open-source approach with its OpenSAFELY analytics platform, enabling high-impact research that yields actionable insights, drives innovation, and enhances lives globally.

After launching the platform, the core team of 12 engineers faced challenges in achieving effective observability. The process involved manually obtaining and correlating individual logs from various machines, with information scattered across multiple sources. This labor-intensive task required the team to synchronize timestamps and events between different log sources, often leading to inefficiencies and potential errors. Additionally, access to machine logs was restricted due to the presence of sensitive data, further complicating the process.

"It became clear that we needed a better solution for telemetry," said Simon Davy, Senior Research Software Engineer at the Bennett Institute for Applied Data Science at the University of Oxford. “We wanted to identify system issues proactively, rather than relying on users to bring them to our attention.”

Solution

Seeking an observability solution that would make its development team more efficient and streamline daily operations, the Bennett Institute began a vendor search that quickly led to Honeycomb. As an OpenTelemetry-native observability tool, Honeycomb offered detailed visualization and real-time, proactive debugging capabilities that the institute expected would significantly improve its workflow.

The institute’s stand-out reasons for choosing Honeycomb included:

  • Modern observability approach
    “We’d been transitioning to the arbitrarily-wide structured events model for observability, which perfectly aligned with Honeycomb’s approach. I appreciated that Honeycomb provided excellent solutions to challenges I had been grappling with for years using legacy tools focused on logs, metrics, and traces,” said Simon.

  • OpenTelemetry integration
    "Honeycomb's native support for OpenTelemetry was a key factor for us in our selection decision. Having the ability to easily instrument our code and adhere to OTel standards was a significant motivator as it aligns perfectly with our open-source strategy,” shared Tom Ward, Senior Research Software Engineer at the Bennett Institute for Applied Data Science at the University of Oxford.
  • Free plan option
    "The availability of a free plan option was instrumental for us. It provided an easy entry point and significant value, which was particularly generous given the limited budget typical of higher education institutions," explained Simon. 

Results

Monitoring long-running jobs

OpenSAFELY is used by researchers from 22 UK research and health data institutions to run report queries across more than 160 research projects, such as extracting cohorts of patients from large databases. These queries often involve long-running SQL operations that sift through billions of rows to aggregate the data points the studies need. The platform experiences low user traffic but hosts numerous long-running jobs that can span several days.

Given the extended duration of these jobs and the importance of ensuring smooth operation throughout, the team needed a way to effectively monitor job queues and the systems they run on when deploying data pipelines to users. Honeycomb was instrumental in providing the necessary observability and proactive monitoring capabilities.

"Instrumenting long-running jobs provided visibility into our core operations, specifically job scheduling, which is crucial for us. It helped us understand queueing times and reasons for delays, which were previously a black box due to the volume of log activity," said Simon, adding that "It's important to us that our users have an experience with the system that is responsive, trustworthy, and reliable; Honeycomb helps us make that a reality."
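
To make the idea of queueing-time visibility concrete, here is a minimal sketch of recording one wide structured event per job, with the queue wait and run duration derived from timestamps. It is an illustration only, not the institute's instrumentation: the function name `build_job_event` and the fields `queue_reason` and `workspace` are hypothetical, and a real setup would likely attach these values as span attributes through OpenTelemetry rather than build plain dictionaries.

```python
import json


def build_job_event(job_id, queued_at, started_at, finished_at, **attributes):
    """Return one wide event (a flat dict) describing a finished job.

    Timestamps are seconds since the epoch. Extra keyword arguments
    become additional fields on the same event (hypothetical examples:
    queue_reason, workspace).
    """
    event = {
        "job_id": job_id,
        "queued_at": queued_at,
        "started_at": started_at,
        "finished_at": finished_at,
        # Derived fields, computed once so they can be queried directly.
        "queue_wait_s": started_at - queued_at,
        "run_duration_s": finished_at - started_at,
    }
    event.update(attributes)
    return event


if __name__ == "__main__":
    event = build_job_event(
        "job-123", 100.0, 160.0, 400.0,
        queue_reason="waiting_for_worker", workspace="example-study",
    )
    print(json.dumps(event, sort_keys=True))
```

Keeping the wait time, the run time, and the reason for any delay on a single record is what allows a question such as "which jobs waited longest, and why?" to be answered with one query instead of by correlating log lines.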

Streamlining access to telemetry data

Prior to adopting Honeycomb, accessing data sources required navigating secure partner environments' machine logs, often involving complex VPN and RDP interactions. Honeycomb has streamlined and simplified this process. Leveraging Honeycomb's native integration with OpenTelemetry, the team can now easily instrument, collect, and export rich telemetry data.

"Being able to immediately dive into telemetry data with Honeycomb and diagnose problems without manually logging into machines was a gamechanger. Honeycomb's capability to provide detailed observability while maintaining data security aligns perfectly with our mission to be good stewards of data," Tom highlighted.

Acting on data-driven insights

With Honeycomb's observability, the Bennett Institute for Applied Data Science at the University of Oxford has gained critical insight into its systems, allowing the team to proactively identify and address performance issues. This capability has been pivotal in improving service performance and supporting continuous optimization.

"Honeycomb has been instrumental in helping our team quickly identify performance problems. One example was when we discovered that some aggressive web bots were crawling our entire paginated job log history, which was expensive for us. With Honeycomb, we could immediately pinpoint these bad bots, block them, and optimize our pages to deliver faster response times for non-logged-in users,” Simon explained. 

And when it comes to adopting data-driven engineering practices, Tom added, "Honeycomb allows our team to answer almost any question we have about the system, promoting a 'check the data first, rather than guess' kind of mindset. We can now check the data first to confirm a hypothesis before diving into the code to fix something."

At a glance

About

The Bennett Institute for Applied Data Science at the University of Oxford focuses on harnessing the power of data to address real-world challenges. Through cutting-edge research and interdisciplinary collaboration, the institute advances data science methodologies and their practical applications in health, policy, and beyond. The goal is to transform data into actionable insights, driving innovation, and improving lives globally.

Industry

Higher Education

Products

Honeycomb platform

Use cases

Platform 

Results

  • Optimized page load times and reduced server strain by effectively managing web crawlers
  • Improved visibility into long-running jobs, enabling the team to detect and address issues before they impact users
  • Promoted a culture of verifying hypotheses with data first, leading to more accurate and efficient problem-solving
  • Maintained high standards of data management and privacy while streamlining access to telemetry data

"Honeycomb has been invaluable in helping us build a cutting-edge platform to drive our institute's high-impact mission forward. And powerful features like BubbleUp and native support for OpenTelemetry, have been gamechangers in modernizing our observability practices."

Simon Davy, Senior Research Software Engineer, Bennett Institute for Applied Data Science, University of Oxford

 
