Bees Working Together: How ecobee’s Engineers Adopted Honeycomb
By Deirdre Mahon | Last modified on July 28, 2020
At ecobee, adopting Honeycomb started as a grassroots effort. Engineers signed up for the free tier and quickly started sharing insights with teammates. When it came time for ecobee to make the “build vs. buy” decision for observability tooling, sticking with Honeycomb was the clear choice. Now on the enterprise plan, ecobee’s engineering squads rely on features like SLOs to support the business’s need to map engineering effort to user impact.
Founded in 2007, the Canadian company makes smart thermostats, temperature sensors, light switches, cameras, and contact sensors that keep your home comfortable when you’re there and save you money when you’re not.
ecobee Squads & Goals
Engineering at ecobee is highly distributed, with teams organized around the squad model: each autonomous squad is responsible for a different ecobee product line. Many squads share the same engineering tools to gain greater visibility into, and understanding of, the overall customer experience.
“When something happens on one platform or a distinct part of our service, we want more visibility into what’s happening in the other parts of the platform.”
Erol Blakely, Director of SRE, ecobee
Why Observability, Why Now?
Squads at ecobee have standardized on Prometheus and Grafana for system and application monitoring, which covers specific use cases like seeing spikes in latency. However, the team believes these tools only get them so far and don’t fulfill all of ecobee’s requirements. The “last mile” of performance tuning has proven to be one of the hardest problems to tackle.
The ecobee Beehive team manages API services for consumer-facing mobile apps. When the team started using Honeycomb, they quickly discovered that the ability to drill down arbitrarily let them understand the source of any system latency. The team now uses Honeycomb to continuously optimize performance, often measuring improvements in milliseconds. Those seemingly small optimizations have a big impact when serving millions of customers.
ecobee’s decision to evaluate Honeycomb more closely was due in part to a core group of engineers, led by Ray Slakinski, who ran a campaign of weekly Slack pings asking, “When are we going to adopt Honeycomb? It really solves a lot of our problems.”
“At that time, we were observing regular spikes in our API latency that pushed us far beyond our SLO. We were spending lots of time digging through our metrics, trying to correlate the data and come up with an explanation. It quickly became clear that we were wasting time and ultimately we were just guessing and getting nowhere. We needed something that would gather more detailed data and present it in an intuitive way for us to dig into it.”
Dustin Neray, Engineer, ecobee
The team decided to use OpenTelemetry’s auto-instrumentation for Java along with OpenTelemetry collectors, running as sidecar agents. The sidecars send data to an OpenTelemetry Central Collector, which then sends that data to Honeycomb.
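As a rough illustration of the first hop in that pipeline, the sketch below builds the command line and environment for a Java service that runs the OpenTelemetry Java agent and exports to a sidecar collector over OTLP. The `-javaagent` flag and the `OTEL_*` environment variables are standard OpenTelemetry settings; the function name, service name, jar paths, and the sidecar address `http://localhost:4317` are assumptions made for this example, not details of ecobee's setup. The sidecar and central collector configuration, and where the Honeycomb API key is supplied, are out of scope here.

```python
from typing import Dict, List, Tuple


def java_agent_launch(
    service_name: str,
    app_jar: str,
    agent_jar: str = "opentelemetry-javaagent.jar",  # hypothetical path
    sidecar_endpoint: str = "http://localhost:4317",  # assumed sidecar address
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting a Java service with auto-instrumentation.

    The agent is attached with -javaagent and configured through standard
    OpenTelemetry environment variables so that telemetry goes to the
    sidecar collector rather than straight to the backend.
    """
    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
        "OTEL_EXPORTER_OTLP_ENDPOINT": sidecar_endpoint,
    }
    argv = ["java", f"-javaagent:{agent_jar}", "-jar", app_jar]
    return argv, env


# Hypothetical service and jar names, for illustration only.
argv, env = java_agent_launch("beehive-api", "beehive-api.jar")
```

To actually start the process, these values could be handed to `subprocess.run(argv, env={**os.environ, **env})`. Keeping the exporter pointed at a local sidecar means the application does not need to know about the central collector or the backend behind it.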
“Getting everything set up and instrumenting our application with Honeycomb was a breeze. It integrates nicely with the open source ecosystem and it eliminates the complexity of trying to run, monitor, and scale our own solution. As soon as we enabled auto-instrumentation and passed in our API key, we immediately started to gain insights into our application that were never possible before.”
Ray Slakinski, Staff Developer, SRE
Adoption Started with Honeycomb’s Free plan & Has Seen Fast Growth
Many engineers at ecobee first learned how to use Honeycomb via the Free plan. Giving teams a chance to experience Honeycomb’s value in a low-risk setting enabled adoption to grow quickly across different squads.
“The teams started with Honeycomb’s Free plan which provides ample monthly events to really try out some of the unique features including BubbleUp. We don’t have a top-down edict when it comes to tool adoption at ecobee and it’s so much better when teams collaboratively learn from each other. If I didn’t get budget approval to purchase Honeycomb, I feel like my team would have chased me out of the building because they started to depend on it daily to do their jobs.”
Across ecobee’s ten squads, many are actively using Honeycomb today, and more squads regularly discover new use cases. For example, the Beehive squad uses Honeycomb extensively to help with customer support. If the support team identifies a customer issue, such as an inability to make payments via the mobile client, the squad uses Honeycomb as a source of truth for digging in and determining the cause of errors.
“It’s so much easier to use Honeycomb than to grab the logs that have a GCS bucket, download them and start grepping until you’re ‘blue in the face.’”
ecobee’s Favorite 3 Letters - SLO
Developers, SREs, and team managers at ecobee all use Honeycomb today. All engineers are responsible for checking in their code and deploying new releases to production using CircleCI. Some teams are starting to use Honeycomb’s Service Level Objectives (SLO) feature, available to all subscribers of the Enterprise plan.
“Honeycomb has made implementing SLOs easy once you agree on the criteria. Previously, you’d have to go to Grafana or Prometheus and start building backwards: you start by building the correct SLI (indicator) to inform on the stated SLO, which of course is time-consuming and error-prone.”
ecobee found Honeycomb’s SLO feature to be intuitive and easy to use, with plenty of examples that helped them get started. They could easily set up a new SLO, let it run for a few weeks, see how well they were doing over time, and modify it as needed. Erol’s SRE team also works closely with the business, and Honeycomb’s SLO feature helps keep everyone informed about how production is performing at any point in time.
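To make the SLI/SLO relationship concrete, here is a minimal sketch of the arithmetic behind an event-based SLO: an SLI classifies each request as good or bad, and the SLO target determines how many bad requests the error budget allows. This is not Honeycomb's implementation. The `RequestEvent` type, the 500 ms latency threshold, and the 99% target are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RequestEvent:
    """One request as seen by the service (hypothetical shape)."""
    duration_ms: float
    error: bool = False


def is_good(event: RequestEvent, latency_threshold_ms: float) -> bool:
    """SLI: a request is good if it succeeded within the latency threshold."""
    return (not event.error) and event.duration_ms <= latency_threshold_ms


def slo_report(
    events: Iterable[RequestEvent],
    target: float = 0.99,                 # assumed SLO target
    latency_threshold_ms: float = 500.0,  # assumed SLI threshold
) -> Dict[str, object]:
    """Summarize SLO compliance and remaining error budget for a window."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    events = list(events)
    if not events:
        raise ValueError("need at least one event")

    total = len(events)
    bad = sum(1 for e in events if not is_good(e, latency_threshold_ms))
    compliance = (total - bad) / total
    # The error budget is the number of bad events the target tolerates.
    allowed_bad = (1.0 - target) * total
    return {
        "total": total,
        "bad": bad,
        "compliance": compliance,
        # 1.0 means the budget is untouched; below 0 means it is overspent.
        "budget_remaining": 1.0 - bad / allowed_bad,
        "slo_met": compliance >= target,
    }


# Example window: 995 fast successes, 3 slow requests, 2 errors.
window = (
    [RequestEvent(120.0)] * 995
    + [RequestEvent(900.0)] * 3
    + [RequestEvent(80.0, error=True)] * 2
)
report = slo_report(window)
```

With this window, 5 of 1,000 requests are bad against a budget of roughly 10, so about half the error budget remains. A remaining budget near zero argues for reliability work, while a healthy budget leaves room to ship features.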
“It’s not just pretty graphs! SLOs really tell you where to focus, based on what matters to customers. It informs the team to make sound engineering decisions. We now can decide: do we work on availability of our services or do we release a new feature? It’s really that simple and so important. I love that SLOs are baked into the core product and I don’t have to think about this as a separate tool or budget item.”
Expansion Across Squads
As more team members become proficient with Honeycomb, it is increasingly seen as a reliable source of truth for incident response and performance optimization, and the team expects further expansion across ecobee’s many squads. Although they’ve already made tremendous gains, the team at ecobee feels like they are just getting started on their observability journey. They credit their culture of shared service ownership across squads as a key ingredient in their success.
“We previously spent a lot of time fumbling in the dark. Some teams say they would not be able to do their job without Honeycomb. It’s absolutely critical.”
Ray Slakinski
Try out Honeycomb today, sign up for free!