How to Effectively Lead High-Performing Engineering Teams
By Harrison Calato | Last modified on August 18, 2022
What are the foundational elements of a high-performance engineering team? While there’s no silver bullet, a few common threads make up the fabric of engineering teams that set the standard for velocity, quality, and innovation.
Charity Majors, CTO and Co-founder of Honeycomb, and James Governor, Co-founder and Analyst at RedMonk, shared their expertise and knowledge gleaned from working with high-performance engineering teams in a recent webinar—we’ve highlighted some takeaways in this blog.
Leading with empathy—and using service level objectives
The overall theme is that high-performing engineering teams are generally the ones that humanize the process. Whether you’re trying to increase productivity or release better-quality code, the biggest piece of advice is to lead with empathy.
One way of leading with empathy is recognizing that the people who are on call have lives outside of work, and that being constantly interrupted by unactionable alerts can quickly lead to burnout. That point led Charity and James to talk about the importance of service level objectives (SLOs).
“You’ve got to adopt SLOs. You’ve got symptoms that are just springing up everywhere, nobody can keep track of them, and most of them aren’t useful anymore,” explained Charity. “At some point, you have to make the changes so no one gets alerted unless the customer is impacted. And that’s the real beauty of SLOs: It usually lets you cut the number of alerts by 90% or more.”
Charity went on to explain how being on call shouldn’t suck. “In fact, it should be like wearing a badge of honor because it recognizes that an individual is mature enough in their skillset and is trusted to make good decisions under pressure.” She offers more tips in this blog post, “Ask Miss O11y: I Don’t Want to be On Call Anymore. Am I a Monster?”
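To make the idea of alerting only when customers are impacted a little more concrete, here is a minimal sketch of an SLO burn-rate check. It is an illustration, not how Honeycomb implements SLOs: the `SLO` class, the `burn_rate` and `should_page` helpers, and the default paging threshold of 14.4 are all assumptions chosen for this example, and a real setup would tune them to its own traffic and time windows.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical SLO definition: the fraction of requests that should succeed."""
    target: float  # e.g. 0.999 means 99.9% of requests should succeed

    @property
    def error_budget(self) -> float:
        # The fraction of requests that are allowed to fail.
        return 1.0 - self.target


def burn_rate(slo: SLO, total: int, failed: int) -> float:
    """How fast the error budget is being spent in the observed window.

    1.0 means failures are arriving exactly at the budgeted rate;
    larger values mean the budget will run out early.
    """
    if total == 0:
        return 0.0  # no traffic, so no customer impact to report
    observed_error_rate = failed / total
    return observed_error_rate / slo.error_budget


def should_page(slo: SLO, total: int, failed: int, threshold: float = 14.4) -> bool:
    """Page a human only when the budget is burning fast enough to matter.

    The threshold is an assumed default for this sketch, not a universal rule.
    """
    return burn_rate(slo, total, failed) >= threshold
```

With a 99.9% target, 20 failures out of 1,000 requests burns the budget about 20 times faster than planned and would page someone, while 1 failure out of 1,000 stays within budget and stays quiet. Instead of alerting on every symptom, the on-call engineer is interrupted only when customers are measurably affected.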
Fostering a healthy engineering culture
Charity and James also talked about how the right organizational culture fosters a positive environment for experimentation and risk—and, combined with the right tooling (aka observability), the teams that embrace both generally lead the pack when it comes to higher performance.
The two also dove a bit deeper into the topic of creating a culture of experimentation among your team. One of the biggest points they agreed on is that it’s not about making systems that never fail; it’s about making systems that fail without impacting customers while still being able to learn and test.
But perhaps most impactful was the advice that Charity and James shared from some of Honeycomb’s customers. Let’s recap some of it:
Ryan Katkov, Sr. Engineering Manager, Slack
“It's hard to have one piece of practical advice because a team of engineers is a living, breathing entity. But with that in mind, your first priority should be to lead by empathy. We are all humans and we have basic needs wherever we are. We want to feel like we belong, that we're improving, that our purpose is clear. Ultimately, it is up to the organizational culture to allow for flexibility and autonomy for engineering teams.
By building a supportive environment, encouraging team members to help each other, and dictating that it is always okay to ask questions—you are organically creating safety. Once you build safety, trust comes along, as well as loyalty and dependability. After those needs are met, you will start seeing an impact from the team that drives recognition for team members. Treat your people like humans, nurture their core needs.”
Matthew Zeier, Sr. Director, Production Engineering & Operations, Lacework
“The one thing that stands out to me as we're scaling and growing the team (in a pandemic that has shifted everyone to remote) is our monthly "[Optional] BeachOps Social." I think high-performing teams are filled with a sense of empathy for those in the team. I get the Netflix-we're-not-a-family thing but I'm spending 60% of my week with these people, often in stressful situations.”
Sarah Sherbondy, Principal Engineer, Platform Observability, Heroku
“The quality of the decisions an organization makes is largely dependent on the quality of the data they have available. One key source of data for us at Heroku is our telemetry data. It tells us how our customers are using our services, what their experiences are like, the health of our systems, and more.
So my advice to engineering leaders is to invest in improving the quality of their telemetry data. They can achieve this by adding telemetry where it is missing and adopting instrumentation standards. Within Heroku’s telemetry libraries, we have strategically encoded many of our instrumentation standards. Now any engineering team in our organization can contribute to improving the quality of our telemetry by simply using our telemetry libraries in their services. This is a win-win for us because it enables all of engineering to participate in the improvement of our telemetry data as well as the quality of the decisions we make.”
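As a rough illustration of what encoding instrumentation standards in a shared library can look like, here is a minimal sketch. It is not Heroku's library: the `StandardEmitter` class, the `REQUIRED_FIELDS` list, and the field names are all hypothetical, picked only to show the pattern of a wrapper that refuses to emit telemetry without the agreed-upon context.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical standard: every event must carry these fields.
REQUIRED_FIELDS = ("service.name", "team", "environment")


class StandardEmitter:
    """Hypothetical telemetry wrapper that enforces a shared field standard."""

    def __init__(
        self,
        base_fields: Dict[str, Any],
        sink: Callable[[Dict[str, Any]], None],
    ) -> None:
        # Fail fast at startup if a service forgets a required field.
        missing = [f for f in REQUIRED_FIELDS if f not in base_fields]
        if missing:
            raise ValueError(f"missing required telemetry fields: {missing}")
        self._base = dict(base_fields)
        self._sink = sink  # where events go; a real sink would send them to a backend

    def emit(self, name: str, **fields: Any) -> None:
        # Every event automatically carries the standard fields and a timestamp.
        event = {**self._base, "name": name, "timestamp": time.time(), **fields}
        self._sink(event)


# Usage: collect events in a list to show what a service would send.
events = []
emitter = StandardEmitter(
    {"service.name": "checkout", "team": "payments", "environment": "prod"},
    sink=events.append,
)
emitter.emit("order.placed", duration_ms=42)
```

Because the standard lives in the library rather than in a wiki page, any team that adopts the library improves telemetry quality by default, which is the effect described in the quote above.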
Mohamed Hazem, Site Reliability Engineering Manager, HelloFresh
“If you're part of a team that is tasked with providing observability/reliability as a service (aka DevOps, Site Reliability Engineering (SRE), Production/Platform Engineering teams) to other teams in your organization, lowering the barrier to entry for these teams on these topics is the single most efficient way to achieve your goals—especially if you are a small team. In the past, we experimented with embedding SREs in product teams, and it turns out it is a much slower approach than what I just mentioned.
Also, stop measuring developer productivity as a function of how much developers can deliver but as a function of developer happiness. I don't remember who said that but it definitely wasn't me. But it resonates a lot with me and my team.”
Rich Anakor, Chief Solutions Architect, Vanguard
“I’ve spent most of my career in large, highly regulated enterprises where building resilient systems is often prioritized over engineering velocity. That shouldn’t always be the case. In our teams today, we are succeeding at both by doubling down on a foundational effort—observability.
Instrumenting our CI/CD pipelines has allowed us to identify bottlenecks and improve our build speed. Adopting distributed tracing using OpenTelemetry and Honeycomb backend has given our engineers the ability to debug faster, understand their systems better, measure the performance of their systems more accurately, and build more resilient systems by knowing what “good” looks like. Essentially, before jumping on any big improvement initiative, answer this question first: How much of your system state do you know and how quickly?”
Emily Nakashima, VP Engineering, Honeycomb
“My number one tip for 2022 would be to consider incorporating observability into all of your team’s development tooling rather than just thinking about it as tooling for production. Our overall goal with observability is to create the best experience possible for our users—and optimizing our build pipeline, our code reviews, and our on-call processes are as important to that as understanding request paths or system metrics.”
If you missed this webinar, fear not—we will have plenty more to come in 2022. We’d love to hear from you about what engineering goals you have for yourself/your team in the new year. Tweet at us or send us a message on our community Pollinators Slack and let us know. Want to watch the full webinar in the meantime? It’s now available on-demand.
Happy New Year everyone!