Search results

74 Search results for "service level objectives"


RESOURCE

Ep. #52, Service Level Objectives with Alex Hidalgo of Nobl9

In episode 52 of o11ycast, Charity and Jess speak with Alex Hidalgo of Nobl9. Alex shares his formative experiences advocating for reliability, insights on utilizing error budgets, and the attributes needed to leverage senior-level influence within a socio-technical environment.

RESOURCE

Honeycomb Service Level Objectives (SLOs)

In this three-minute video, you’ll see how Honeycomb’s actionable SLOs can help you get to the source of an issue faster. Using a real production SLO (latency per-event) as an example, we walk you through what exhaustion time alerts are and how to configure them, as well as how to use a heatmap to investigate and take action when things happen.

RESOURCE

Debuggable Service Level Objectives

Honeycomb’s Service Level Objectives (SLOs) offer more actionable alerts with less noise. They’re also integrated right into your debugging workflows.

BLOG

SumUp Uses Honeycomb to Improve Service Quality and Strengthen Customer Loyalty

Growing pains can be a natural consequence of meteoric success. We were reminded of that in our recent panel discussion with SumUp’s observability engineering lead, Blake Irvin, and senior software engineer Matouš Dzivjak. They shared how SumUp’s rapid growth spurt compelled them to change their resolution process—both logistically and culturally—to ensure a service level quality that reflects their customer obsession.

BLOG

Exploring AWS Costs Beyond the Service Level

This post will talk about using a derived column to directly connect individual customer experiences to the cost of providing that service with AWS Lambda. By leveraging these tools, we can better understand when our product is used in costly ways, and also provide tooling to better analyze and understand the cost effects of configuration changes.

BLOG

Honeycomb Supports Service Ownership

The software industry is moving toward teams that own the services they build. This concept encloses principles and possibilities from movements toward microservices, DevOps, Agile, and Project to Product. In these paradigms, a team of people delivers software that provides valued capabilities. These capabilities help customers get their work done, support business operations, or enable other software to do these. Writing code is only part of this; capabilities only work if the software is running in production. Service-ownership teams carry this responsibility. To own production, a team needs visibility into production. Honeycomb recognizes service ownership and supports it.

BLOG

SRE + Honeycomb: Observability for Service Reliability

As a Customer Advocate, I talk to a lot of prospective Honeycomb users who want to understand how observability fits into their existing Site Reliability Engineering (SRE) practice. While I have a passing familiarity with the discipline, I wanted to learn more about what SREs do in their day-to-day work so that I’d be better able to help them determine if Honeycomb is a good fit for their needs.

BLOG

The Case for SLOs

With one key practice, it’s possible to help your engineers sleep more, reduce friction between engineering and management, and simplify your monitoring to save money. No, really. We’re here to make the case that setting service level objectives (SLOs) is the game changer your team has been looking for.

BLOG

Authors’ Cut—Gear up! Exploring the Broader Observability Ecosystem of Cloud-Native, DevOps, and SRE

You know that old adage about not seeing the forest for the trees? In our Authors’ Cut series, we’ve been looking at the trees that make up the observability forest—among them, CI/CD pipelines, Service Level Objectives, and the Core Analysis Loop. Today, I’d like to step back and take a look at how observability fits into the broader technical and cultural shifts in technology: cloud-native, DevOps, and SRE.

BLOG

Authors’ Cut—Actionable SLOs Based on What Matters Most

SLOs—or Service Level Objectives—can be pretty powerful. They provide a safety net that helps teams identify and fix issues before they reach unacceptable levels and degrade the user experience.

But SLOs can also be intimidating. Here’s how a lot of teams feel about them: We know we want SLOs, we’re not sure how to really use them, and we don’t know how to debug SLO-based alerts.

RESOURCE

Conditional Distributed Tracing

Distributed tracing is generally a binary affair—it’s off or on. Either a trace is sampled or, according to a flag, it’s not. Span placement is also assumed to be an “always-on” system where spans are always added if the trace is active. For general availability and service level objectives, this is usually good enough. But when we encounter problems, we need more. In this talk, we’ll show you how to “turn up the dial” with detailed diagnostic spans and span events that are inserted using dynamic conditions.

BLOG

Authors’ Cut Spark Notes Edition: Jumpstart Your Observability Journey

George Miranda, Liz Fong-Jones, and Charity Majors held a series of live discussions called the Authors’ Cut to bring core concepts of the book to life by applying them to real-world use cases. Now that the series is complete, we thought it would be helpful to combine all of the discussion recaps for your viewing pleasure. Each blog post below takes key concepts from chapters in the book and makes them more digestible.

BLOG

Touching Grass With SLOs

One of the things that struck me upon joining Honeycomb was the seemingly laissez-faire approach we took towards internal SLOs. From my own research (beginning with the classic SRE book, following Google’s example), I came to these conclusions:

- SLOs are strict. They aren’t as binding as an SLA, but burning through your error budget is bad.
- SLOs/SLIs need to be documented somewhere, with a formal specification, and approved by stakeholders.
- SLOs should drive customer-level SLAs.
- Teams should be mandated to create a minimum number of SLOs for the services they own.

BLOG

New Honeycomb Features Raise the Bar for What Observability Should Do for You

As long as humans have written software, we’ve needed to understand why our expectations (the logic we thought we wrote) don’t match reality (the logic being executed). To that end, we developed techniques to help measure reality—logging text strings, or capturing aggregated metrics—and persevered, seeking out newer and fancier logging or monitoring solutions over the intervening decades.

BLOG

Top Takeaways from Monitorama 2022

Two of our folks went to Monitorama 2022, and they gleaned a few pearls of wisdom they’d love to share with you, including an unexpected but surprisingly insightful talk on carbon impact reporting. Read more now.

BLOG

Honeycomb Pro: Now With Metrics & SLOs

Honeycomb Pro is about to get even better. Starting today, all Pro accounts have access to Honeycomb Metrics and two Service Level Objectives (SLOs), previously only available to Enterprise accounts. Full disclosure: Later this…

BLOG

ICYMI: Honeycomb Developer Week: The Partner Ecosystem

We know that you value collaboration. That’s why we share incident reviews and learnings—because we believe the entire community benefits by working together transparently. In the spirit of working better together, we invited ecosystem…

BLOG

Scaling Kafka at Honeycomb

When you send telemetry into Honeycomb, our infrastructure needs to buffer your data before processing it in our “retriever” columnar storage database. For the entirety of Honeycomb’s existence, we have used Apache Kafka to…

BLOG

Shipping on a Spent Error Budget

Modern software services are expected to be highly available, and running a service with minimal interruptions requires a certain amount of reliability-focused engineering work. At the same time, teams also need to build new…

BLOG

Data Availability Isn’t Observability

But it’s better than nothing… Most of the industry is racing to adopt better observability practices, and they’re discovering lots of power in being able to see and measure what their systems are doing….

BLOG

One Year of Graviton2 at Honeycomb

A year ago, we wrote about our experiences as early adopters of Graviton2, and how we were able to see 30% price-performance improvements on one dogfood workload from switching to the arm64 architecture. In…

BLOG

Honeycomb’s 2020 Blog Roundup

We’re here at last: the final days of 2020. Let’s take a look back at this year’s most popular Honeycomb blog posts. Observability 101 In Observability 101: Terminology and Concepts, Shelby Spees reflects on…

BLOG

Setting Business Goals with SLOs

‘Tis the season to set 2021 goals. Whether setting OKRs, KPIs, KPAs, MBOs, or any other flavor of goal-setting frameworks in an endless sea of acronym soup, chances are that you’re still dealing with…

BLOG

Outreach Engages Their Production Code with Honeycomb

Outreach is the number one sales engagement platform with the largest customer base and industry-leading usage. Outreach helps companies dramatically increase productivity and drive smarter, more insightful engagement with their customers. Outreach is a…

BLOG

Incident Review: Meta-Review, August 2020

Every once in a while, teams or systems hit an inflection point where enough things change at once and the pattern of incidents shifts. We found ourselves at an inflection point like that last week.

BLOG

Spread the Love: Appreciating Our Pollinators Community

Have you heard the buzz about observability with Honeycomb 🐝? It’s the best tool on the market for observing your systems in real time to reduce toil and delight users. But don’t listen to us, listen to our kickass community of “Pollinators”–this blog post is dedicated to them 💖

BLOG

Bees Working Together: How ecobee’s Engineers Adopted Honeycomb

At ecobee, adopting Honeycomb started as a grassroots effort. Engineers signed up for the free tier and quickly started sharing insights with teammates. When it came time for ecobee to make the “build vs. buy” decision for observability tooling, sticking with Honeycomb was the clear choice.

BLOG

Challenges with Implementing SLOs

A few months ago, Honeycomb released our SLO — Service Level Objective — feature to the world. We’ve written before about how to use it and some of the use scenarios. Today, I’d like…

BLOG

How We Manage Incident Response at Honeycomb

When I joined Honeycomb two years ago, we were entering a phase of growth where we could no longer expect to have the time to prevent or fix all issues before things got bad. All the early parts of the system needed to scale, but we would not have the bandwidth to tackle some of them graciously. We’d have to choose some fires to fight, and some to let burn.

BLOG

Surface and Confirm Buggy Patterns in Your Logs Without Slow Search

Incidents happen. What matters is how they’re handled. Most organizations have a strategy in place that starts with log searches—and logs/log searching are great, but log searching is also incredibly time consuming. Today, the goal is to get safer software out the door faster, and that means issues need to be discovered and resolved in the most efficient way possible.

BLOG

Honeycomb, Meet Terraform

The best mechanism to combat proliferation of uncontrolled resources is to use Infrastructure as Code (IaC) to create a common set of things that everyone can get comfortable using and referencing. This doesn’t block the ability to create ad hoc resources when desired—it’s about setting baselines that are available when people want answers to questions they’ve asked in the past.

BLOG

Incident Review: Working as Designed, But Still Failing

A few weeks ago, we had a couple of incidents that ended up impacting query performance and alerting via triggers and SLOs. These incidents were notable because of how challenging their investigation turned out to be. In this review, we’ll go over interesting patterns associated with growth, and complex systems—and how these patterns challenged our operations.

BLOG

On Counting Alerts

A while ago, I wrote about how we track on-call health, and I heard from various people about how “expecting to be woken up” can be extremely unhealthy, or how tracking the number of disruptions would actually be useful. I took that feedback to heart and wanted to address the issues they raised, and also provide some numbers that explain the position I took with these metrics.

BLOG

Tracking On-Call Health

If you have an on-call rotation, you want it to be a healthy one. But this is sort of hard to measure because it has very abstract qualities to it. For example, are you…

BLOG

On the Brittleness of Dashboards

Dashboards are one of the most basic and popular tools software engineers use to operate their systems. In this post, I’ll make the argument that their use is unfortunately too widespread, and that the…

BLOG

Ask Miss O11y: Long-Running Requests

Dear Miss O11y, How do I think about instrumenting and setting service-level objectives (SLOs) on streaming RPC workloads with long-lived connections? We won’t necessarily have a “success” metric per stream to make a percentage…

RESOURCE

The Five Characteristics of a Good SLO

This guide covers the basics of SLOs, why their use is preferred as a leading-edge practice in observable systems, and how to ensure your SLOs are set up effectively.

BLOG

ICYMI: Honeycomb Developer Week Wrap-Up

Getting started with observability can be time consuming. It takes time to configure your apps and practice to change the way you approach troubleshooting. So it can be hard to prioritize investing time, especially…

BLOG

Ask Miss O11y: Load Testing With Fidelity

Dear Miss O11y, My developers and I can’t agree about what the right approach is for running load tests in production. Should we even be running load tests against our production infrastructure or is…

RESOURCE

Vanguard’s Adoption Journey: How Honeycomb Helps Shape Developer Workflows

After evaluating multiple approaches to distributed tracing, Vanguard ultimately landed on using OpenTelemetry and Honeycomb. Now, they have hundreds of teams using Honeycomb, with a different mentality to the way they run and manage production. One example is a team using SLOs for a critical service. A burn alert came through, and they were able to remediate this issue before it became customer-impacting.

BLOG

The State of Observability in 2021

Observability adoption has increased as more companies seek to understand how their applications behave in production and quickly identify and resolve problems. Our second annual observability maturity report is the first that shows a…

RESOURCE

Achieving Production Excellence at Scale

Whether you’re a startup building new services from scratch or in a brownfield enterprise environment, this webinar offers expert advice on how to get started and how to measure the ROI of implementing modern software practices like progressive delivery, observability, and service-level objectives (SLOs).

BLOG

Refine Your Observability Experience at Scale

Today, we announced that Refinery is now generally available. With Refinery, it’s now easy to highlight the critical debugging data you need and to stop paying for the rest. Refinery is a sampling solution…

BLOG

Sweetening Your Honey

Are you looking for a better way to troubleshoot, debug, and really see and understand what weird behavior is happening in production? Service-level objectives (SLOs) and observability can help you do all that—but they…

RESOURCE

Identifying Hidden Dependencies

Learn how Honeycomb improved the reliability of our Zookeeper, Kafka, and stateful storage systems through terminating nodes on purpose.

RESOURCE

SLOs: Uniting Engineering and Business Teams Behind Common Goals

A 451 Research | Business Impact Brief:


If an app/service performs poorly, how likely are you to switch to a different brand? Turns out 79% claim very or somewhat likely. SLOs are now a best practice approach to help engineers and business stakeholders understand what to measure about their service for a consistent quality customer experience.

RESOURCE

SLO Theory: Why the Business Needs SLOs

Now, engineering and business speak the same language. Find out why you should care, how SLOs are critical to SRE practice, and how to keep your customers happy.

BLOG

Honeycomb Launch!

Many many of you have been asking when we’ll be “launched”, in “production”, taking “money”, or “GA”. Well, here you go! 🙂 A big THANKS to all our early users, our first paying customers,…