What Does it Mean to Observe and Debug in 'Hi-Res'?

 

Transcript:

Deirdre Mahon [VP Marketing|Honeycomb]:

Hello everybody and welcome to today’s webcast brought to you by Honeycomb. We will get started in a couple of minutes, but before we do just want to cover some housekeeping. Today’s webcast is “Observe Prod in Hi-Res” and this is about informing you of new product updates to Honeycomb. Our presenters, Peter and Alyson. We’ll introduce them shortly. We will run for approximately 30 minutes. If we run over it is because we hopefully are getting a lot of questions from you, the audience. Please put your questions in the console to the right. Towards the end of the presentation, as we’re wrapping up, I will moderate and ask the presenters those questions.

My name is Deirdre Mahon and I am head of marketing at Honeycomb, and we’ll be back in a couple of minutes. We’re just going to give time. Some folks are still joining. We’re going to give them about a minute to join, so I’ll be back, to you, momentarily. Thank you.

Welcome to Honeycomb’s webcast, “Observe Prod in Hi-Res.” This is all about informing you on the latest. As I said, we’re running about 30 minutes today. Our presenters, Peter Tuhtan… as well as Alyson van Hardenberg, a Honeycomb engineer, are our main presenters today. We will do a live demo so please stick around for that and we’ll take your questions at the end.

2:25

I am today’s moderator and my name is Deirdre Mahon. I run marketing at Honeycomb. Here are the topics that we’re going to cover today. For those of you on the webcast, we’ll share a bit about what we provide for customers, which is helping organizations and teams practice observability that allows them to ask any question of their production systems. We will share with you what we mean by hi-res in prod, so observing and debugging prod in hi-res. We’ll go into some details on that and I will talk through the latest product updates.

We continually are adding new features and capabilities. We listen very closely to our customers as well as new, incoming customers and we have a very active community that is very vocal in sharing a lot of feedback, so we listened carefully to that. We want to share some of those new product updates visually. Two weeks ago we announced new changes to our Home to the … when you log in every day you have a new intuitive interface where you can actually drill in and experiment and find out and ask those questions on knowing what’s going on with your production environments. We’ll do a demo. Alyson will take us to a live demo and of course, take your questions. Let’s meet our speakers. Peter, Product Manager, why don’t you introduce yourself and share a little bit about your background and what you do at Honeycomb?

Peter Tuhtan [Product Manager|Honeycomb]:

Hey everybody, my name is Peter Tuhtan. I’m the Product Manager at Honeycomb. A little fact about me is that I joined this team when we were very tiny, about four or five of us working out of a condo, and before that, I actually used to be an elementary school teacher and started my career in tech and sales.

Deirdre Mahon:

Cool. Alyson is an engineer who was project lead on a lot of the new features we’re going to share with you today, why don’t you share a little bit about your background and your role at Honeycomb?

Alyson van Hardenberg [Product Engineer|Honeycomb]:

Hi everyone. I’m Alyson van Hardenberg. I joined Honeycomb about a year and a bit ago. I work in the full stack and occasionally on call, but my focus is in product development. Before joining Honeycomb I worked at a company called Apteligent and before that, I was actually a registered nurse.

Deirdre Mahon:

Very cool. You have very diverse backgrounds and I think you’re selling yourself short Peter, all the things you do, and leading the charge on defining “what’s next” in the product. Let’s dive in and before we share with you some of the updates to the product, let’s talk about observability and what that means and specifically hi-res prod.

Peter Tuhtan:

Sure. I’d say that one of the biggest differentiators about Honeycomb as a product is that we base everything around this idea of observability and there’s a ton of information if you’re interested in what that is, on our website. But the main gist of it is that we believe observability is the future of managing and maintaining your production systems.

The only way to do this efficiently is to have a much faster time-to-resolution by accessing the raw event data and being able to ask new questions of your production systems, rather than, as a lot of us do, depending on having experienced something personally and being able to ask that same question over and over again in the future.

At Honeycomb, we typically see teams using monitoring tools and log management before they become our customers. Those tools serve a specific purpose, but they do not help teams find answers to problems that they’ve never encountered before. When teams face any type of scale or velocity, maybe from shipping new code or bringing on new customers themselves, those tools that they’ve been using can sometimes be inadequate when you’re trying to debug something in real-time or find an answer on call.

We acknowledge that monitoring, which tells you the overall health of the system, can be useful, and logs are also great as long as you know what you should be looking for. But it’s more of a search across those logs and not really querying and looking for specific pieces of information within them.

Logs are a security blanket for a lot of teams, an archive, if you will, for when you need them. It can get really expensive over time when you store all of that data. Honeycomb, conversely, provides you with a ton of rich visualization options and charts that allow you to interact with, slice, and dice all of your data in real-time, drilling down to find exactly where in the code a problem is occurring.

Deirdre mentioned hi-res. Let me move to our next slide here. There we go. Let’s just dive in a bit on what we mean by that. The gist is that we’re trying to see things as clearly as possible within Honeycomb, but we still like to start at a high level, and we’ll see that today in the demo with one of our new feature releases called Home. It easily allows you to zoom in and out of an incident or a set of events that may have come into your system.

Switching from different views is great because you can take advantage of all the different visualization options in Honeycomb such as histograms, line graphs, tracing, or heat maps to highlight the issues as needed.

We believe that table stakes also include having an underlying database that handles high cardinality data. High cardinality data, for example, would be something like a customer ID, a piece of data that has a ton of different possible values. Of course, those results need to be as up to date as possible and accessible as fast as possible, with the ability to run a range of queries, again, across a ton of different visualizations. You can be as proactive as possible when you’re releasing new things into prod and get ahead of those issues by testing as often as possible and instrumenting as well as you can.
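To make the idea of cardinality concrete, here is a minimal Python sketch that counts how many distinct values each field takes across a handful of events. The event shape, field names, and values are invented for illustration and are not Honeycomb's schema or API. A field like a customer ID tends toward one distinct value per customer, while a status code only ever takes a few.

```python
from collections import defaultdict


def field_cardinality(events):
    """Return the number of distinct values observed for each field."""
    distinct = defaultdict(set)
    for event in events:
        for field, value in event.items():
            distinct[field].add(value)
    return {field: len(values) for field, values in distinct.items()}


# Hypothetical events; every name and value here is an assumption.
events = [
    {"customer_id": 20109, "status_code": 200, "endpoint": "/tickets"},
    {"customer_id": 31877, "status_code": 200, "endpoint": "/tickets"},
    {"customer_id": 40212, "status_code": 500, "endpoint": "/tickets/export"},
    {"customer_id": 55903, "status_code": 200, "endpoint": "/tickets"},
]

cardinality = field_cardinality(events)
for field, count in cardinality.items():
    print(f"{field}: {count} distinct value(s)")
```

In this toy sample the customer ID has as many distinct values as there are events, which is the pattern that grows without bound in production and that a store built around a fixed set of pre-aggregated metrics tends to struggle with.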

Everyone on the team benefits because you can easily share the knowledge and queries can be perma-linked and shared in the future. And we’ll cover a lot of this today in the demo as I mentioned.

One thing to really highlight, too, at Honeycomb I think a lot of people, when they look at a tool like us, immediately think about debugging and, yes, that, of course, is kind of our bread and butter but we really see our three main use cases here as incident response, ongoing development, and optimization.

When teams become familiar with Honeycomb, they start to use it more regularly. Log in every day to use it just to see how your production systems are behaving in real-time. Super important when a new release has been shipped so you can proactively get ahead of all those problems that might occur.

9:22

Lastly, when developing, validating on a local instance how new code is behaving before shipping is what we mean by shift left. Quite honestly, the more that your engineering team gets involved in the entire cycle and process, the more value, we like to say, you’re going to get when you instrument for your future self to debug.

Let’s touch on a few of the new features that we’ve released to provide this hi-res view and that we’ll highlight in the demo. This one is one of our newest releases, like I mentioned, called Home. Alyson here was actually the engineering lead on it. We’re really proud of this one. Just to give folks a little bit of background as you see it in the demo, it’s really based off of the standard RED, or rate, errors, and duration, signals that you’ll see in other APM-like products. The idea here, of course, is to orient users to their service or dataset from a very high level to see the most top-level indications that, “Hey, something might be off.” Then on top of that, Home really quickly allows you to use the different features of Honeycomb to dive in and assess that problem.

It’s extremely valuable as well …

Deirdre Mahon:

Sorry, Peter. I was …

Peter Tuhtan:

… for teams that are trying to bring on new users. Go ahead, Deirdre.

Deirdre Mahon:

I was just going to say it would be wonderful to share with our audience because part of observability is the ability to interact with your service or your product and as the product manager and Alyson van Hardenberg as a project lead, can you share some personal stories on this particular project? What was the impetus? Why redesigning this Home environment and what kind of feedback did you hear from customers? I think it would be useful to share.

Peter Tuhtan:

Sure. I’ll share one of my most important ones first, which is that historically when folks were diving into a tool like Honeycomb or other tools, the first thing you need to do is start asking a question, and it’s not always apparent what the right path to start on is or where that question should take place and in what service. With Home, as I mentioned, because we’re providing these default high-level signals of total requests, the error rates, and latency or duration that things are taking to come through that system, you can quickly pick up on that, dive in, and see how we formed that query for you and start iterating on it to drill into what the problem actually might be.
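As a rough illustration of the three high-level signals mentioned here (total requests, error rate, and duration), the following Python sketch summarizes a window of request events. It is only a sketch under stated assumptions: the field names `status_code` and `duration_ms`, the choice to count 5xx responses as errors, and the nearest-rank 95th percentile are all assumptions, not a description of how Home computes its charts.

```python
import math


def red_summary(events, window_seconds):
    """Summarize rate, error rate, and p95 duration for a window of events."""
    total = len(events)
    if total == 0:
        return {"rate_per_s": 0.0, "error_rate": 0.0, "p95_ms": None}

    # Assumption: a 5xx status code counts as an error.
    errors = sum(1 for e in events if e["status_code"] >= 500)

    # Nearest-rank percentile: the value at position ceil(0.95 * n), 1-based.
    durations = sorted(e["duration_ms"] for e in events)
    rank = max(1, math.ceil(0.95 * total))

    return {
        "rate_per_s": total / window_seconds,
        "error_rate": errors / total,
        "p95_ms": durations[rank - 1],
    }


# Hypothetical events covering a 10-second window.
events = [
    {"status_code": 200, "duration_ms": 40},
    {"status_code": 200, "duration_ms": 55},
    {"status_code": 200, "duration_ms": 62},
    {"status_code": 200, "duration_ms": 48},
    {"status_code": 500, "duration_ms": 2300},
]

summary = red_summary(events, window_seconds=10)
print(summary)
```

A summary like this only says that something looks off. Finding out which user or endpoint is behind the slow, failing request still requires going back to the individual events.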

Deirdre Mahon:

Great. Alyson, do you have any learnings through this process of listening to customer feedback?

Alyson van Hardenberg:

Yeah. Before we even started development on this project, Peter Tuhtan and our designer went and did some customer interviews, which was great. They did some requirements gathering and then they brought the design to the engineering team and we worked through it together. We circled back with those users and they looked at the design and said, “Yes, this is going to solve all of our problems,” which was great.

Then once we got a working Home page going, we invited those same users to an early Beta, and then we were able to find the errors they were experiencing using Honeycomb itself, using this new Home to dive deeper into that data and see what our customers were doing. We were able to ship some quick fixes and then open it up to our larger Beta test group and still continue that iterative process of using this product to dog food it ourselves. Then we were able to GA in relatively short order and have had great feedback from our customers.

Deirdre Mahon:

Great. Thanks for sharing that. I think it’s useful. One of the things that we’ll talk about at the end is we have our new observability maturity model framework and white paper. Listening to users and paying attention to how they use the product and the app is actually part of observability driven development. I think it’s worth lingering on the back point for a minute. Let’s share the next update view.

Peter Tuhtan:

Sure. This is a feature that I think the whole team and most of our customers are really, really happy about. I’m personally really proud of it. It’s called BubbleUp and we’ll definitely be highlighting this today in the demo. It shipped a few months ago after testing with a large group of customers and the feedback was tremendous.

Essentially how BubbleUp works is it allows you to select the outliers in a pattern of data in a heat map which shows you the density and distribution of the events as they’re coming in. Things that are sticking out here, as you can see in the image, you’re able to select and immediately see the outliers that are involved in those specific events compared to the baseline, which can send a very, very big signal of, “Hey, something is wrong,” and this is the actual field or event where that took place and where the error is occurring.

Again, this allows you to way, way, way faster dive in and solve a problem and get to a root cause rather than again maybe grappling through a log or only seeing the high-level signal from an APM tool.

The other core feature of Honeycomb that we’ll be visiting today, of course, is Tracing. We really invested a lot of time recently in improving our Tracing experience. Customers have been asking for a lot of the updates we’ll see today, for the real advantage of being able to navigate through a trace, filter for specific things, and search across it, especially when you have a trace with a ton of spans.

We also, as you’ll see, have a lot of different ways to access the trace extremely quickly rather than, as in maybe other people’s experiences, having to switch between different tools to specifically find a trace and then go back to maybe a log line somewhere else. We’ll see this in the demo today. Again, tracing is just another way of viewing the data inside of Honeycomb.

The last feature that is constantly being improved on and that we’ll definitely touch on in the demo is Collaboration. This I think is very unique to Honeycomb. You’re able to use not only your past self, the queries and history you’ve created, but you can see a live view in Honeycomb of what your team is doing and how they’re working. Collaboration is critical to keeping your systems running as smoothly as possible and solving problems when there’s a fire.

15:58

One of the keys here you’ll see is in the live Activity View if you know who’s on call, you’re able to dive in immediately and see what questions are they asking? Where’s the data set that they’re looking at? Where’s the problem rooted? And jump in and help. Then, long term we hope to improve this to a point where you’re able to almost find any answer you need by just looking at the history and seeing how other folks have leveraged the tool across your team. Great. Let’s actually take a closer look at all of these features and dive into a demo with Alyson here.

Alyson van Hardenberg:

Let me share my screen with you and you can all see what we’re looking at. Peter is going to set the stage a little bit for us.

Peter Tuhtan:

We’re just going to go ahead and pretend here that we are an online ticket or events concert retail agency. I’m selling out for events and Alyson here is on call and loving it, but has recently been paged and awoken that there are some complaints coming from users and she’s going to dive into Honeycomb and take a stab at trying to figure out what’s going on.

Alyson van Hardenberg:

Alright, so here I am on call. I have loaded up Honeycomb and it’s landing me on this logging in a dataset, which is where I was investigating earlier. But the support tickets I’m hearing are actually coming from our API. Let’s switch over to the API and see what’s going on there.

Let me describe a bit about the charts that I’m seeing. This one on the top left here is the total requests that Honeycomb is seeing from our API dataset. I don’t see anything particularly standing out here. My error rates all look fine. Over here, the support messages I was seeing were from earlier today. I see a little bit of a spike in the duration of the requests and I’m looking at the status code of these requests. I can see here that there were some 500s. I can see it around that same time there was a peak in the duration. I can see a little bump in the 500s in that top left graph.

Let’s see if it was any particular user who was experiencing this issue. Yes, I can see actually this user 20109, I’ll have to look them up in a different way later, but I can see they were having some high latency, which I can tell from these little teal bar graphs. Compared to them, the other users were having less latency. When I hover over this row in my table, I can see the same data highlighted in the graph and that correlates with what I’m seeing up there.

Let’s go ahead and BubbleUp on this data. BubbleUp is the feature that Peter was talking about earlier that lets you select over an area of a heat map to break down on that information and see the raw events. Backing up a little bit, a heat map … all the little colored squares we see here on heat maps are events themselves. When the color is a darker shade of blue, those are more concentrated events, so the events that have the same duration and the lighter teals are more singular events. You can see the legend over here on the right. The lighter teal is a single event of that duration.
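The heat map described here can be pictured as a grid of counts: each event lands in a cell chosen by its time bucket and its duration bucket, and a cell's shade reflects how many events fell into it. The Python sketch below shows that bucketing. The field names, bucket sizes, and sample events are assumptions for illustration, not Honeycomb's implementation.

```python
from collections import Counter


def heatmap_cells(events, time_bucket_s, duration_bucket_ms):
    """Count events per (time bucket, duration bucket) cell."""
    cells = Counter()
    for event in events:
        column = event["timestamp_s"] // time_bucket_s    # position on the x-axis
        row = event["duration_ms"] // duration_bucket_ms  # position on the y-axis
        cells[(column, row)] += 1
    return cells


# Hypothetical events: three fast requests and one slow outlier.
events = [
    {"timestamp_s": 3, "duration_ms": 40},
    {"timestamp_s": 7, "duration_ms": 45},
    {"timestamp_s": 8, "duration_ms": 60},
    {"timestamp_s": 12, "duration_ms": 2300},
]

cells = heatmap_cells(events, time_bucket_s=10, duration_bucket_ms=50)
for (column, row), count in sorted(cells.items()):
    print(f"time bucket {column}, duration bucket {row}: {count} event(s)")
```

A cell with a count of one corresponds to the lightest shade in the legend, and the slow outlier shows up as a lone cell far above the rest.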

I’m really curious about this little peak right here. This is around when we got those support tickets. I’m going to select this area here and we can see down below the BubbleUp tool has run a query on our data and it’s showing us a breakdown of it. The yellow represents the area I’ve selected in the graph and the blue is the rest of the data that we’re looking at. I can see here in the yellow that this endpoint shape of ticket/export in our API is a really strong indicator in the selected area.

I look at the name. It’s that same endpoint. See the user Id. I can see there’s a big correlation between that selected area and the rest of the data and it’s that same value that we saw on Home. If I look at the status code 500 is also in that selected area, so I’m seeing a big trend here of this ticket/export 500-ing for this one particular user, 20109.
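The comparison described in this part of the demo, selected events against the rest of the data, can be sketched as follows: for each field, compute how often each value appears in the selection and in the baseline, then report the value that is most over-represented in the selection. This is a simplified illustration and not BubbleUp's actual algorithm. The helper names and all sample events other than `ticket/export`, user `20109`, and status `500` are invented.

```python
from collections import Counter


def value_shares(events, field):
    """Fraction of events carrying each value of the given field."""
    counts = Counter(event[field] for event in events)
    total = len(events)
    return {value: count / total for value, count in counts.items()}


def compare_selection(selected, baseline, fields):
    """For each field, find the value most over-represented in the selection."""
    report = {}
    for field in fields:
        selected_shares = value_shares(selected, field)
        baseline_shares = value_shares(baseline, field)
        if not selected_shares:
            continue  # nothing selected, nothing to compare
        value = max(
            selected_shares,
            key=lambda v: selected_shares[v] - baseline_shares.get(v, 0.0),
        )
        difference = selected_shares[value] - baseline_shares.get(value, 0.0)
        report[field] = (value, difference)
    return report


# Hypothetical events: the selection is the slow peak, the baseline is the rest.
selected = [
    {"endpoint": "ticket/export", "user_id": 20109, "status_code": 500},
    {"endpoint": "ticket/export", "user_id": 20109, "status_code": 500},
    {"endpoint": "ticket/export", "user_id": 20109, "status_code": 500},
]
baseline = [
    {"endpoint": "ticket/list", "user_id": 11, "status_code": 200},
    {"endpoint": "ticket/list", "user_id": 12, "status_code": 200},
    {"endpoint": "ticket/export", "user_id": 20109, "status_code": 200},
    {"endpoint": "ticket/buy", "user_id": 13, "status_code": 200},
]

report = compare_selection(selected, baseline, ["endpoint", "user_id", "status_code"])
for field, (value, difference) in report.items():
    print(f"{field}: {value!r} is over-represented by {difference:.0%}")
```

The fields where the difference is largest are the ones worth breaking down on next. In this toy data, that is the same endpoint, user, and status code that stand out in the demo.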

Let’s go look at a trace of that data. I can get to the tracing straight from this BubbleUp heat map by clicking on one of the events that I’m curious about. I jump straight to the tracing page and I can see here that this is that endpoint we were looking at and this span right here, fetch tickets for export, is taking quite a long time.

I just clicked on it and I can see in the heat map it’s selecting over here on the right, that it does indeed have a very high duration compared to the rest of my spans, the rest of my events. It looks like it is really, really hitting our database for some reason.

21:12

What I’m going to do is jump back to our query and I’m going to send a message to Peter. Let’s say he’s on-call or he’s the engineer working on this project. I’m going to say, “Hey Peter, this endpoint looks like it’s erroring.” I can just send that message straight to Slack from here and he’ll be able to click in and see the same query that I’m looking at and we can start looking at our code and get that problem solved for our users. That’s about it. That’s how I would address this issue.

I’ve just clicked back to Home here so I can look at a few different ways that I approach the same problem. Peter talked about Collaboration and I can see the recent activity of my peers over here. Looking back two days ago Christine, I see, ran 10 queries. If I were to click on that, it would drop me into the query history. I’m wondering if she was looking at a similar issue. Let’s see. Looking at build ID status code into prod. Let’s see.

I’m seeing some same peaks in our API calls data set. That might be similar. I could use the same, looking at her history, to figure out what she was looking at, and maybe it’s similar to what I’m doing. I think she might’ve been along the same path because I can see status code equals 500, breakdown by user Id. I see the same user. They’ve been having a lot of problems lately. You might want to reach out to them with some more specific customer support.

That’s another way that I could look … start diving into this problem. Another way I’m going to show you, the last one, is I can look at my recent boards. Recent boards are a way for people to save queries for them and their team to come back and look at them in the future. I want to look at our API Service Board, see if there’s any collection of queries here that might help me in this.

I can see this one over here is one user’s slow experience. That might be the case that we’re looking at right now, so I clicked into that query. I can see that same status code 500 breaking down by user ID, a heat map. They did a heat map of MySQL duration. That’s an interesting idea. I might consider that for later and then I could BubbleUp over this area here. That’s about it for the various ways that I’d approach this problem. I’m going to circle back to Peter here.

Deirdre Mahon:

I was just going to say, Alyson, a very cool … the feedback we got from customers on the collaboration is if you’re a new person to the project or the environment and you’re not as familiar and you’re remote and you happened to be on call, then just speeding up the time to resolution because you can actually access or get answers in different ways and tapping into the knowledge of your more experienced team members is really valuable. That was a great show. Thank you.

Peter Tuhtan:

Yeah, and right on the heels of that, something we’re super proud of is some customer feedback we’ve received recently on Home. Specifically, Aly here let us know that a customer’s team is already leveraging it and is able to just dive in extremely granularly. And the quote that the customer provided us with almost matched the PRD that I drafted up and the team that I worked on to outline the goals of the project of releasing Home. One of the best parts is providing those upfront breakdown tabs, especially the ones around users and high cardinality fields that in other products may be really hard to actually leverage.

The other piece of feedback we’re getting from our customers recently, I’ll just highlight here, that we’re super proud about, Geckoboard (one of our customers), has a case study that we highly recommend people dive into as well as Carwow’s case study. Both available on our website and I think we’ll drop the link here in the chat for folks to go check those out.

Deirdre Mahon:

Moving on, if you have questions in the audience, please put them in the comments and we’ll take those toward the end. We do have some other assets and resources to highlight. We have a series called Honeycomb Learn webcast, also on BrightTalk. The last two dive into much more detail on BubbleUp to help you quickly spot the outliers in your production environment. That was Danyel Fisher, who is a design engineer who actually was instrumental in working on that feature in the product. I encourage you to take half an hour out of your day and listen to that one.

Also, our tracing capabilities, episode three, goes into much more detail there and you also see a live demo of the products. This middle asset here is just fresh off the press. Our co-founder and CTO Charity Majors, as well as Liz Fong-Jones, our developer advocate, authored this piece, and it’s a framework for observability maturity. It is more of a white paper and best practices guidelines on how to achieve observability driven development. I encourage you to read that. We’ve gotten really good feedback on that so far.

26:52

Then, as Peter covered at the beginning of today’s session, on our coexistence with other types of tools, monitoring and log management, we do very well. A lot of our customers have already been using a variety, whether it’s open-source or vendor tools, to monitor the health of the system overall or keep logs for compliance and audit purposes. As we shared, the use cases for Honeycomb are very distinct, and we can coexist with those environments if you have them deployed already. I encourage you to check out those assets. Then let’s do some questions from the audience before we wrap up.

You can trial Honeycomb yourself. We have a free 14-day trial. If you’re not quite ready to do that, we encourage you to access Play. There are links here. In Play, we have a dataset and you can walk through, with our data, an interactive exploration of all the features, some of them at least, that we showed you today. We have yet to update that Play environment for our newly redesigned Home UI, but you will get a feel as far as tracing and query and all of the charts and visualization. I encourage you to do that as well if you’re interested. Let’s take some questions. I have one question. How do you, and I guess this could be for Peter or Alyson, how do you get data ingested into the Honeycomb application?

Peter Tuhtan:

The short of it is there are actually a number of ways. If you go into our website and find our documentation, we of course, offer what probably folks see as a standard opportunity to utilize some SDKs and get your hands dirty with your own code and instrument and send it to us. But we also have these integrations called beelines that are language environment-specific. Highly suggest you take a browse through those and see one that might fit your existing setup. Otherwise, you’re also always welcome to write in with the feature request for integration.

Deirdre Mahon:

I have another question. Thanks for that Peter. “How long does it take to onboard if you’re a new customer?”

Peter Tuhtan:

There’s a couple of options for that as well. I think it also depends on where you’re coming from. Typically we see folks onboarding and running queries anywhere from 15 minutes to less than a day. But we also pride ourselves on a very strong customer support and sales team who are able to hold your hand and really assist any person who’s interested in learning more about Honeycomb to get you up and running as fast as you need.

Deirdre Mahon:

Cool. I think those are the only questions we have today. Do you have any parting words, Peter or Alyson for our audience today?

Peter Tuhtan:

I would say, especially with this last slide, to definitely take a look at the 14-day trial and Play here. For those of you who might think, “14 days might not be enough. I have a hectic schedule.” Please don’t worry about that. After 14 days we actually offer a free community edition so you can keep on trialing and testing with your own data and make the decision to maybe purchase a more advanced package or plan down the road.

Alyson van Hardenberg:

For me, I’d love to suggest that people try out Play. It’s not just some made-up data. It’s actually data that came from one of our production incidents so you can try to see if you can figure out what was happening with Honeycomb during that time, which I think is really fun.

Deirdre Mahon:

Very good. I see I did have one more question coming, Peter if you want to answer this or I can also help answer. “From a pricing perspective, how do you price Honeycomb?”

Peter Tuhtan:

Sure. We price based off of the storage that you’re looking to maintain with us, in other words your retention, how much time you’re looking to maintain data inside of Honeycomb, and your ingest, how much volume you are sending us on a monthly basis. There’s some information on the website around this, too, and if you would like additional information or you have specific figures in your head that you already would like to run up against us, please feel free to message us on our chat, which is available in the UI, or write to us at sales@honeycomb.io

Deirdre Mahon:

Dot IO. Oh, maybe we lost Peter. I just finished the sentence there for him. Yeah, sales@honeycomb.io. The community edition of Honeycomb has all the same features. It’s just limited in storage. You can only do so much with the free trial compared to the standard edition.

I want to say thank you everybody for your time today and thank you Alyson for a great demo walkthrough. Hopefully, we piqued your interest to the point where you want to reach out and start playing with it yourself. Thanks again Peter, for walking us through what observability driven development is all about and seeing prod in hi-res. Thank you, everyone and we’ll talk to you soon, hopefully. Bye.
