De-stress Debugging: Triggers, Feature Flags & Fast Query
Summary:
The second episode in our Honeycomb Learn series looks at how to cut stress levels when debugging issues in production. Start with a hypothesis, run fast queries, and then navigate to the code where the problem lies. Be proactive and set triggers to let you know if something needs attention. When engineering is about to ship a new release, set a feature flag and watch how production behaves in real time. Curtail performance issues and reduce customer impact with the right tools to better understand production systems, right now. Listen to learn how to:
- Quickly go from hypothesis to fast query and pinpoint exactly where the issue is
- Set triggers based on thresholds important to your business
- Set feature flags to control specific parts of your environment and reduce stress levels
See a Honeycomb demo, ask questions, and learn more.
Transcript
Peter Tuhtan [Product Manager|Honeycomb]:
Hey everyone, this is Peter Tuhtan with Honeycomb, joined by Michael Wilde, also with Honeycomb. I hope everyone can hear us well. Feel free to drop feedback or questions into the chat.
We're going to be starting in about two minutes, so go ahead and grab your breakfast, lunch, or dinner; whatever time it is in the area you're joining us from, and we'll get started soon.
Also, just an early heads up: at the end of the presentation we will definitely be keeping some time for questions, so if you would rather save your questions until the final slide, feel free; you can drop them into the chat and we'll discuss amongst ourselves.
All right, welcome everybody. Let's get started. Thank you all for joining the webinar today. So, before we dive into the presentation, I'd like to go over a few housekeeping items. Like I said earlier, if you have any questions during the webinar, please use the "ask questions" tab located below the player. Your questions will be addressed either during or at the end of the session. Probably at the end of the session, just so we can keep the flow moving, but if I see something that really prompts us to dive into it right then, Michael and I can jump into that. At the end of the webinar, please also take a moment to rate the presentation and provide some feedback using the "rate this" tab below the player.
2:43
All right, let's get started. Welcome to the Honeycomb Learn webcast series. It's designed to educate teams that work in DevOps about what observability is and how to get started. Observability-driven development is the ability to ask any question about your production system so you can better understand and debug it when incidents occur. Teams should code confidently, learn continuously, and spend more time innovating.
So, one thing to chime in on before we really dive into the material: a lot of this is driven by the fact that Honeycomb has spent a lot of time researching where dev time is being spent, and a lot of that, obviously, is spent debugging and fixing technical debt. Super frustrating, time-consuming, and it obviously impacts your quality of life and the quality of your time. It's expensive for companies, because customers do not have an optimal experience, and ultimately competition can get ahead, which impacts your revenue.
I'd also like to call attention to the fact that this is, again, the second in our webinar series. Go ahead and head to our website if you'd like to look for episode one and catch up if you missed it.
Today you're joined by myself, Peter Tuhtan; I'm a product manager here at Honeycomb. I joined the team when we were working out of a condo with about four people, back in the day, as head of sales, and have transitioned to product management since. I'm joined by Michael Wilde, who I will let introduce himself.
Michael Wilde [Dir of Sales Engineering|Honeycomb]:
Yeah, greetings. I run the Sales Engineering organization here at Honeycomb. I came to Honeycomb last July after a twelve-year stint at Splunk. So I've seen the world of machine data and production debugging evolve, and I'm super excited to see what customers have done with Honeycomb. Hopefully, we can show you that today.
Peter Tuhtan:
Yeah, thanks, Michael. On that note, we will be going through some topics today, including what you see now: what kinds of debugging exist in the eyes of Honeycomb and observability, how do you get started facing them, how do you get ahead of those problems, and what are the right tools for those jobs, in our opinion?
We'll probably be going for about 30 to 45 minutes today, but as I said, we'll keep a lot of time at the end for questions as they come up. Michael Wilde will also be jumping into a screen share and going through a demo of the Honeycomb product. Let's get started.
So, one thing that we definitely preach here at Honeycomb is that debugging only gets harder, and it's harder than ever right now. A problem that exists is that there's just a myriad of different tools that are used across an entire company by different teams and the members of those teams. That can cause a lack of visibility into what's actually going on. At the same time, which I'm sure many of you obviously understand, systems are becoming only more complex through distributed systems and microservices. The challenge is that you're not using debugging tools designed for these newly architected systems.
Metrics lack detail; they give you an indication that something is different, but that needs further investigation. Logs, meanwhile, can be searched, but at times can be difficult to actually query unless you're using another service, and they don't provide any ability to actually get tracing in unless, again, you're using another service. And then again, APM tools don't give you access to all the raw data with different visualization features, and we tend to believe, through some research, that tracing can be an afterthought in APM, and it's still very new to that space.
By the way, if you're interested in seeing how Honeycomb stacks up against these three kinds of, what we think are, older ways of going about debugging, we have a blog post. If you just search "comparisons" on our blog, you will be able to find it.
For any of you that follow our fearless leader, @mipsytipsy/Charity Majors, on Twitter, this quote may not be new to you, but for the rest, she claims that "It's way easier to build a complex system than it is to run one or understand one." At Honeycomb, we believe observability is the only way forward. This means you have the ability to ask any question of your system, which is necessary if you're going to meet your SLOs. Think of it as production intelligence for modern DevOps teams. Just like BI was built years ago for business users, intelligence for systems, dev, ops, and SRE teams is what we're trying to create here at Honeycomb. You must have a unified, single view into what is actually happening, especially if new code ships on a much more frequent basis.
Charity frequently speaks at events, by the way, such as Velocity. If you're going, you should definitely try to check it out. She's got some sweet stickers, but her actual talks are really, really good, and have helped a lot of teams out in the past.
7:54
There are a number of best practice steps that teams must go through to reach a state of observability, in our opinion. As said previously, in episode one, which I alluded to at the beginning of this call, we talked about instrumentation. So there's your first very, very important step: how to create better telemetry so you give context for the code, which helps everyone on the DevOps team maintain a well-performing service. I encourage everyone to check out that first webinar and share it with other team members so you can follow the whole path to this webinar.
Today we're going to focus on the ongoing review of your system, and show you how to run queries to better understand what's going on, but also how to use specific tools such as triggers so you can be alerted proactively. How you handle an incident is critical, so issues are resolved quickly and impact to the customer is minimal.
In our opinion, there are really three areas of debugging that commonly exist. Obviously there are outliers and everyone probably has different lingo for all of this stuff, but as a DevOps-oriented team, you're faced with all sorts of different issues and things you need to focus on so that your service is totally operational, you're satisfying SLOs, and you're maintaining that happy customer base. Honeycomb refers to this as software ownership, and regardless of who on the team is responsible, it does impact a wide range of your entire org, from engineers to ops to SREs to customer success and sales. We see debugging, as I said, fall really into these three main buckets, or categories if you like, of activities.
And it's obviously not meant to be an exhaustive list of what's out there when incidents occur, but starting with the left, we call this basically just "incident response," right? Major incidents. Your on-call person is receiving an alert, something's wrong, they need to jump in there and solve it. The second bucket, though, is something we'll focus on a little bit more today, as well as the third. The second being the problems and incidents caused by performance degradation. Maybe this isn't exactly an on-call alert, but for any number of reasons, such as capacity constraints or the opposite, an unhappy message from the head of finance about your AWS bill, this is something that you can use Honeycomb to keep a constant eye on and manage all the time.
And then the third, on the right here, is what we believe might be the area that actually gets the least amount of attention these days when one thinks about debugging, but we also think it's probably the most important. If you're continually learning from the system as you're releasing new features and additions to your services, you can be proactive and get ahead of any issue. You can also work closely with your eng team, and across other teams, to know exactly when something hits production and to understand how users are adopting a new feature. So, going way beyond debugging and leveraging a system like Honeycomb to see if things are successful with what you're releasing. There's a tremendous amount of learning for the team overall in this area of ongoing development and release management.
So, Michael, we're going to cover a variety of these topics today, but just to give folks the lay of the land, maybe you have a recent example you can share of what Honeycomb was used for.
Michael Wilde:
Yeah, thanks, Peter. A recent customer of ours, a company called Behavior Interactive, makes some really awesome multiplayer games. They're up in Canada. They had some decent APM tools in the past, but they noticed there was some slowdown, something not behaving right, and they actually just couldn't find the answer with their other tools. They were looking at: is it caching, is it the database, or is it somewhere else? And they actually felt that the speed at which they were able to get things done with Honeycomb was literally impossible in some of the other products. They're now at the point where they recommend to anybody who's running anything in production to think seriously about Honeycomb, just because of the speed.
As Peter said, with Charity's help, complex systems are really easy to build, but they're very difficult to debug, and it's really exciting to see existing customers get through some of their issues quickly. Feel free to check out the case study on Behavior, also known as BHVR, on our website at honeycomb.io.
12:28
Peter Tuhtan:
Awesome, thank you, Michael. So let's dive into the first one of these areas, incident response. The one we're all familiar with, right? You're using PagerDuty or what have you, you're getting a ping, something's wrong. There's an incident and the on-call team is alerted, and this could obviously be one customer or many, the most important customer or all of them, saying "hey, the service is slow, the service is down." A lot of the time when this occurs, it can feel like a black box that your team is stepping into. You have no idea where to start. So what do you do? Right here we list, from top to bottom, a best practice map of what we believe is the best approach to solving an unknown bug or problem.
The first step is to understand the severity of the issue and how many people it's actually affecting. Then, perhaps something that has happened previously is what you want to look for, or maybe there's a starting point you've taken in the past that's the right route with the current tools you use. Usually, though, teams have some general hypotheses based on the description of the problem, but that's not always the case. This is obviously one of the bread-and-butter situations that people leverage Honeycomb for, so I'll pass the microphone to Michael here again, and let's dive into a scenario using Honeycomb to address this.
Michael Wilde:
Yeah, thanks, Peter. I've got to show you what this thing is like, because the speed that you will see me work at in Honeycomb is pretty much unparalleled. But also, we're going to experience a couple of what you could call team features, where we can observe what each of us is doing to help everyone become the best debugger.
Provided everyone can see my screen, this is just the homepage of Honeycomb when you log in. I'll give you that scenario of trying to track down maybe something really difficult to find, but pay attention when I'm scrolling down here. I see the queries I've been running, so my past history is here. Great places to start if I'm debugging problems frequently, but I also see things that other team members are doing. Honeycomb is unique in the respect that we recognize that most problems are solved by folks on teams, and if we can observe what our other team members are doing, chances are a new person could become smarter at your system, or the team can benefit from all the expertise that is on there.
So, imagine I'm running an API service and there's a report of something wrong, but all of my monitoring systems are kind of showing that things are okay. On the Honeycomb screen here, on the right-hand side, we'll see my entire history, which is kind of like my browser history, but very visual, and I can retrieve it instantaneously throughout the entire life of my Honeycomb employment. On the top, there's a set of gray boxes where I can start to build a query, and it's really, really simple to use. But there's a lot of power in the simplicity.
If I look at this chart, it's just a simple count chart over the last six hours. And we see a normal pattern of behavior. In my case, I'm running an API service, so I've got the information: a little bit from the back end, some stuff from the front end, a few extra fields, and of course I've instrumented my code so I get some nice distributed tracing. To really crack this thing open, I'm going to quickly do a breakdown by status code, and I'm going to use a great visualization that we built called a heat map. And we'll do a heat map on latency. What that's going to allow us to do is see a bit more about what's going on inside that normal period of activity.
We see our purple line here, which shows HTTP 200. Those are successes. Statistically, there are so few failures that it probably wouldn't even set off most monitoring systems. But if we look deep down inside, there are a few 500 errors that are happening, and if I scroll down here as well, I also see a table that shows my status code by count. It gives me a little information.
As you can see, the heat map also gives me ranges of behavior by color. Most of our latency, or duration, is much less than a second, which is where we like it. We do have this odd spike that is drawing my attention, and we should probably see if we can investigate it. So, what we did at Honeycomb: we also built a really great tool for developers and operators that allows us to drill deep down inside and almost x-ray what's happening inside this weird little spike.
This tool is called BubbleUp. I haven't seen anybody else have this yet, which is kind of cool, because when I draw a box around the area that I'm interested in, I now get an instantaneous analysis of absolutely every field that is in my dataset, regardless of whether I broke down on it. And it helps me answer really the three big questions if I'm having an issue. Somebody reports a problem, I have to verify it, right? Just because Peter reports a problem doesn't mean there's a problem. Second, where is it happening? And third, gosh, I hope it's not happening to everyone.
So the bars in yellow represent the statistics around this field and its appearance in the selection. We have 98% of the events from the ticketing export endpoint showing up in this selection. That's kind of interesting. We also see: is there failure? Again, Peter reported a problem; is there really one? There actually is. So I'm going to take and filter by this status code field. I'm going to do a breakdown by this name field. Well, actually, we'll use the endpoint field; they're pretty much almost the same. And then, look at the user ID. So user ID is showing up. It's obviously showing up in every single event, but one user is affected. That one person, that lonely user out there, might be really important to us.
So let's break down by that field. What I've basically done is construct a query right here which is of high granularity. So I'm going to click run, and instantaneously I've almost pinpointed the source of the problem. If we look down on the chart below, we can see user 20109 is getting, what, an HTTP 500 status code error on the Omni ticketing export endpoint, and it's way more than everyone else. So that's not good. But at least we've found it. What could I do right now? Maybe I have my customer service group reach out to that person.
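For readers who want to reproduce this kind of drill-down outside the UI, here is a minimal sketch of what a roughly equivalent query could look like against Honeycomb's Query API. The dataset name, API key, and field names (duration_ms, request.endpoint, response.status_code, app.user_id) are illustrative assumptions, not the exact ones from the demo, so check the current API docs before relying on this shape.

```python
import requests

API_KEY = "YOUR_API_KEY"     # hypothetical placeholder
DATASET = "api-service"      # hypothetical dataset name

# Rough equivalent of the drill-down built in the UI:
# COUNT, broken down by endpoint and user id, filtered to HTTP 500s,
# over the last six hours. Field names are illustrative.
query_spec = {
    "time_range": 6 * 3600,
    "calculations": [{"op": "COUNT"}],
    "breakdowns": ["request.endpoint", "app.user_id"],
    "filters": [
        {"column": "response.status_code", "op": "=", "value": 500},
    ],
}

resp = requests.post(
    f"https://api.honeycomb.io/1/queries/{DATASET}",
    headers={"X-Honeycomb-Team": API_KEY},
    json=query_spec,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # returns the saved query; results are fetched separately
```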
19:18
Michael Wilde:
But we should be able to take this a little bit further. When you do instrumentation, which we highly recommend in this new world of advanced apps and microservices, distributed tracing becomes your go-to method of really finding out what your code is actually doing. So if I click on the "traces" button here, I get an idea of the slowest transactions that are happening in this time range, and we're kind of hoping this has only happened to the one user, but it looks like there is another one there. So as an engineer I could drill in and look at any of these traces, but I need to drill all the way down.
This is where you might see some differences between Honeycomb and other tools. This is often where most tools stop, and Honeycomb really starts to shine bright. So if I want to see life from the perspective of that request, I can go all the way down and get to that distributed trace. Most other tools stop here, at the original request, and as you can see on the right-hand side of the screen, we have every single field from the original raw event, so nothing is pre-aggregated. All sorts of extra context, from platform to the service name to duration. And we can see the path that this request took. Hit an endpoint. Hit our rate limiter, that looks just fine. Hit an authentication service, good thing that's working well on the back end.
And then for some reason, the ticket export was called. Maybe they were printing tickets for a concert so they could hand them out. Well, it looks like we've got a high degree of latency here. In our world, 1.3 seconds is pretty long. I mean, think about staring at a webpage for almost two seconds; you're sometimes on to something else. So we can see the entire path that this request took with this waterfall chart. At each stage, we have enriched fields such as the query, and the time the query took. And we can look at this and say, well, maybe we could change the set of operations to do all the queries in parallel. Maybe you can do that, maybe you can't. Sometimes we're not lucky enough to be able to change the code that we have, so could we change the code that happened before what we have?
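As a purely illustrative aside on the "run the queries in parallel" idea, here is a small Python asyncio sketch; the two fetch functions are hypothetical stand-ins for the sequential backend calls seen in the trace waterfall.

```python
import asyncio

# Hypothetical stand-ins for the backend calls seen in the trace waterfall.
async def fetch_ticket_metadata(order_id: str) -> dict:
    await asyncio.sleep(0.4)   # pretend this is a 400 ms database query
    return {"order_id": order_id, "tickets": 4}

async def fetch_venue_details(order_id: str) -> dict:
    await asyncio.sleep(0.5)   # pretend this is a 500 ms service call
    return {"order_id": order_id, "venue": "Main Hall"}

async def export_tickets(order_id: str) -> dict:
    # Running the two lookups concurrently bounds latency by the slowest
    # call (~500 ms) instead of the sum of both (~900 ms).
    metadata, venue = await asyncio.gather(
        fetch_ticket_metadata(order_id),
        fetch_venue_details(order_id),
    )
    return {**metadata, **venue}

if __name__ == "__main__":
    print(asyncio.run(export_tickets("order-123")))
```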
So right here we might say, "let's see if somebody hits our endpoint way too frequently." Maybe we upgrade the code in the rate limiter so that that behavior doesn't exploit a design that we might not be too happy about. And what you'll see on the right-hand side is my entire history. So it makes it easy for me, if I end up at a dead end, as we often do in debugging, to get back, and also to learn from the activity of all our other users. Lastly, I could make a trigger, and that trigger could notify maybe PagerDuty, a webhook, or some other mechanism. I'll show you a little bit more about triggers, but I just wanted to throw it back to Peter to move on.
Peter Tuhtan:
Thank you. Very helpful. And on that note, let's talk about the second area of debugging that we recognize here at Honeycomb. Obviously the first is the one that kind of chills our bones and wakes us up in the middle of the night, or you have a dev or engineer on your team who's just not happy about being on-call, knowing that they might run into that situation. But on the flip side, one of the ways that you can avoid all of that occurring is just being proactive. Michael just highlighted something and talked about triggers a little bit. Again, here's a list of what we believe are some of the best practices in setting yourself up for success here.
So, first, we like to get a good understanding of the time frames for shipping code. Hopefully, all of us can rely on a consistent calendar, says the project manager who laughs internally, of the upcoming months and quarters and when things are going to be in GA on time, staying on top of it and making sure we're instrumenting ahead of time to prepare ourselves to monitor for any impacts on our service, or on what we're leveraging to run our service.
Then we decide what's important to watch. This might change over time depending on the nature of new code. We also make sure that everyone on the team is aware, especially those on-call or in customer support, of when new code is being shipped. You don't want those teams going back to that black box and searching through the dark. For some of our customers, this actually involves giving support a heads up, like I mentioned, to be extremely alert if there's a customer that we're really, really keeping a close eye on. So Michael, why don't we talk a bit more about this scenario, looking out for performance degradation, leveraging some of the tools in Honeycomb.
Michael Wilde:
Yeah, thanks, Peter. Most tools should be able to do some level of proactive notification, right? The unique nature of Honeycomb allows us to drill in and dig deep into interesting scenarios that might bubble up, no pun intended, the kind of things we want to be alerted on. In our process of finding this user that is having an issue, I might make a trigger. Maybe, for example, we add a p95. Let's add a 95th percentile of duration and run that query.
It's so refreshing to have something that works so fast, and that's one of the things that Behavior Interactive loved about Honeycomb. I can make a trigger. Maybe something simple, like where the duration, the latency, is greater than, I don't know, maybe 800 milliseconds. Most other systems don't recommend that you run these types of triggering systems very frequently. At Honeycomb, as you can see, it's so fast, go ahead and run that thing every minute. And maybe we add a recipient, okay? Sure, we could send an email, send something to Slack, PagerDuty, even your favorite webhook.
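For teams that prefer to manage alerts as code, roughly the same trigger can be created through Honeycomb's Triggers API. This is a minimal sketch under the assumption of a duration_ms column and a Slack recipient already connected to the team; the dataset name and channel are placeholders, so verify the exact trigger schema against the current docs.

```python
import requests

API_KEY = "YOUR_API_KEY"     # hypothetical placeholder
DATASET = "api-service"      # hypothetical dataset name

# A trigger that fires when P95(duration_ms) exceeds 800 ms,
# evaluated every minute, notifying a Slack channel.
trigger = {
    "name": "API latency p95 > 800ms",
    "frequency": 60,  # seconds between evaluations
    "query": {
        "calculations": [{"op": "P95", "column": "duration_ms"}],
        "time_range": 120,
    },
    "threshold": {"op": ">", "value": 800},
    "recipients": [
        {"type": "slack", "target": "#ops-alerts"},  # email, pagerduty, webhook, or marker also work
    ],
}

resp = requests.post(
    f"https://api.honeycomb.io/1/triggers/{DATASET}",
    headers={"X-Honeycomb-Team": API_KEY},
    json=trigger,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```

A definition like this could live in version control and be applied from CI, which keeps alert thresholds reviewable alongside the code they watch.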
But additionally, you may have noticed some odd lines that showed up on my chart before. You can actually create a dataset marker in Honeycomb so that there's something dropped on there for an operator to see; external context is really awesome. When triggers fire, they show up on a really nice page that allows you to see all the triggers that are happening and test them out to make sure they still work, because you've got to make sure things are always working, and the typical idea is taking a look at things that are wrong.
At Honeycomb, we use Honeycomb to Honeycomb Honeycomb. Kidding. Although, any time you see a demo or talk to a vendor that has a tool that helps you debug, ask them how they use it on their own systems. It will be quite revealing. At Honeycomb, we try to live the values that we espouse, so we've done a great deal of instrumentation on our own code, and in our production environment, we do lots of the things that you would normally expect: triggering on things like errors, but also, perhaps, looking at what's happening on the front end, looking at activity by user.
As a product manager, for example, Peter can observe what our customers are doing to see, well, are they getting the experience that they're expecting? Does that mean that somebody on call is then doing something about it? Maybe and maybe not, but the idea of software ownership is really about taking a look at software behavior, not just when it's broken, not just when there's an on-call incident, but when things are actually working well, so you can see whether you have built what you expected folks to use.
27:19
Michael Wilde:
Now there's, as I mentioned, this idea of software ownership. One of the upcoming technologies and methods that folks are using to really own software is this idea of a feature flag. If you're not familiar with feature flags, though a lot of you probably are, it's a way to turn on and turn off parts of your production system, parts of your code that are either hidden, disabled, or enabled. It's a great way to do things like testing in production. It's a great way to have a beta program. It's a great way, and we use this at Honeycomb, to help customers that we're building things for, prior to release. We can turn on a feature flag, and I'll show you what that looks like really, really soon.
If you look at the whole CI/CD pipeline, when a build occurs, maybe when a feature flag is deployed, and we use a great product called LaunchDarkly to do that, you might take a different approach. You might say, okay, when I'm deploying a new feature flag or I'm deploying an update, why don't we use an API call to Honeycomb, maybe to generate a dashboard. There's a nice API for boards in Honeycomb. You could generate a dashboard that has four or five queries that relate to that particular feature flag. Maybe send a message to the developer of that flag so that they can then look at a dashboard that's already ready for them. And when that deploy happens, a marker has been put on the timeline in Honeycomb by way of an API call from the CI system. You'll see this all over my screen when I use Honeycomb in production, because we're doing deploys all the time.
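As a concrete illustration of that last step, a CI job can drop a deploy marker on the timeline with one call to Honeycomb's Markers API. The dataset name, message, and build URL below are hypothetical placeholders.

```python
import requests

API_KEY = "YOUR_API_KEY"     # hypothetical placeholder, e.g. injected as a CI secret
DATASET = "api-service"      # hypothetical dataset name

# Drop a marker on the dataset's timeline when a deploy (or feature flag
# rollout) happens, so operators see the event overlaid on their charts.
marker = {
    "message": "deploy build #1234 (okta flag rollout)",  # illustrative
    "type": "deploy",
    "url": "https://ci.example.com/builds/1234",          # illustrative
}

resp = requests.post(
    f"https://api.honeycomb.io/1/markers/{DATASET}",
    headers={"X-Honeycomb-Team": API_KEY},
    json=marker,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```

Markers created this way are the vertical deploy lines mentioned above.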
Let me give you a few examples of how this whole feature flag thing works, if you've never seen it, and some insights that we can glean from what's in Honeycomb about how our customers are doing. So if you think about it, this idea of software ownership, which is, again, developers and operators examining how production is behaving, not just when it's broken, but when things are working; that idea of being proactive and owning your code says: keep your eyes open the entire time.
Here at Honeycomb, you may have seen some of my user interface when I was over here at the query screen, and you may have noticed on the right-hand side, there's a lot going on on the screen, there are three tabs in this green bar over here. Green, teal, whatever color shows up on your screen. There's also a little "x" button here where I can hide it. My monitor is a huge monitor, but I've got a 13-inch MacBook Pro, and our engineers and our designers like to know how people are using the product. If you have this screen hidden the entire time, you actually might not know there's great history and team activity there. But it might be a result of you having a smaller screen.
So instrumentation allows us to observe what's going on. This is the really cool part of observability. One might ask, Chris might ask, do people have their sidebars open, yes or no? And that sidebar is that green bar that I just showed you. Okay, great. That's helpful to give us an idea of the count of folks. What does the sidebar look like while queries are run? You did see the sidebar show up with the details on the dataset, but not everybody clicks on everything in every web app you use. And it might help if the history bar was... maybe that history should be shown by default at first, right? So that idea helps us understand it. And this is all just natively using Honeycomb, again, not to debug problems, but to observe exactly what's going on. Again, we try to live the observability lifestyle that we espouse.
That idea of creating a board, so if we were to go... I'll show you a board in a second, but you know, a board is a list of queries that may have a visualization associated with them. And it's kind of like a dashboard. There's an API for that, so perhaps out of the process of a build, maybe a new feature flag is deployed, and boom, a dashboard is created. It's really simple to extract an existing dashboard and turn it into something new. And we've tried to make this extremely developer- and DevOps-process friendly.
This idea of a feature flag, if you've never seen how they get deployed: obviously there's some code that's written in engineering. What you're seeing on the screen is a product called LaunchDarkly, and LaunchDarkly is how we at Honeycomb, and many other customers, manage the provisioning of feature flags. We can see there are lots that we're working on at Honeycomb, and we've got some great things out here. As a matter of fact, we have a feature flag associated with the integration that we're building for Okta, so if you're an Okta customer, you can wire that right in. This might be one to look at. So if we drill into this feature flag, right now the default rule is to have it on, but a feature flag allows us to target specific users, specific teams, and turn those features on for them.
Well, if I'm doing software ownership and I turn on the Okta flag for some users, I might actually want to see a little bit about that. So I've got a dataset inside of Honeycomb that has information about what kinds of things users do. And as we can see, I have two boards here. I have one board here with two queries, one that looks at a feature flag for the Okta SSO integration. If I click into that board, the query is quite simple. I'm just looking at a particular team. We have a customer here that's deployed it recently, and we have obviously some testing going on. We can see how frequently they use it, which is great, because we see a marker here that probably represents when it was deployed. Maybe this is 19814, when that flag got deployed. I'm doing a filter on here, flags.Okta = true. We've taken and instrumented our code so that we actually have that information, on every flag that we've put in here, right in Honeycomb.
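A rough sketch of how that kind of flag context can end up on events is shown below, using Honeycomb's libhoney Python library; the flag-lookup helper stands in for a LaunchDarkly variation() call, and the field and dataset names are illustrative rather than Honeycomb's actual instrumentation.

```python
import libhoney

libhoney.init(writekey="HONEYCOMB_API_KEY", dataset="api-service")  # placeholder values

def okta_flag_enabled(user_id: str) -> bool:
    # Stand-in for a feature flag lookup, e.g. a LaunchDarkly variation()
    # call for this user with a default of False.
    return user_id.endswith("7")  # arbitrary illustrative rule

def handle_request(user_id: str) -> None:
    event = libhoney.new_event()
    event.add_field("app.user_id", user_id)

    # Record the flag state on every event so queries like
    # "flags.Okta = true" work later in Honeycomb.
    event.add_field("flags.Okta", okta_flag_enabled(user_id))

    # ... serve the request, optionally branching on the flag ...
    event.send()

handle_request("user-20107")
libhoney.close()
```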
That's why this idea of taking data from many different systems matters: you get the ability to see what's going on, not only in prod when things aren't working well, but also when you deploy things. Lastly, this idea of really looking into what's going on. Another query I found today which I thought was really cool was "most common window widths when looking at query results." So our team has a heat map here, and we can see most people's window widths are less than 3000 pixels, but this type of thing helps us understand exactly what the experience people have with Honeycomb actually is. Again, I believe that most vendors that show you anything associated with troubleshooting in production or whatever should give you a good idea of how they actually use their own product.
Lastly, what I find really cool is that queries are super easy to build, but the query history feature in Honeycomb makes it really easy for me to just search what everyone else has queried, learn from what they're doing, and maybe get my job done quicker. So hopefully that's a good overview of the idea of testing in production, software ownership, how Honeycomb uses Honeycomb to look at how our customers are doing, and how the idea of feature flags works and could help you in production. Back to you, Peter.
36:11
Peter Tuhtan:
Awesome, thank you, Michael. So to recap what we went through today, really importantly, and before we move to questions: the entire team across your organization, from developers to ops to devs that work in ops to SREs to customer support, even sales, can now rely on tools that give everyone visibility into what's actually happening in production. By being proactive, you can get ahead of the issues that keep your team up at night and cause early gray hairs, especially if they are major and affect more customers or end users. This gives the evenings back to your teams, and over time, obviously, they will spend less time being frustrated with what's going on on-call and will be able to focus on the core initiatives and the products and services you are trying to build.
I'd like to open this up to questions. I'd also like to highlight that if you go ahead and click on the attachments and links sections here in BrightTALK, you'll be able to find some useful stuff. One of them is "Play with Honeycomb," which is, you don't need to send us any data, it's just a sandbox scenario. You can walk through tracing or our event-based querying scenarios, our documentation, and of course, which we'll touch on in a second after we answer some questions here, the next stage of our webinar series.
If you have any questions, please drop them to us now and we can reserve some time right here. And while we're at it, again, Honeycomb Play is in the attachments. You can start a trial by going to our website as well. And again, this is the second of our series of webinars, with the third coming up on April 24th. We'll be focusing a little bit more on tracing, so "See the Trace?" is the title, and we'll be focused on discovering errors, latency, and more across modern distributed systems. Let's open up for questions now.
So we have the first question: how do you get the whole team to be able to see inside the production system? I guess I have my own opinion about this, but Michael, you work with our customers a lot more actively right now than I do. Do you have an answer off the top of your head for this one?
Michael Wilde:
Yeah. One of the best ways to get the whole team working in Honeycomb, aside from inviting them in, is to start using Honeycomb itself. You might be on your own, you're trying it out, things are working well. Most of us are Slack users in some way, so I might take a query that I ran and share it directly to Slack. So I might put that in my channel. I have one here just for the purposes of the demo, and I might say, "team, check this out. People are actually rocking with the Okta stuff." That's going to end up in Slack, nicely decorated. It looks great. You'll end up seeing the chart and the logic behind it, and that causes the conversation to move.
Somebody then pops into Honeycomb for the first time, maybe clicks on that, and then just randomly clicks on the upper left-hand button, and they can start to see what everyone else is doing. Once one sees others in a system, they often want to jump in. So start using it, but start sharing outside of Honeycomb and you'll find that it all ends up going both ways, in and out, and the team gets smarter and better.
Peter Tuhtan:
Yeah, and I would add to that: a key example for me is working with customer support. One thing that I've seen some of our customers do in the past, for instance, to get them involved in using a tool like this and leveraging the data that you're spending your time putting in, is making sure the trigger fires to the right channel in Slack. It's simple, but if you think about it, if you have a very, very important customer out there in your customer base and you know which customer success team is responsible for them, you send the trigger, when something occurs around that customer, straight to your CS team, and they're the first line of defense in making sure your customer is taken care of.
A couple of other questions here. Duration of the free trial: the standard trial right now is about 14 days, give or take, obviously depending on what your need is for integration with us. We can be flexible with that and help you get data in if it's not something that's covered in our documentation. But beyond that, we also offer a totally free version of Honeycomb. If you hop on our website you'll be able to see all this information. So go ahead and sign up. If your time runs out on the trial and you need an extension, we can talk. If you want to jump into the free version, we obviously recommend that.
Another question here. Michael, when you were drilling into the trace for the endpoint errors earlier...
Michael Wilde:
Yeah, this is kind of an interesting one for me. The person asked, "When you were drilling into the trace for endpoint errors earlier, have you run into problems where the user was sampling their traces, and thus they won't always have a trace to go with metrics?"
That's a great question. First, one should consider sampling itself. When we're sampling... I'll share my screen so you can maybe follow along at home on exactly where I'm going, but out here in the Honeycomb docs, cruise over to docs.honeycomb.io. So let me give you one piece of information about Honeycomb: the product is sampling-aware, meaning that every event comes in with a sample rate. This could be a sample rate of one: every event represents itself. It could be a sample rate of 100, which might mean that event represents 100 events. Some systems do, not Honeycomb, but some systems do things like blindly sampling one out of every 100 events, which is arguably not the best idea.
So there's some documentation here on sampling that I recommend you read, on why one should sample and methods of doing sampling. So let's say you were only capturing successful requests; sure, you could randomly sample one out of every 10 events, right? But in the case where I have a failure, I would never want to just sample one out of every 10 events and hope it includes the failure. So I might take a dynamic sampling approach where I put a different sampling rate on the successes, like keeping one out of every 100 successful events, but keeping every single failure event. So when you're doing sampling, be smart about how you do it.
There's some technology in the Honeycomb ingestion agent that will help you with that. But if you're doing instrumentation directly in your code, there are some ways to do that as well. If you do the sampling right, then the events that need to be captured will be captured at their full fidelity, and the ones that you'd like to sample, so as to save on time, speed, and the size of your data, will still give you the experience that you expect. Hopefully, that answers your question.
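As a rough illustration of that dynamic approach, here is a minimal sketch that keeps every error, samples successes at roughly one in a hundred, and records the sample rate on each kept event so a sampling-aware backend can re-weight counts. It uses Honeycomb's libhoney Python library; treat the exact calls and field names as assumptions and confirm against the docs.

```python
import random
import libhoney

libhoney.init(writekey="HONEYCOMB_API_KEY", dataset="api-service")  # placeholders

SUCCESS_SAMPLE_RATE = 100  # keep ~1 in 100 successful requests

def record_request(fields: dict) -> None:
    """Dynamic sampling: keep every failure, sample successes at 1/100."""
    is_error = fields.get("response.status_code", 200) >= 500

    if is_error:
        sample_rate = 1                      # keep all errors at full fidelity
    else:
        sample_rate = SUCCESS_SAMPLE_RATE
        if random.randint(1, sample_rate) != 1:
            return                           # drop this success; kept ones represent it

    event = libhoney.new_event()
    event.add(fields)
    event.sample_rate = sample_rate          # tell Honeycomb what each kept event represents
    event.send_presampled()                  # the sampling decision was already made here

record_request({"response.status_code": 500, "request.endpoint": "/ticket/export"})
record_request({"response.status_code": 200, "request.endpoint": "/ticket/export"})
libhoney.close()
```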
Peter Tuhtan:
Alright, if there aren't any other questions, I'd like to remind everyone right now that a copy of today's webinar will be emailed to you, the attendee, so you can always revisit it or share it across your teams. I'll hang here for just a few more minutes to see if we have any other questions entered. Otherwise, I hope everyone enjoys the rest of their days, evenings, and afternoons.
Okay, it doesn't look like we have any more questions. You can always get in contact with us by emailing support or solutions@honeycomb.io. Also, you can email me personally if you have questions and you don't want a whole team involved, and you feel like now we have an intimate relationship because of this webinar: I'm Peter Tuhtan, P-E-T-E-R, @honeycomb.io, and Michael Wilde is michael@honeycomb.io. We'd be happy to help you with any questions, comments, or feedback on today's webinar, or if you'd like to get started using Honeycomb. Again, enjoy the rest of your day, and we hope to talk soon. Bye.
Michael Wilde:
Thanks, Peter, bye.