Deirdre Mahon [VP Marketing|Honeycomb]:
Hi everybody. Welcome to today’s webcast. Honeycomb Learn is the series and episode one: “Instrument Better for a Happy Debugging Team.” My name is Deirdre Mahon. I’m here sitting next to my colleague Nathan LeClaire and we’re going to get started in about two minutes. I want to give some folks time to join. So I’ll be back to you in about a minute or two. Thank you.
Okay, folks. Thanks again for joining today and taking time out of your busy day. We’ll run for half an hour to forty minutes depending on questions. I’m going to do some housekeeping before we dive into the details. So, welcome to our webcast episode one, “Instrument Better for a Happy Debugging Team.” Before we dive in, if you have a question at any time during the webcast, use the “ask a question” tab located below the player. Your question will be addressed during the Q&A at the end. Also, at the end of the webinar, please take a moment to rate us and provide any feedback using the rating tab below the player. For attachments, we will give you access to Honeycomb Play, the slides for today’s webcast, and some observability eGuides. If you do have technical difficulties, I won’t be taking care of that, but somebody will: go to the bottom of the page and click on “support for viewers.” And finally, the recorded session will be available at the same URL shortly following the conclusion. Feel free to share it with your friends and colleagues. So, let’s get started.
Honeycomb Learn is a five-part series that’ll run monthly for the next five months. Episode one is today, and Nathan LeClaire is the speaker for the main event.
I am your host today. My name’s Deirdre Mahon. I’ve been with Honeycomb for three months, and I’m actually pretty excited to have joined the Honeycomb team. We’re doing some really interesting stuff for developers and operations teams. So, Nathan, introduce yourself and your background. I think that will be useful for the audience.

Nathan LeClaire [Sales Engineer|Honeycomb]:
Sure! My name is Nathan LeClaire. I’m a sales engineer here at Honeycomb. I help with everything from getting customers enabled so they can actually purchase in the first place, to making sure they’re successful through the whole life cycle of using Honeycomb. I worked for three years at a company called Docker. We created a lot of new, cool technologies and trends in the world. We also created some observability problems, so I’m here to mop up some of that mess and sort of pay my dues by serving the observability space at Honeycomb.
And if you can’t tell from my accent, I hail from Dublin, Ireland. Nathan is from Kentucky. Go Wildcats!
Yeah. Big basketball fan. I’m actually a Warriors fan, and he just told me that DeMarcus Cousins came from the Kentucky Wildcats in college.
That’s exactly right.
Very cool. So, our topic for today: What is instrumentation? Who is responsible for instrumenting code? Why should you do it? Is it hard? There’s a belief that it’s difficult to do and time-consuming, that developers want to get on with developing features rather than doing the housekeeping that helps the debugging team down the road. And then, how do you get started? So we’re going to show you Honeycomb in action. Nathan’s gonna drive that.
But before we get into that, we’re gonna go through some best practices. So, first of all, taking a step back, what makes a DevOps team unhappy? These are things that we’ve heard from the market and from our customers. Often it feels like they have too many tools. When an incident occurs, or there’s a major event or a performance challenge going on with your production system, it’s hard to know exactly where to start. There’s a lot of questions. You’re juggling a variety of tools, be it metrics, monitoring, or log management tools, and often you reach a dead end. If you can’t resolve the issue with your production system, you often go back to the engineering team to deliver code that will hopefully fix it for the future.
A lot of teams are spending too much time on call. The more time spent fixing and maintaining, the less time is spent innovating. In fact, there are plenty of surveys out there; a recent one from Stripe claimed that 42% of developer time is spent on maintaining and fixing. The impact on the business is pretty detrimental. Sometimes issues just aren’t resolved fast enough and SLOs aren’t being met. Customer complaints are on the rise, so there’s an impact on the customer support team and, of course, revenue, and the reputation of your brand is damaged.
So, observability: there are a lot of different vendors in the market claiming observability. Our definition of observability is that the production system is in a state where you can ask it any question, so you can understand exactly what is going on with it at any point in time. That means the everyday issues that occur with production systems that are distributed and microservices-architected, but also when something really bad goes wrong and there’s a major incident or an outage.
During this series, we will help you understand the path to observability. A lot of teams feel like “I’m not ready for this” or “it’s too challenging.” Our goal with this series is to guide you through a set of best practices and tools to get your system to a point of observability. Today’s focus is all about instrumentation. Upcoming episodes will cover the other aspects: how to run queries, how to set up alerts, how to do incident response and introspect your code in real time through interactive visuals, how to spot outliers and anomalies compared to the baseline, and then, ultimately, learning as an entire DevOps team. There’s a continuous cycle of iteration, learning, and improvement as you all become more familiar with what’s going on in production.
So, Nathan, let’s start with, what is instrumentation? I think it’s always important to clarify definitions and we chatted a few days ago in preparation for this session and it’s sort of like joining the dots on a picture. I found this one because it’s actually kind of hard to tell what that picture is. Although, it’s starting to be joined up. Tell us when you work with customers, what is instrumentation?
Well, in the Honeycomb model, what we ultimately want is structured events, which in the case of tracing are the same as spans. We need to understand what’s actually happening in your systems in production, capture the real information of what’s going on in your code, and send that on to Honeycomb somehow. So instrumentation just means adding code that captures those details and then forwards them along to Honeycomb, so you can clearly sketch out what’s really going on in production and check your mental model of what you think is happening in the system against what’s actually happening.
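To make “structured events” concrete, here is a minimal sketch of what one such event might look like as a flat set of key-value fields. The field names are illustrative, not a required schema:

```javascript
// A structured event is a flat map of key-value fields describing
// one unit of work, e.g. one handled HTTP request or one job run.
// Field names below are illustrative, not a required schema.
function buildEvent({ route, statusCode, durationMs, userId }) {
  return {
    service: "screenshot-worker", // which service emitted this event
    route,                        // which endpoint or job ran
    status_code: statusCode,      // outcome of the request
    duration_ms: durationMs,      // how long it took
    user_id: userId,              // high-cardinality fields are fine
  };
}

const event = buildEvent({
  route: "/screenshot",
  statusCode: 200,
  durationMs: 742,
  userId: "user-31337",
});
console.log(event);
```

Every field becomes something you can later filter, group, and graph by, which is what makes wide events more useful than a pre-aggregated metric.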
The way I think about it is, the team of developers are creating information and content along with their code so other team members can visualize what they were thinking when they created that code in the first place.
Yeah. You usually collaborate to write instrumentation as a team. Very frequently, one person who is very motivated will add the initial stuff that gets you going out of the box, or maybe some low-hanging fruit. Then people will add things of interest later on that are really relevant to them, or, by dropping a couple of lines of code in, you can ornament other things with additional information. You do see that instrumentation is an iterative process where you start with a small core and work your way outwards, and the next thing you know, you have just about any answer you might want out of the system in the data coming into Honeycomb.
Yeah. Okay. So, I just noticed a question from the audience that the audio’s not great. We will do our best to speak up. We’re set up in a dedicated room. We’re actually sitting next to each other and we’re on speakerphone. I usually don’t have a problem projecting.
So, tell me, Nathan, in terms of this visual model, of depicting something that was thought up as the development team was creating a new feature: don’t metrics and logs give you this? I think of it like, is it a sketch drawn by an eight-year-old, where you can roughly figure out what it is? Some squiggly lines, I think they’re birds? Or is it something much higher resolution, like a Monet painting, where you can actually see exactly what’s going on? Is that a good analogy or metaphor to use?
Yeah. I think that’s a really good way of thinking about things, especially for metrics. They really present this very high-level sketch of your system. As you want to add more detail to metrics, you very rapidly hit storage space limitations, and fundamentally just limitations of the metrics data model. Whereas in Honeycomb you can have a variety of fields, and you’ll see some examples of instrumentation as we go through the demo. You can have essentially infinite dimensionality and tag your events with anything you want, and that’ll be available for you to query later on.
I think of it as capturing this really crisp, high-resolution snapshot of what’s actually happening in your systems in production. You’re not working with second- or third-order pictures of things. I think high resolution is a really great way to put it. Metrics give you just that high-level sketch. Logs, on the other hand, generally tend to be really, really zoomed in. It would be like storing individual pixels when you’re actually interested in a nice, crisp, clear picture you can zoom in and out of. High resolution is absolutely how I would describe the Honeycomb data model.
So, who on the team should be focused on instrumenting? This is certainly something we hear a lot from customers and new customers: should it be the entire development team? How involved are the operations and SRE teams in this effort? And how does somebody get started on doing better instrumentation?
Generally, like I alluded to before, I think of instrumentation as something that maybe one brave soul takes up at first, doing the initial work of knocking out the boilerplate: things like initializing the Honeycomb client and ensuring that the right context can be propagated through the system. Then they lead the way for everyone else on the team to come in and add any little custom details they want.
For instance, on our in-house stuff at Honeycomb, I’ve gone into code that other people have already started instrumenting and added particular details that are of interest to me. So, we do find that a lot of the time people feel like instrumenting their code is going to be a much bigger deal than it really is. It just takes a little bit of elbow grease to get started with and then the payoff is huge. You can actually capture the real things that are happening to your code. The real objects and sort of the state in the code and put it into Honeycomb. It really works. Certainly, a team effort. Frequently there will be one hotshot that kind of leads the way. Then everyone can chip in and add additional little things as they go along and move forward together.
Yeah. My understanding of the practice is: don’t feel like you have to retroactively go back and instrument code that was created previously, but from here on out, start to do a better job of instrumenting the new features you’re releasing.
Yeah, absolutely. That’s definitely something you see a lot: a shift in mindset for people who weren’t really doing true observability before. When they start to peel back the layers of the onion, they start to realize the potential: “Wow! I really can tag these things with really high-dimensionality fields. I can have any arbitrary field that I want.” Whether it’s a region or a client version or a big, long, awkward string, Honeycomb just handles it all and queries it very, very quickly. You can also generate tracing data. So, getting the actual answers you’re interested in becomes a lot easier.
Okay. In terms of the overall value: Honeycomb has taken instrumentation quite seriously and created what we call beelines, which are auto-instrumentation libraries that produce those useful events and traces. We have beelines for a variety of languages.

Today’s demonstration that Nathan’s going to go through uses Node.js. We also have Ruby, Python, and Rails, and we just released Java. So, if any of you out there are interested in that, definitely get in touch with us. Then we will see what Nathan just talked about: start out with those libraries, do some auto-instrumentation, get data into the Honeycomb tool, and then add personalized data as you go or grow. Then you’ll see the ability to slice and dice requests by a variety of dimensions. You should be pretty quickly off to the races, running queries and debugging. So, let’s shift gears: Nathan’s gonna start up Honeycomb and do a demonstration of the product, and hopefully reflect those best practices we just talked about.
Yeah, so I’m just gonna hop into a screen share here in BrightTalk. What I’ll be showing today is a demonstration of something that is actually highly inspired by-
Did you hit the screen share button?
Okay, here we go. It’s taking a few seconds folks. There we go.
Yeah, so there’s the BrightTALK, and I’m just gonna pop over to this very exciting terminal window here. What we’re gonna go through is an example of doing tracing using the Node.js beeline. What’s fun is that a lot of people see our beelines and see that they can do automatic instrumentation of things like popular web frameworks, like Express, or database layers, and that kind of thing.
So, I have a little script that does that here. Just to show you an example of what it really does, let’s say that I wanted to take a snapshot of the Google search results of news for Kentucky basketball. When I run this program, it looks something like this. What you can see here is that it actually generated this screenshot.png file, and here is the screenshot, what it actually looks like.
So, when I ran the script, it started up a headless Chrome instance and navigated to this search for Kentucky Wildcats on Google. A lot of people might be doing stuff like this in a background job. This is actually a great example of something that can be a real differentiator for your business. This kind of extra functionality is something a lot of people are moving towards, wanting to differentiate themselves and beat the competition by doing new and cooler stuff.
But that presents more operational challenges, right? We’re talking about something more complex than just running a simple web service; we’re doing headless Chrome browsing and that kind of thing. So, the possibility of this going wacky in production is higher than with a normal web app. Sometimes you launch this kind of stuff, and you know it can do a lot for your business, but, well, how the hell are we actually going to monitor and observe this thing? Personally, we think the answer is instrumenting your code and getting this awesome high-resolution visibility out of Honeycomb.
So, coming back over to the script here, let’s take a look at what this code actually looks like. The first thing that we’re doing is importing the beeline library and initializing it with our Honeycomb write key. We’re also requiring Puppeteer, which is the thing that actually does the headless Chrome browsing. Then the very first thing we do is start a trace. In the beeline, the main operations you’re working with are starting traces and spans and finishing those traces and spans. So the very first thing that happens is we call beeline.startTrace, which tells the beeline, “Okay, I’m going to actually start a new trace.” Eventually we’re going to be generating results that get sent to Honeycomb, and we can use them to actually observe our app.
So, just to maybe give you a quick view of what the finished picture ends up looking like, here is an example of what a full trace will look like in the end. As we go through the code, I think it might help you to understand, from the end result we’re looking for, how it plays into what we’re actually doing in the code. So, we have the root span here, which is about three and a half seconds; that’s how long the full script took to run. Then we have broken out each little component of the script into child spans of that one parent span. The first thing that happened was we launched the browser. Then we went to the page, and we can see clearly that this is where the most latency was incurred.
So this is exactly what tracing is all about: trying to figure out where your script is actually slow, so that maybe we can improve it, or at the very least figure out what it actually looks like when we run this script, right? Because if I just run it and I don’t have any instrumentation or insight into it, what do I know? It might take a really long time to screenshot the page. It might take a really long time to launch the headless Chrome. You don’t have any visibility. But now we have this really high-resolution picture of exactly what it looks like to invoke the script. This is what we were talking about: being able to get that really excellent high-definition snapshot of what’s really going on in your code.
So, then we screenshotted the page. Each row is a span. You can see over on the right here that each span has custom properties associated with it. Coming back to the code, this launch-browser span here is generated as a result of calling beeline.startSpan. I have this line of code right here that actually does the thing. Before it, I call startSpan, and after it, I call finishSpan. The way the beeline works, it will basically time these things, bundle up all the context and details into the span, and send it along to Honeycomb. The object that we’re passing into the startSpan call here will be transformed into a Honeycomb event. You can see that in this next goto span, which ends up becoming this one, we have an additional custom field. And you can have as many custom fields as you want; Honeycomb will handle incredibly wide events with hundreds of different fields, to the point where it’s more of a human bottleneck than a system bottleneck to deal with that many fields and that much information and context.
The actual page that we’re snapshotting is an argument that’s passed to this call, so it’ll actually be added on to the generated event. As you can see, we have these beeline.startSpan calls that end up becoming spans in the finalized trace. When we run the script, in this case, we only have one custom thing, but usually what you’ll be doing is augmenting your spans with as much custom detail as you think would actually be useful.
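The pattern Nathan describes, starting a span before an operation and finishing it after so the timing and custom fields get bundled into one event, can be sketched with a small stand-in helper. This is not the real beeline API (the actual calls are beeline.startSpan and beeline.finishSpan from the honeycomb-beeline package, which sends spans to Honeycomb); the helper and field names here are illustrative:

```javascript
// Simplified sketch of the span pattern the Node.js beeline uses:
// time an async operation, attach custom fields, and record a span.
// A real beeline sends spans to Honeycomb; here we collect them in
// an array so the shape is easy to see.
const spans = [];

async function withSpan(name, fields, fn) {
  const start = Date.now();
  try {
    return await fn(); // run the instrumented operation
  } finally {
    spans.push({
      name,
      duration_ms: Date.now() - start, // how long the operation took
      ...fields,                        // arbitrary custom detail
    });
  }
}

// Usage: wrap each step of the script in its own span.
(async () => {
  await withSpan("goto", { "app.screenshot_page": "https://google.com" }, async () => {
    // stand-in for page.goto(...)
    await new Promise((resolve) => setTimeout(resolve, 10));
  });
  console.log(spans);
})();
```

The try/finally shape matters: the span gets recorded with its duration even if the wrapped operation throws, which mirrors how you want error cases to show up in traces too.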
I could, for instance, run this again on the Honeycomb homepage, and maybe turn the debug mode on so we can see the actual details of the spans being generated. You can see the debug output of the beeline is really useful for getting a picture of what spans are being generated and which trace ID is being used. We can grab this trace ID and pop it right into the Honeycomb query builder to look up a specific trace. I can filter where trace.trace_id equals this, and I can see my exact trace right here. If I go over to the traces tab, this will let me jump right to the trace that I’m interested in.
We can see in the goto span that here is the custom field for the screenshot page. So we have this custom detail, and while it’s kind of cool to look at it in the tracing view, where this is really useful is when, say, I’m running this on 200 different websites in production. Or I have a variety of different scenarios where something might be happening or might be slow that I want to look into, or maybe there are errors and that kind of thing. We actually have features like BubbleUp that let you grab a collection of points. For instance, I can see in this heat map that here are the slow points. The heat map will show me where something is slow, or where something was just high; it might be a regular old numeric field.
In BubbleUp, we’ll actually see a visualization of all this different context and detail that we can add on, so that we might be able to spot what’s going on and what the problem actually is. We can see that, for instance, this particular slow point was associated really strongly with this specific trace. I can filter down to that trace and, just like with the other one, pop right into the tracing view.
Being able to query over all that custom detail is where Honeycomb really crushes it, right? I can break down by something like the screenshot page and I’ll get this little table summary of all the different ones that I’ve run, and maybe I’ll expand the time window here. I can see here is the one I made to Honeycomb, and here is the one I made about Kentucky basketball. I can also snapshot just Google.com and run the script again. One of the things Honeycomb is pretty great about is that really, really rapidly after you actually send these data points to Honeycomb, they’ll be available for query.
One of the things we hear from our customers that drives them crazy is lag in being able to access the data they’re sending into their observability or monitoring systems. In Honeycomb, just because of the way the architecture is, and the operational expertise and excellent pedigree of the team, data ingestion is crazy rapid. So, it’s definitely worth calling out. The query engine is crazy fast; we can scan ten billion data points in ten seconds, that kind of thing. It really just gets you to the answers you need to know in order to deploy with more confidence and to deploy more often and to…
Be both faster and smarter.
Be both faster and smarter but also to reclaim precious time, right? I mean, that’s one thing I think is really hitting at the heart of what people find in Honeycomb that they’re resonating with. Their engineers are just spending so much less time fighting fires, pulling their hair out looking through logs, or something like that. And a lot more time doing the fun stuff and writing new code, shipping new features and delivering value. So, that’s kind of a quick whirlwind tour and I hope you guys enjoyed that.
Thank you very much, Nathan. That was great. We don’t have a ton of time to show you the product, but that’s why we have future episodes in the series, where we’ll get into a lot more detail on some of the features Nathan touched on: the rapid query engine as well as BubbleUp for spotting outliers. Our goal today was to share with you how instrumentation works, its importance, and its value. It should not feel like climbing Kilimanjaro; it should feel like a short hike, and it pays dividends not just for yourself as an engineer, but for your ops team and your SREs. So take that time to instrument upfront and learn the power of beelines. We continue to build out those capabilities, supporting other languages depending on what our customer needs are.
So, to get started yourself, we encourage you to start with Honeycomb Play. If you’re not ready to use data from your own system, this is our data; we developed it based on some incidents that we’ve had here at Honeycomb. It gives you a good view and access to a lot of features. And when you’re ready and the team has some bandwidth, start a product trial. You’ll probably interact with Nathan and some of our colleagues during that trial. Then I encourage you to check back on this BrightTALK channel. Episode two, coming up soon on March 20th, is all about how we’re helping de-stress debugging. Lots of new features to share with you in that session.
Thank you again for taking the time. Nathan, that was excellent. I will now take any questions that you may have. Type them in at the bottom of your screen. We’ll just give folks a minute to type in any questions.
I have a question from the audience: “In terms of instrumenting, can you also ingest logs?” So, how do we ingest log data into Honeycomb?
Sure. If you already have logs that you’d like to get into Honeycomb, there are a variety of ways to get them in. We have some binaries and built-in integrations where you can just start running them. For instance, if it’s Amazon ELB logs, we have a little binary that will go and query the AWS API for information about them, download them, parse them out into structured events, and send them to Honeycomb. We have a variety of ways to get logs in, but usually it’s just a matter of figuring out a way to shim that into one of our existing integrations.
For instance, we have a Swiss army chainsaw, if you will, of log parsing: a binary called Honeytail that tails log files and parses them out with a regex, and for some formats that we already understand out of the box, like nginx, you can just send them along to Honeycomb. Now, that does come with a trade-off: you just don’t get as much flexibility out of those as you would out of something like the native code instrumentation I showed off. Same thing with traces; it’s kind of hard to get those out of static logs. But we do offer it, and it can work really well as a bridge to get people excited about Honeycomb and make them realize what’s possible, and maybe why they want to go for the native code instrumentation. So, Honeytail is the short answer.
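The Honeytail approach, turning an unstructured log line into a structured event with a regex, can be sketched in a few lines. The log format and field names below are invented for illustration, not a real Honeytail configuration:

```javascript
// Honeytail-style parsing: turn an unstructured log line into a
// structured event using a regex with named capture groups.
// The log format and field names are illustrative only.
const LINE_RE = /^(?<ip>\S+) "(?<method>\S+) (?<path>\S+)" (?<status>\d{3}) (?<duration_ms>\d+)$/;

function parseLine(line) {
  const m = LINE_RE.exec(line);
  if (!m) return null; // unparseable lines get skipped
  const { ip, method, path, status, duration_ms } = m.groups;
  return {
    ip,
    method,
    path,
    status: Number(status),           // numeric fields become numbers
    duration_ms: Number(duration_ms), // so you can graph percentiles
  };
}

const event = parseLine('10.0.0.1 "GET /checkout" 503 1244');
console.log(event);
```

This also illustrates the trade-off Nathan mentions: the regex can only extract what was logged, whereas native instrumentation can attach any runtime state it likes.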
Yep. Okay. Cool. If you do want more information on instrumentation, you can go to the product section of honeycomb.io, and we do weekly blogs, mostly written by our engineering team. So there’s some meaty content in there.
I have a couple more questions. How difficult is it to add support for other programming languages, in addition to the five I mentioned? What’s the status there? I know you can do that fairly quickly depending on the customer’s needs.
Yeah, how difficult is it to add support for other programming languages? Well, that’s all just a factor of whether it aligns with our in-house expertise and how well you understand the Honeycomb API. Basic support for sending things to Honeycomb is usually pretty rapid. We have a really simple API: you just POST JSON to an endpoint and it ends up getting ingested into a dataset. A simple API like that usually ends up being pretty approachable.
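The “just POST JSON to an endpoint” API can be sketched as follows. The endpoint path and X-Honeycomb-Team header reflect Honeycomb’s public events API as I understand it; treat them as assumptions and verify against the current docs:

```javascript
// Sketch of the request shape for sending one event to Honeycomb's
// events API. Endpoint path and header names are taken from
// Honeycomb's public docs as of this writing; verify before use.
function buildEventRequest(dataset, writeKey, event) {
  return {
    hostname: "api.honeycomb.io",
    path: `/1/events/${encodeURIComponent(dataset)}`,
    method: "POST",
    headers: {
      "X-Honeycomb-Team": writeKey,      // your write key
      "Content-Type": "application/json",
    },
    body: JSON.stringify(event),         // any flat JSON object works
  };
}

const req = buildEventRequest("my-dataset", "WRITE_KEY", {
  duration_ms: 153,
  status_code: 200,
});
console.log(req.path); // "/1/events/my-dataset"
```

You could hand this object to Node’s https.request; the stickier parts Nathan mentions next (batching, background sends) are what a real language binding adds on top of this shape.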
Where the integrations get a little stickier is that usually you want to do things like batching and running the sends in the background, so that can take a little more time. Likewise, the beeline is a full-fledged native binding for sending traces and that kind of thing, so that can take even more time. But adding support for other languages? It just depends. If you love Erlang and you wanted to work on Erlang bindings, we’d be happy to have you in our Slack channel to work with you, explain our APIs and idiosyncrasies, and help you get there.
“Is anyone actively working on a .NET beeline?” is the question. There are a couple of people, third parties outside of Honeycomb Inc., who are very interested in a .NET beeline and are actually working on similar, pseudo-beeline things. We can’t really properly call it a beeline unless it’s in-house. We at Honeycomb want one really badly. It’s one part of our constantly changing roadmap of priorities. Well, it’s not constantly changing but-
Depending on the market needs.
Yeah. I guess what I’m saying is we’re a small start-up with limited means and abilities and we need to figure out where is the right time to invest resources in that but we do want it really badly and so do people in the community. So, there are people who are working on that, yeah.
We can take another two questions and then if we don’t answer them we will definitely get back to you.
“Why should I use Honeycomb over AWS X-Ray?” Well, because Honeycomb’s the best, obviously. No, to be honest, I don’t know a whole ton about X-Ray, but what we usually see with AWS tools is that they’re very frequently just kind of good enough, and their main appeal is that they’re built right into AWS. They don’t have the deep product dedication and love that we have with Honeycomb, where we are the experts in the space and we’re 100% focused only on making Honeycomb the most killer experience possible. What I’ve found in just talking to people is that X-Ray is a little limited in the ability to actually find what you want; it’s sort of this good-enough, moderate Amazon compromise. I think Honeycomb is the best, so you should use Honeycomb.
Then we had another question: “How do I link events from different tiers, from an ALB to a web server to a database, into traces?” There’s a variety of possible solutions, and I will say that it may be difficult to get ALB logs into the same dataset as your tracing data. But I would argue that’s not necessarily a terrible thing, because what I usually recommend for people who want to grab honeyalb and send some of their ALB logs in is to send that data to a separate dataset and just crank the sample rate up. In honeyalb you can crank the sample rate up to 20 or 50 or 100 and you’ll get so much more retention, and you’ll keep most of the benefits of having it there, because we actually dynamically sample based on things like status code.
So, for the weird oddball requests that get 503s, most of those will be included, but the boring 200 stuff will be sampled at a much higher rate. And Honeycomb will do all the math and draw everything just the same. A sampled dataset, I kind of think of it as a dataset with some JPEG compression applied. Yes, technically it’s lossy, but usually the quality is still very high, and it’s good enough and works great because, just like with images, you wouldn’t necessarily want to push a kajillion pixels over the wire to get essentially the same result when you can do it in two megabytes or something.
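The dynamic sampling behavior described here, keeping the rare errors, thinning out the boring successes, and recording each kept event’s sample rate so the backend can re-weight the math, can be sketched like this. The specific rates are arbitrary examples, not Honeycomb’s actual defaults:

```javascript
// Sketch of status-code-based dynamic sampling: keep every server
// error, but only 1 in N of routine traffic. Rates are arbitrary
// examples, not real defaults.
function sampleRate(statusCode) {
  if (statusCode >= 500) return 1;   // keep all server errors
  if (statusCode >= 400) return 10;  // keep 1 in 10 client errors
  return 100;                        // keep 1 in 100 boring successes
}

// Each kept event records its rate so the backend can re-weight:
// one event kept at rate 100 counts as 100 events in the math.
function maybeSample(event) {
  const rate = sampleRate(event.status_code);
  if (Math.random() * rate < 1) {
    return { ...event, samplerate: rate }; // kept, with weight attached
  }
  return null; // dropped
}

console.log(sampleRate(503)); // 1
console.log(sampleRate(200)); // 100
```

Attaching the sample rate to each kept event is what lets the query engine “do all the math and draw everything just the same,” as Nathan puts it.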
I highly recommend heavily sampling the ALB stuff and then having your traces in another dataset. And for all of your database stuff, frequently we hook into the libraries that generate those queries, so you can get database tracing sort of out of the box with the beelines. And even if we don’t, you can just write some custom spans like I showed you. So, is there anything more?
I think that’s all the questions and I appreciate that. An engaged audience, that’s cool. So, that concludes. We’re bang on forty minutes. Nice job, Nathan LeClaire. Go Wildcats. Check back in the channel and we will be in touch with you to send you further content to read all about it, and have a nice day everyone. Thanks so much for joining. Bye-bye.