Code with Confidence
Instrument for Your Future Debugging Self

 

Transcript:

Max:

Hey, everyone. Thanks for joining us for today’s webinar, Code with Confidence, Instrument for Your Future Debugging Self. We’re going to give this about two minutes while we wait for some latecomers to join. So we’ll be talking to you real soon.

1:51

All right. Hello, everyone. Thank you for joining us for today’s webinar, Code With Confidence, Instrument for Your Future Debugging Self. This is part of the CloudBees innovators program. We’re very happy to bring you this session with Honeycomb.io, and we’re looking forward to a really good session today. I just want to let you know a few things up top before we begin. I won’t take up too much time. One, we can take questions. There’s a question pane in your control panel. As soon as you think of one, feel free to put it right in, even if you think maybe we’ll get to it later, or maybe we won’t have time, please ask them as you think of them. It’s always good to recap on things that people are interested in, or we can even just take them offline if we do run out at the end.

Also, we are recording this, so if anyone has a colleague who is unable to join, or you need to drop early, this recording will be out for everyone very shortly. Brian, can we go to the next slide, please? Before we begin, I would just like to introduce everyone to our speakers for today. We have Brian Dawson, a DevOps Evangelist from CloudBees. Brian, you want to say hello to everyone?

Brian Dawson:

Hello, everybody. Max, thanks for hosting this. Everybody, thanks for joining us today.

Max:

Absolutely. We also have Michael Wilde, Director of Sales Engineering from Honeycomb. Michael.

Michael Wilde:

Hey, everybody. Looking forward to it, and it’s an honor to hang with CloudBees today. Thanks.

Max:

All right. As you can see, the Twitter handles are at the bottom of the screen, if anyone wants to reach out to our speakers, let them know what you think or if you have any extra questions. But for now, I’d like to get started. We will have time for questions at the end, just as a reminder. Brian, you want to kick us off?

Brian Dawson:

Absolutely. Thank you, Max. As Max said up top, we’re hosting this webinar session with Honeycomb as part of our innovators’ program. And one thing that we’ve realized is hard to debate is that we are in an era where every business is effectively a software business, meaning every business is leaning on software as a competitive differentiator. With that, with the speed at which you can change and use software to drive business, there’s an increased pressure to innovate. We are being asked as software development professionals to innovate constantly. However, what we’ll find is, as we seek to innovate constantly, as we seek to move faster and realize increased velocity, we run into increased risk.

It’s technical risk that, at the end of the day, results in business risk. How do we achieve complete testing coverage? How do we properly support our runtime systems? How do we ensure that we’re avoiding security holes? How do we deal with a myriad of technologies, from the applications we’re running to the tools that we use, and how do we efficiently and effectively identify and address runtime issues? The solution at hand that the industry is following to innovate faster, to drive that rate of delivery, is continuous delivery and DevOps. Let’s take a moment and look at continuous delivery and DevOps in context.

5:23

Brian Dawson:

First, I do always like to state that I see DevOps as a set of cultural principles or tenets that you subscribe to and ultimately aspire to. You can better achieve or reach those cultural goals by leveraging component technical practices, such as Agile, continuous integration, continuous delivery, and for some, continuous deployment. You use those practices to help ensure that you can get from upstream development to downstream delivery in a manner where you’re able to deploy rapid changes into runtime, quickly identify issues and get user feedback, and use that to inform and improve the changes that you subsequently make. Fundamentally, DevOps and its component practices are about establishing feedback loops.

Feedback loops that will hopefully give you continuous insight, to enable you to continuously improve. However, in order to hit a high rate of velocity and a high rate of delivery, that feedback loop and the insights that enable it are non-optional. In practice, I’ve seen that there’s a tight correlation between your ability to establish trust across your software development stakeholders (what are oftentimes referred to as functional silos within your organization) and your ability to remove the friction that inhibits a high rate of delivery. We’ll find in our more mature continuous delivery and DevOps organizations that their rate of delivery correlates very tightly with the fact that they have people and culture, processes and practices, and ultimately tools and technology that enable trust between the software development stakeholders.

Obviously we understand that there’s an interpersonal or cultural element to trust that has to be built up. While we won’t talk directly to that today, we will talk to the fact that there are processes, practices, tools, and technology that help contribute to both the rate of delivery and trust. For instance, from a CloudBees perspective, or a tools provider’s perspective, we find that you can accelerate rate of delivery by ensuring that you have clear and smart integrations between what are traditionally siloed and disconnected tools, and that you’re able to automate the independent or individual processes as work flows through that integrated pipeline.

Then ultimately, you also have the ability to orchestrate more complex and involved processes, such as tests, security checks, and complex deployments. Then you also need to take an approach where you are governing what is implemented, i.e., you’re able to share standard best practices, pipeline and script definitions, and you’re able to codify organizational policy so that you can automate it and get it out of the way, as opposed to making adherence to corporate policy a manual operation. Now, on the other axis, what are some of the things that are required to institute, deliver, or recognize trust?

I’d say one, we need to ensure that upstream and throughout our delivery pipeline, we’re paying attention to quality. We need to ensure that we’re not compromising security. We need to give all of our stakeholders, especially our downstream stakeholders in operations who are receiving this work, the benefit of traceability. So they can see that security, quality, and other agreed-upon steps have been taken in delivering a build that ultimately is going to be deployed out into an environment that potentially somebody else is responsible for. Then for everybody, trust is increased with monitoring and observability. It’s one thing for me to develop a change, send it down the pipeline, and get it deployed to production. But it’s another thing for those who own it in its runtime state to be able to have active insights into how the change that I delivered is running.

But it’s also important for me as a developer to be able to validate the change that I made and inform future changes by also being able to observe behavior and performance in a runtime environment. This correlation between integration, automation, orchestration, and governance, quality, security, traceability, monitoring, and observability leads us to this partnership between CloudBees and Honeycomb within our innovators’ program. As I said upfront, we see that it’s imperative that businesses are enabled to innovate, and we at CloudBees have identified people and solutions in the industry that we feel are embodying this concept of innovation and helping drive us forward.

One of those organizations is Honeycomb. Honeycomb, we think, complements one of our core offerings as well, and that’s CloudBees Core, where we focus on the automation piece by providing a scalable solution for continuous integration and continuous delivery with a unified governance engine. You can support compliance and standardization as well as shared best practices while empowering the developers with the tools that they need to move fast and innovate freely. Now, we have Michael Wilde with us from Honeycomb. Michael, I’m hoping that you could do two things for me. One, if you could take a moment to tell us a bit about Honeycomb and your solution, and then I’d also like to hear from you: do you agree with my thoughts on the requirement to have trust and visibility in order to make a CD and DevOps implementation successful?

Michael Wilde:

Sounds good. I only have to do two things. That’s great.

Brian Dawson:

I can give you three or four more. No, we’ll leave it at two.

Michael Wilde:

Just add it to my list, Brian. Thanks.

Brian Dawson:

All right.

12:23

Michael Wilde:

Yeah, thanks a lot. Welcome, everybody. Honeycomb, our company and our software service, is really an event-driven analytics tool. It’s used by DevOps teams, both the folks that are writing the code and the folks that are supporting production. To be honest, sometimes that is the same set of folks. What we generally see is this idea of debugging the system smarter and faster, this whole feedback loop that Brian has been talking about. You made a comment, Brian, about how do you support runtime issues, and how do you recognize trust in the things that you build; the linkage to the build pipeline, the deployment pipeline, continuous integration, and delivery gets tighter and smarter when we start to make observability an important thing.

We’re starting to see organizations looking at going to the idea of observability driven development. They’re not just monitoring to look for things that are broken, but they’re starting to look at every deploy, every feature flag, and being a bit more of an owner in that respect. Thanks for the awesome portion that you gave, Brian. What I’m going to try to do today is give an overview where we see things at the world at Honeycomb, how coding with confidence really becomes a realization when you take things like CloudBees core and put them together with Honeycomb. I’ll even give you a demo and then we’ll have some time for Q&A.

What we see is there are three themes that we’ve helped move forward, but also observed in the industry: distributed systems, testing in production, and software ownership. The world begins to look like more cloud-native systems, more breaking systems down into services, even more black boxes that you can’t necessarily SSH into. People using processes like Agile and CICD get the ability to test in production. Once we start threading in the idea of observability, then engineers and operators have the possibility of becoming software owners. Instead of feeling like we just throw something over the wall, as Brian talked about, DevOps is a huge cultural shift. Not the easiest thing in the world, but once you’re there, the whole team feels like they’ve worked together. The goal behind all of this is that as demand increases, the pressure to ship code faster increases; everybody wants a new feature, we’re all software businesses, downtime is pretty much unacceptable, and service level objectives are being developed and hopefully met, so being able to see and make changes quickly is critical.

Lastly, when we move to a world of DevOps, everybody’s sort of on-call. We’d like to live in a world where, as we say at Honeycomb, on-call doesn’t have to suck. By making things better, we can continuously improve the services that we have, increase the trust from our customers and everyone else, debug smarter, ship faster, and really get rid of some of that technical debt that we’ve wanted to catch up on, that we all want to get going. This idea of observability is a popular term as of late, and it really is all about being able to look at how your application or service is behaving on the inside, from all that comes out of it.

Telemetry is key. There are a lot of different places where one can observe what’s happening, whether you look at a simple, passive log line, things coming out of a serverless environment, or taking and generating events with an SDK. At Honeycomb, we support all of these, but we have a really cool add-on to Honeycomb called a beeline. That allows us and our customers to do automatic instrumentation, meaning that we can now put stuff in the code that will emit what’s happening. We hope we move to a world of structured events that capture a unit of work, because rarely does a log event tell you much about what’s really happening. On that same note, we think instrumentation is the new logging.
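To make that contrast concrete, here is a small sketch (the field names are invented for illustration, borrowing from the ticketing example later in this talk, and are not Honeycomb’s schema) of a traditional log line versus a structured event capturing the same unit of work:

```python
import json

# A traditional log line: human-readable, but hard to aggregate on.
log_line = "2019-06-04T12:01:22Z INFO export finished for user 20109 in 1104ms"

# The same unit of work as a structured event: every attribute is a
# queryable field, so you can break down, filter, and do math on it.
event = {
    "timestamp": "2019-06-04T12:01:22Z",
    "name": "ticket_export",   # the unit of work
    "duration_ms": 1104,
    "status_code": 200,
    "user_id": 20109,          # high-cardinality fields are welcome
    "build_id": "3151",
    "hostname": "api-7",
}

print(json.dumps(event))
```

Nothing in the structured event needs to be parsed back out of prose; the fields arrive ready to query.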

17:15

Instead of just the simple, passive printf statement that says something that could be read by a human, starting to look at instrumenting your code at the time that you build it. What you see on the screen is an example of how simple it is to get started with a Honeycomb beeline. Its easy instrumentation does automatic tracing: add a few lines of code at the beginning, an import statement in your app or service, and we now start to light up what the app, what the service, or really, what your code is actually doing. We often think of log management and log events as parsing through text, but we’ve seen customers benefit from starting to evolve to capturing units of work: building structured events, not simple messages, that have a span of everything that’s been happening in their code, all of the requests it makes, extra fields, and context that might be relevant to the service or the code that they’re building.

This example here is simply creating a span, which is a whole path that code takes, and then adding extra fields that carry themselves along with the span, so we don’t end up having to spend so much time hunting around and log searching when really, we’re technical folks; we like to use math and visualize how things are performing. For Honeycomb itself, as I said before, observability starts with instrumentation. We provide ways for our community to instrument their code. Now, granted, it’s easy to take and ingest existing logs, emit things from Lambda, as an example, or other serverless environments, use things like OpenTracing or Kubernetes agents, or pull things that are happening off the wire, but we find that the Honeycomb beeline is a pretty awesome way to get started and to light up your code. Now, all of that data ends up heading over to Honeycomb, to a revolutionary data store that we built. The service behind Honeycomb is extremely fast.
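As a rough illustration of what a span captures, here is a toy context manager; this is not the beeline API itself (the Python beeline provides this kind of thing through its tracer and trace-field helpers; see Honeycomb’s docs for the real calls), just a sketch of the idea of a span with extra context fields riding along:

```python
import time
from contextlib import contextmanager

# Toy stand-in for a beeline: completed spans land in a list instead of
# being sent to Honeycomb.
events = []

@contextmanager
def span(name, **fields):
    """Record a unit of work: its name, duration, and any context fields."""
    start = time.perf_counter()
    event = {"name": name, **fields}
    try:
        yield event  # callers can attach extra context fields mid-flight
    finally:
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
        events.append(event)

# A unit of work with context that carries along with the span:
with span("fetch_tickets_for_export", user_id=20109) as ev:
    ev["ticket_count"] = 42  # enrich the event as the work proceeds

print(events[0])
```

The point is that the duration and every field arrive together as one event, rather than being scattered across log lines.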

It was built by people who built components of Parse and Facebook and looked at this problem and said, things are moving too fast for us to mess around with log events all the time. It’s extremely fast and extremely efficient, and then our users get the ability to do analysis. You’re going to see the query builder today, which is extremely simple to use, yet incredibly powerful. No complex search language to learn, just point, click, add a few breakdowns and you get some really awesome visuals. Additionally, as I talked about using a Honeycomb beeline or other tracing environments, distributed tracing now is becoming a norm. Honeycomb directly supports tracing within the concept of everything we do.

We’ve got all this great telemetry coming out of the system; how am I supposed to use it? You’re going to see a cool example of me using something we built called BubbleUp that will really let me open up and see problems without a whole lot of work. Naturally, pre-aggregating things and using metrics only isn’t a huge help, so at Honeycomb, we keep every raw event. After a user starts doing analysis and starts finding problems, we recognize at Honeycomb that we all work on a team. Too often there’s a search person or a metrics person, or one person that knows a tool really well. Every member of the team should be able to become the best debugger. You’ll see a little bit of that in my demo today, where we can see an activity feed from my own history and my team’s.

Everything we create, we can curate and save to a board, and do things like triggering, sending messages to Slack, and rendering what we see to our team members in a lot of different ways. Now, to link it all together: one of the reasons why I was really happy to do a webinar with CloudBees is that observability doesn’t necessarily just start with a deployed system. The code is built and processed and deployed by a continuous delivery platform. So, hey, let’s let the CICD system do the talking as well. We often wonder when we’re debugging a problem, was there a new build of the app? Was there a deploy of the system? We have an idea in Honeycomb called a marker, and these timeline markers can be sent from any of CloudBees’ suite of solutions, and they appear on a Honeycomb timeline.

On the right-hand side, you see a little binary, a little script that you can add to your build process that might throw a timeline marker representing a particular deploy, a particular build, or it might represent a holiday so that there’s some context around what’s happening in the system. On the bottom of the screen, even the folks that work on CodeShip have some documentation on how to do that directly. It’s super simple and very powerful. Additionally, I can integrate the build pipelines. I can light up what’s happening inside of Honeycomb by letting myself see and visualize the deploy pipeline. I can even create what I call a virtual trace by sending events from each build step to Honeycomb using a simple curl command. To be able to give this level of observability to the folks that are supporting the apps is becoming, as Brian said, non-optional.
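As a sketch of what such a build step might do, the snippet below builds (without sending) the POST request for Honeycomb’s marker endpoint. The endpoint and body fields follow Honeycomb’s published marker API, but check the current docs before relying on them; the dataset name and environment variable are made up for the example:

```python
import json
import os
import urllib.request

def build_marker_request(dataset, writekey, message, marker_type="deploy"):
    """Build the POST that would create a Honeycomb timeline marker.

    The request is only constructed here; a CI step would actually send it.
    """
    body = json.dumps({"message": message, "type": marker_type}).encode()
    return urllib.request.Request(
        "https://api.honeycomb.io/1/markers/" + dataset,
        data=body,
        method="POST",
        headers={
            "X-Honeycomb-Team": writekey,
            "Content-Type": "application/json",
        },
    )

# e.g. at the end of a Jenkins or CodeShip pipeline stage:
req = build_marker_request(
    "api-prod",
    os.environ.get("HONEYCOMB_WRITEKEY", "<writekey>"),
    "deploy build 3151",
)
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Dropping a step like this at the end of a pipeline is what makes the deploy markers in the demo show up on the Honeycomb timeline.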

23:15

Let me give you a little bit of a demo, so you can visualize exactly what I’m talking about, and perhaps we’ll have time for questions later. First, we talked about instrumentation, so I’m going to show you a little bit of that, then go through a problem scenario, like somebody raising an issue on Twitter that things are not going so well: how would we ever find that, dig through all of it, and what might it look like? Awesome. First, in the demo, you’re going to see some data, but as we talked about the idea of a beeline, it’s very simple. This is a screenshot from our docs. Just go grab the beeline and import it into your code.

We support many different languages. Once you put it in your code, boom, it starts tracking what’s going on. You can add wrappers for HTTP and it’ll capture requests, and then you can even do simple things like creating new spans, adding extra context, and making your data richer and richer. It is a common thing for Honeycomb customers to see what’s coming out of their code, and then to go back and add extra fields, add extra telemetry so observability gets better. Here’s a little bit of a demo. On the screen, you see the Honeycomb user interface. When I log in every day, it might seem a little different than other products because we’re focusing on me and my team.

I have some boards here, which store some queries that I have, but take a look down here. We’ve got boards from my team. I now can see what other members of my team are working on, and I see an activity feed as well. I’ve been doing some queries, Molly has been doing some, Gene has been working on them. That may cause me to wonder, hey, what’s going on? Am I on call? Where do I pick up? This idea of recognizing that teams exist and being able to allow me to learn from them is pretty cool. In my scenario, imagine there’s a company that sells concert tickets and they have an API service, and that API service basically is used by developers to do all sorts of things. They have a monitoring tool and everything seems to be good, but there’s always some unknown unknown, some service that might not be perfect for a particular customer or user.

Let’s see what it would be like to figure that out. The very first thing I’m going to do is just click right here, and it’s running a query. Before I jump ahead, let me give you a little tour of what you see, because Honeycomb is extremely fast to query. I’m going to slow down for a moment. First, on the top of the screen, instead of having a complex search language, we make it super easy to break down by fields in your data, perform mathematical calculations that will really give us visualizations of what’s happening, and filter, order, and limit. You almost need no training to figure that one out. On the right-hand side, you know up in your browser, how you have a browser history and you go back and forth? This is very similar, except this is a history of all the queries that I’ve run, and they’re instantly accessible and loaded.

They’re saved forever. Additionally, I can see the activity that all of my team members are doing. Just like I saw it on the feed on the homepage, I can see it as it’s happening. It makes it really awesome for developers and Ops folks that end up on-call to be able to see what everyone’s doing and continue the conversation. Now, if we look at this chart, this is just a simple count chart, looking at some HTTP front-end backend data for this particular service. There is nothing that is really looking like it jumps out at me as a problem. I have a marker on this timeline here. We’re taking data from the CICD pipeline, and maybe, at the end of that CodeShip or Jenkins pipeline, somebody dropped in a marker so we knew what build was running when it was deployed.

That’s great. I got a little bit of an idea of what’s happening in the external environment, but again, nothing looks particularly crazy here. The first thing I’m going to do is add a breakdown by build ID and status code, because I want to see if there are successes and failures, and I’d also like to see what builds were running at a particular time. If I scroll down to this table, we see a lot of successes, which is great. Gosh, I hope we have a lot of successes, and we see a couple of different builds running in production, build 3150 and 3151. 3151 is a little newer, obviously fewer events, but I sort of wonder to myself, is there really a big difference between build 3150 and 3151? Not really.

We’ve got some HTTP 500 errors happening on both, so great, at least it’s not the code that was created. Great. Check that off. But now I have to find the needle in the haystack, find the unknown unknown, and I’m going to do that with an awesome visualization called a heat map. We’re going to run a heat map against the duration in milliseconds; another way to say this is, we’re going to create a latency heat map. In this particular heat map right here, as I scroll down, we see a couple of spikes, and a heat map really is just a histogram turned on its side. Now we see, on the left-hand side, that the range runs from zero to 1.4 seconds, and we have pretty common, normal behavior. Maybe our SLO is less than one second, but we do see this odd spike happening in this area and that area.
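The histogram idea behind a heat map can be sketched in a few lines; each time window of a latency heat map is just one of these histograms turned on its side. The durations and bucket width below are invented for illustration:

```python
from collections import Counter

# Made-up request durations (ms) for one time window: mostly fast
# traffic, plus a few slow outliers past a one-second SLO.
durations_ms = [120, 140, 135, 980, 130, 1100, 125, 1250, 140, 132]

BUCKET_MS = 250  # bucket width; real tools size buckets dynamically

def histogram(durations, bucket_ms=BUCKET_MS):
    """Count durations per fixed-width bucket, keyed by bucket start."""
    counts = Counter((d // bucket_ms) * bucket_ms for d in durations)
    return dict(sorted(counts.items()))

hist = histogram(durations_ms)
print(hist)  # the bulk sits in the lowest bucket; the spike stands apart
```

Color each bucket count, stand the histogram upright, and repeat per time window, and you have the heat map from the demo.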

29:09

Well, if I tell you to emit lots of fields with high cardinality, which makes it awesome to investigate, how do I know what to break down on? Well, we’ve created a feature called BubbleUp, and BubbleUp is pretty awesome in my opinion. What BubbleUp will basically do is allow me to wander through here and highlight this area, and BubbleUp will instantly do a statistical analysis on every single field within the dataset. Why this is awesome is because we can now see what’s in the baseline, which is outside of the selection, versus in the selection. When we’re troubleshooting an issue in production or anywhere else, we have to verify three things. Is there a problem? Because sometimes people report problems that don’t exist. Where is that problem, what part of our infrastructure is it in? And did it affect anybody?

Gosh, I hope not, but sometimes it does. So by drawing a box around that, BubbleUp instantly analyzes every single field of my dataset, and I basically can answer that. So is there a problem? Yes, there are HTTP 500s, a pile of them, in the selection; not many outside, but a lot in the selection, so I’m going to add a breakdown by that. Well, we’ll filter on that. Let’s see, we’ve got a couple of other fields here. There’s a field called name that looks like it represents the API endpoints, and here’s a good observation: the export endpoint seems to be showing up in this particular dataset, in this particular baseline, at this time. That might be exactly where, so we’ll break down by endpoint.
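The core idea behind BubbleUp can be sketched simply: for each field, compare how often each value appears inside the selected (anomalous) region versus in the baseline outside it. The events and field names below are invented for illustration, not Honeycomb’s implementation:

```python
from collections import Counter

# Toy events: three inside the highlighted spike, four in the baseline.
selection = [
    {"status_code": 500, "name": "/export", "user_id": 20109},
    {"status_code": 500, "name": "/export", "user_id": 20109},
    {"status_code": 500, "name": "/export", "user_id": 55},
]
baseline = [
    {"status_code": 200, "name": "/search", "user_id": 7},
    {"status_code": 200, "name": "/export", "user_id": 8},
    {"status_code": 200, "name": "/search", "user_id": 9},
    {"status_code": 500, "name": "/search", "user_id": 10},
]

def field_shift(events_in, events_out, field):
    """For each value of `field` seen in the selection, return the
    fraction of events holding it (inside, outside)."""
    inside = Counter(e[field] for e in events_in)
    outside = Counter(e[field] for e in events_out)
    return {
        v: (inside[v] / len(events_in), outside[v] / len(events_out))
        for v in inside
    }

# user 20109 dominates the selection but is absent from the baseline:
print(field_shift(selection, baseline, "user_id"))
```

Fields whose inside and outside fractions diverge the most are the ones worth breaking down on, which is exactly what the demo does next with status code, endpoint name, and user ID.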

Then while we’re at it, I was hoping no users were affected, but we have one. It looks like 71% of the events in the selection contain things from a particular user ID. So we’ll break down by that. This becomes more complex as we start to look at status code, name, and user ID, with a lot more unique values happening. What did we find? Interestingly, as we scroll down here, we see user 20109 hitting status code 500 on this export endpoint. I find it odd that there’s a high count. So maybe they’re hitting this endpoint way too much, but I need to dig a little bit deeper. The first thing I’m probably going to do is just share this on Slack to my team: share to the demos team, a major issue with 20109.

Maybe that’s a high valued problem. Send that thing to Slack. Now my team can see that, and they’ll probably end up jumping in. Now I see a couple of spikes here. We can observe that that’s our user. We found the unknown, unknown that’s really difficult to find. Great. Can we see how the code was behaving though? This is where we start to see distributed traces as a result of using distributed tracing, let’s say through a Honeycomb beeline, and in this time range over the last six hours, I scroll down here, we see a table here that has information about the particular traces. We’ve got the longest traces that are appearing in this time range. I was hoping it was only happening to one user, but it looks like a couple of other users might be affected.

We could have a problem that we need to look at. I can mouse over here and see a summary of the spans, but let’s actually drill all the way in. This is a pretty unique feature of Honeycomb versus other solutions, where you can find the needle in the haystack and literally rip the needle open and see what’s inside. Now, what you’re seeing here is a waterfall chart of this request. At Honeycomb, we like to say you should be able to see life from the perspective of the request. The user is out there, and he or she is performing that particular request; what was their real experience? If we step through this, this is the actual first request. We’re keeping all the raw fields associated with it and all the raw data, which is again why I said earlier to enrich your data with extra fields and send them with the request so they’re everywhere.

We can see it’s got a high duration. Over a second is way too much for this particular endpoint. Matter of fact, if you’ve ever hit something that’s taken a long time, most people try to reload and hit it more frequently, and we often end up causing some latency. We have extra fields in there: what platform it was on, which service name it was, even what the hostname and the Amazon availability zones are. Extra fields that were brought along with it. It hit this API endpoint, a rate limiter function was called, and that doesn’t look too slow. The authentication service happened; at least that’s working great. Now there’s this service name called ticket backend. It’s doing a fetch ticket for export. That’s interesting.

This thing has taken 1.104 seconds, and we can see these queries, which the developers, probably using the beeline, emitted here so that we could actually see them. These queries are being run sequentially. Now, you might say to yourself, well, just make those queries all run in parallel. That’s possible, but sometimes you can’t necessarily change your code. It has to work the way it works, but is there something else that we could improve beforehand? I might say to engineering, hey, if somebody hammers this endpoint a lot for the ticket export, maybe we should change the rate limiter.

Now, this feedback loop has come all the way around where I can give engineering exactly what they need, if I’m on the frontline ops, to potentially make a change, grab a feature branch, do a test, deploy the thing in production, reevaluate it with Honeycomb, see if the latency is still happening, and then promote that thing maybe to your master branch.

35:43

Lastly, there’s a couple of things you might do. In this scenario, if we have a user that has a high degree of latency, I might create a trigger and I might take that trigger looking at latency, and I might send it to Slack or PagerDuty or using my favorite webhook compatible system. Also, I can look at the activity of other folks on my team, because I might not be the one that is solving this entire problem, but along the way, I get called in. We’ve tried to make Honeycomb extremely simple. We’ve also recommended things like instrumenting your build pipeline, taking markers, and having them appear on a time chart here as well so we can see the external environment. Then finally, start to look at a path towards observability.
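A trigger like the one Michael describes might be defined with a payload along these lines. The field names here are a sketch, not Honeycomb’s exact trigger API schema, so consult the docs before using anything like it; the column names, channel, and threshold are invented for the example:

```python
import json

# Illustrative trigger definition: alert when P95 latency on the export
# endpoint exceeds one second, and notify Slack and PagerDuty.
trigger = {
    "name": "P95 export latency too high",
    "query": {
        "calculations": [{"op": "P95", "column": "duration_ms"}],
        "filters": [{"column": "name", "op": "=", "value": "/export"}],
    },
    "threshold": {"op": ">", "value": 1000},  # milliseconds
    "recipients": [
        {"type": "slack", "target": "#demos"},
        {"type": "pagerduty"},
    ],
}

print(json.dumps(trigger, indent=2))
```

The point is that the same query you built interactively during debugging becomes the definition of an alert, closing the loop Michael mentions.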

You may be starting with a set of metrics and simple log events that you’re writing out to a file, that maybe you’re sticking in a log search engine and spending a lot of time there. Take a look at instrumentation. It’s really the best way of moving towards these black-box serverless environments and all the containerized stuff that we’re doing with microservices. Understand what’s going on. Start to look at things by release. Be a software owner, where it’s not just ops that is dealing with things when life isn’t so great. Look at your code as it’s happening; allow yourself to do real-time introspective visualization. Distributed tracing is a must. Lastly, we should start to look at outliers and anomalies. You saw that in the demo with BubbleUp. It became so simple for me, maybe not being an expert on the code, to figure out if there was a problem, who it happened to, and where it was.

Lastly, recognize that your team exists. Find tools like Honeycomb and the CloudBees’ suite, but let your team work together because continuous delivery isn’t just about writing code and shipping it. It’s having that full feedback loop and creating a great deal of trust in the services that you build. I just want to thank you for checking out what we’ve got going on and I just want to turn it over to Max.

Max:

All right. Thank you, Michael and Brian, for that presentation. We did have a few questions come in. A couple of people have asked if this was recorded. Just to let everyone know, yes, we did record this. The recording will be sent out in the next 24 or 48 hours, so you’ll all receive that pretty soon. I’m digging through the collection right now. A brand new one came in that … Maybe not a brand new one, but one came in pretty recently that I think I want to start with, which is: can BubbleUp be used elsewhere than just the heat map? Because it seems very powerful.

Michael Wilde:

Yeah, that’s a great question. Well, first I would say you’ve got to use heat maps. They’re really awesome because they give you a level of visibility that a regular old chart cannot, because a heat map is basically a pile of histograms sitting on their sides. That said, we’re doing some work to make the BubbleUp feature available to the other visualizations. We’re a very continuous delivery organization ourselves at Honeycomb. We create features super-fast, so look forward to more enhancements on that. Who knows? Maybe within the week or the next quarter.

Max:

All right. Thank you for that, Michael. Just to reiterate real quick as well, as Brian mentioned earlier and myself briefly, this is part of the CloudBees innovators program. We wanted to put on this webinar because we really see a value to the Honeycomb product. We’re getting a lot of questions on the Honeycomb product, which is very exciting to see. Before I really dive into them, Brian, if you’re around, I just want to ask you one quick question that came in, which is, what is the relationship between CloudBees and Jenkins?

Brian Dawson:

Okay. Yeah, that's an interesting one. The relationship between CloudBees and Jenkins starts with the fact that CloudBees is the number one corporate contributor to the Jenkins project. We actually contribute, I think it's 85% of the commit activity on the Jenkins project. We as a company feel that it is a proven, powerful solution that we contribute to for the betterment of the community. As well, CloudBees offers a number of solutions that are built on, around, or for Jenkins and the Jenkins community. As we discussed earlier, we have our product CloudBees Core, which is a scalable CI/CD platform that is actually built on Jenkins, and soon to be Jenkins X technology.

41:11

Max:

All right. Thanks, Brian. Just for the audience, continue to ask these questions; a fair number are coming in, no matter the platform. In the meantime, I have a bit of a series of Honeycomb questions for you, Michael. The next one I'd like to ask is: is code instrumentation required to use Honeycomb?

Michael Wilde:

Yeah, that's a great question. The answer would be no. We created a code instrumentation toolkit and SDK add-ons so that someone can get started using Honeycomb directly. However, we also have agents that take data from log platforms. We have Lambda functions out there that can help take information from serverless environments. We work with almost every other log tool that captures data, and there's a ton of information on our website at docs.honeycomb.io to get you started. I was just talking to a customer that's using Honeycomb for Amazon ALB logs. They are starting there, and maybe hoping to migrate, to evolve themselves to a more observable environment by doing code instrumentation. Not a requirement, but it pays off when you do it.

Max:

All right. Hey, Brian, mind going on mute for me? Thank you. All right, so let's look at this next one. Does Honeycomb integrate with Python ("Pyon", I believe, is a typo for Python), or does it come as a standalone app?

Michael Wilde:

Yeah. Good question. What you saw today is software as a service; everything happens up there on Honeycomb. As for getting data in, if you were, for example, a Python developer, there's a Python beeline that you can drop within your code, or you can obviously take logs from elsewhere and send them straight up to Honeycomb as well. It's really easy to get started. We've got a vibrant community, and we support most modern languages as well.

Max:

All right. We’ve had a few different questions come in about Honeycomb’s … basically comparing and contrasting to some different tools. I want to consolidate that. How is Honeycomb different than other log search tools?

Michael Wilde:

Yeah, that's a good question. Thanks. I have an extensive background in the log search and analytics world; you can look up my history on LinkedIn if you're interested. I wrote a blog post on Honeycomb's blog about search-first versus analytics-first. What we find is that log search alone rarely provides an understanding of what your service is doing. It might surface a few metrics and a few other things, but we're promoting observability, achieved through instrumentation, as a better method. We want to be able to see the exact paths your code is taking. It's hard to do that in log search tools, not that they're unnecessary.

Another thing is, we'd like to see people spend less time searching and more time visualizing. Everything you saw in my demo is analytics first, visualization first; the raw data is there to look at later.

Max:

All right. I have two here that I think really go hand in hand, so I'm going to ask this as a two-parter. One part is: if we're getting started on a microservices architecture, could we still benefit from Honeycomb? The other part is: how much data input is required to reach the same kind of info and detail that you showed in the demo?

Michael Wilde:

Good question. Let's see. First, getting started in microservices: super easy. Let's say you're starting with microservices, you're throwing up some containers, you're sticking them in a Kubernetes cluster. It's super easy to drop in our Kubernetes agent. Drop it in and we'll take everything that comes from the Kubernetes cluster and send it over to Honeycomb, so you can see at least what's coming out, and then improve your observability by saying, oh, you know, let's actually do some instrumentation, let's throw a beeline in there. It's very common for folks to start at one end of the spectrum of observability, see the power and the speed of debugging stuff in production, and then go back to engineering and ask for more instrumentation, more telemetry, and better stuff. The second question was … well, what was the second question, Max?

46:24

Max:

Absolutely. The second question was how much work and time is required to get to the point where you were in the demo?

Michael Wilde:

Yeah. Thanks for asking. Recently, we did a distributed tracing workshop in San Francisco where we brought folks in and taught them how to do tracing and instrumentation, and it's really simple. You don't necessarily have to instrument your entire multi-tier architecture; sometimes just bringing tracing into one part of your application lights up how that app is performing. It's actually quite simple. As I said, let's say you have Node, Go, Java, or Python: drop in a beeline, have it start doing request capture, have it create spans for you, add some extra fields. It's really pretty simple. The idea of a beeline is to make that process simple without having to install a special agent, without having to spend a ton of money putting agents everywhere. A few lines of code and you're off to the races.
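To make the "create spans for you, add some extra fields" idea concrete, here is a toy sketch of what a beeline-style tracer does under the hood: wrap a unit of work in a timed span and let the caller attach fields while the work runs. This is illustrative Python only, not the real beeline API; all names here are made up.

```python
import time
from contextlib import contextmanager

spans = []  # collected span records, newest last


@contextmanager
def span(name, **fields):
    # Open a timed span, let the caller attach extra fields while the
    # work runs, then record the duration when the block exits.
    record = {"name": name, **fields}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        spans.append(record)


with span("render_page", user_id=42) as s:
    s["cache_hit"] = False  # an extra field added mid-request
```

A real beeline additionally handles trace and parent IDs and ships each finished span to the service; the shape of the work, though, is just this: a few lines wrapped around the code you already have.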

One other thing: on the screen here, if you're interested in touching what I just did, at the top of Honeycomb's website, or right there, it says honeycomb.io/play. You can go and experience a postmortem scenario that we actually have. You can touch the product, try it out, click on the same things that I did, and get an idea of how the thing works.

Max:

Thanks for that. I’m going to be posting that link into the chat for everyone so you don’t have to type it yourselves. Also, we’ve had a few questions come in about if there’s a free trial. There is a link for that on the screen as well. I’ll be putting that into the chat in just a moment as well for everyone. In the meantime, we just had a bunch of new questions come in. We’re getting some really, really good ones, so I’m trying to find what the right next step’s going to be. Here’s a good one. Are the metrics that you collect being gathered, and if so, where are they being stored?

Michael Wilde:

Let me see if I understand this right.

Max:

I may be reading it a little poorly. Let me re-ask that. What kind of metrics are being collected and where are they being stored?

Michael Wilde:

Awesome. Yeah, generally what our users and customers do is emit metrics from their code. By metrics, I mean … the best scenario in Honeycomb is making a structured event, a nice unit of work. That unit of work might look like some of the stuff I showed: have a duration, have a timer, have a number of bytes, and all of that. Wherever those come from, we capture structured data as it heads into Honeycomb. When it gets into Honeycomb, you have a choice of where to put it. Honeycomb's organizational structure for data is called a dataset. Even we at Honeycomb use our own product. We create different datasets for different reasons; they might have different retention or a particular focus, or maybe a developer that's interested in working with just their part of the app looks at everything from their dataset.

It ends up in the Honeycomb service, securely transferred up there, and you have control over the data and how long you'd like to keep it.
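As a rough illustration of the structured event Michael describes, a single unit of work with a duration, a status, and a byte count might be assembled like this. The field names and the helper are assumptions for the sketch, not a required Honeycomb schema.

```python
import json
import time


def build_event(name, started_at, status_code, bytes_sent):
    # Assemble one structured event: a single unit of work carrying a
    # duration, a status, and a byte count, ready to serialize as JSON.
    return {
        "name": name,
        "duration_ms": round((time.time() - started_at) * 1000, 2),
        "status_code": status_code,
        "bytes_sent": bytes_sent,
    }


start = time.time()
event = build_event("GET /checkout", start, 200, 5120)
payload = json.dumps(event)  # what would be securely transferred to the service
```

Any fields you can attach to the unit of work, user IDs, build numbers, cache flags, ride along the same way, which is what makes the event queryable later.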

Max:

All right. Thank you for that. We’ve had a few two-parters come in, but they’re a little broken up, so I’m going to tackle those soon. If you did ask a two-part question, I apologize if I miss a portion of it when I get to it, but in the meantime, is Honeycomb SaaS only?

Michael Wilde:

It is. Often, the reason folks ask that question is that they've used on-prem products and may be limited in their ability to use software as a service. If you are that type of organization, where you have some limitations because you're worried about software as a service and your data, check out Honeycomb's secure tenancy. Secure tenancy is basically a component you install on-prem. Its job is to take data from wherever it originates and encrypt it; you keep the keys, and the data ends up in Honeycomb, encrypted at rest. Then, when you access Honeycomb, the keys are pulled from your particular secure tenancy installation and everything is magically decrypted in the browser for you. That has made it so that folks with high-security requirements, sometimes PHI or PII requirements, are able to use software as a service like Honeycomb.

51:39

Max:

All right. Thank you for that. Just so everyone knows, I've put a link in the chat about secure tenancy. Everyone can take a look if that's something you're interested in. All right, so the next one is a bit of a long one; don't be afraid to ask me to reread it, Michael. For super high cardinality use cases, is any kind of smart sampling possible at the instrumentation or agent level on the customer side, such that the amount of data pushed to Honeycomb can be limited, but without missing potentially valuable insights?

Michael Wilde:

That's a great question. Sampling is something that's often required at massive scale. In my past as a log search person, I had customers whose log infrastructure was more expensive than the actual infrastructure running the real app. There's a scale point where sampling becomes a requirement. It's also a great thing to do just for the usefulness of your data. Honeycomb's agents, local software, serverless stuff, ALB plugins, beelines, and SDKs all support sampling: constant, deterministic, or dynamic sampling. Sounds kind of complicated, but here's an example. Let's say I'm emitting telemetry from my production web app, like the one you saw. You might do dynamic sampling.

Have something in your code that says: for HTTP 200-level, successful events, sample one out of every 50; for failures, sample one out of every one. In other words, don't sample the failures, sample the successes. The way Honeycomb approaches this is, one, the product is sampling-aware: when I see something on a chart, if there's a sample rate attached to that event, the charts are correct, as you'd expect. Two, each event can represent itself or a pile of other events: you can emit a sample rate with every event, and Honeycomb will read it and just do the right thing.
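A minimal sketch of that dynamic-sampling logic, with made-up field names, might look like the following. The second function shows the read side: a sampling-aware backend reconstructs true counts by summing the sample rates, since each kept event stands in for `sample_rate` original events.

```python
import random


def should_keep(event):
    # Keep every failure (sample rate 1); keep roughly one in 50
    # successes, tagged with sample_rate=50 so a sampling-aware
    # backend can weight each kept event to represent 50 events.
    if event["status_code"] >= 500:
        return True, 1
    return random.random() < 1 / 50, 50


def estimated_total(kept_events):
    # Reconstruct the true event count by summing sample rates.
    return sum(e["sample_rate"] for e in kept_events)
```

So one kept success (rate 50) plus one kept failure (rate 1) would be counted as 51 original events, which is how the charts stay correct after sampling.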

Max:

All right. Thank you for that. This tacks onto an earlier question: does Honeycomb integrate with test results, and can it do the same analysis on issues, provided the logs are there?

Michael Wilde:

It can, yes. I have a customer that uses it right now to analyze their test pipeline, and the visualizations in Honeycomb are really impactful for spotting slow tests, because in their world a super slow test takes up more infrastructure and just backs up the pipeline. A fairly common use case for Honeycomb.

Max:

All right. Thanks for that. Let’s see, what considerations should I take into account when creating a dataset? For example, is it possible to query across datasets?

Michael Wilde:

Yeah, that's a great question. It is not possible to query across datasets at this particular time. We hear that request from customers from time to time. If you needed to query across datasets because you have two different types of data, you can put them all in the same dataset. Our engine is designed to deal with lots of fields and a high volume of data. Some of the other considerations one might make around creating a dataset are how long you would like to keep the data and what types of folks would like to look at it. While there's nothing in the architecture that prevents you from putting everything in one dataset, separate datasets are sometimes just a great way to functionally organize your data. Now, earlier in the demo, there was a board. You might think of it as a dashboard; we call them boards.

Boards themselves are collections of queries, and they can see across datasets. If you wanted to represent things from two different datasets for debugging a problem, a board would probably be the best way to do that.

56:27

Max:

All right. Thank you for that. Just real quick for the audience before I move on to the next question: there are a few more here, and we do have four more minutes, so keep them coming if you have anything in mind. Those of you who have asked specific questions about how this compares to tool X or tool Y, we did broadly tackle that earlier. If you're looking for more information on an exact comparison, I'm about to put my email into the chat window, so please feel free to reach out to me after. Those are a little tough to answer on the air, in case we don't have all the information on hand about the particular tool, but if you're looking to know, we'll make sure that you know. In the meantime, Michael: sometimes I only realize I need more detail once I start sampling. Can I dynamically change the sampling parameters?

Michael Wilde:

Yeah, that's a great question. It partly depends on how it is you're sampling, because sampling itself is done at the source. Let's say, for example, you were using our log capture agent called Honeytail, and you configured it to do sampling. If you're using a script or a container to deploy it, you can easily just relaunch it with a different sample rate. If you are doing sampling in your code, deterministic, dynamic, or constant, and you needed a different sample rate or a different set of logic, obviously, it's your code. You can change it in there, boom, new feature branch, or maybe just redeploy. There are a lot of different ways to approach that, and we have some cool stuff coming to help customers in that department even more.

Max:

All right. We are at two minutes left, and I do want to be respectful of everyone's time. Just to remind everyone, I've put my email into the chat, so if you have anything we didn't get to, please let me know. We did get one question just now that I really liked, and I would like it to be the last question: are there any features that Honeycomb supports that are relevant to host telemetry? For example, part of a CI pipeline consuming too many resources, and then being able to link that to Honeymarkers or something similar?

Michael Wilde:

Yeah. If I understand, there isn't any reason why one couldn't take host-related telemetry and put it into Honeycomb. Host monitoring alone is probably not the thing most people do with Honeycomb, just because they do that with other products. But hey, we use Honeycomb at Honeycomb, and there are a whole lot of different metrics that we put into our product so that we can get visibility on what's happening with our service. So yes, integrate with build pipelines, push out as many metrics as are happening, stick them in Honeycomb, and there's no reason why you shouldn't be able to see that in there as well.

Max:

All right. Thank you very much. I'd like to take this time to thank everyone for joining us today, sticking around, and asking so many great questions. Thank you, of course, to Brian and Michael for presenting. I'm putting one final thing into the chat for everyone: a few more links, the Honeycomb website, and the blog. For anyone looking for even more information, we have plenty of it out there. The recording should be out in the next 24 to 48 hours, and beyond that, shoot me an email if we missed anything. I'm looking forward to talking to you all again real soon. Brian, Michael, thanks for presenting.

Michael Wilde:

Thanks.

Brian Dawson:

Thank you, Max. Thank you, Michael. Thank you, everybody.

If you see any typos in this text or have any questions, reach out to marketing@honeycomb.io.