Raw & Real Ep. 4
Build Better Builds

 

Transcript:

Kelly Gallamore [Manager, Demand Gen|Honeycomb]:

Hi, everybody. Welcome to today’s episode of Raw and Real. It’s 10:00 where I’m standing on the west coast. What time is it for you?

Pierre Tessier [Sales Engineer|Honeycomb]:

It’s 1:00 p.m. here, on a gorgeous afternoon.

Kelly Gallamore: 

I know you were excited to have better weather. We’re going to really start our presentation at about 10:02 this morning and just give folks a few minutes to sign in. We’re glad to have you here with us. In the meantime, let me give you a few links that will be a little bit helpful. Here you are at Raw and Real. For those of you who are interested in live captions, you can take that link and follow along in your browser. Stacey, thank you for being with us here today for live captioning.

You should’ve gotten an email about 15 minutes before this episode started, so you can find these links in your email as well. We’re going to start in a couple of minutes. You are welcome to ask us questions at any time during the show. And if we don’t answer them while we’re doing our show today, we will pick them up and answer them at the end. So please, we love questions. We want to know what you’re going through. We do this because a lot of people have asked us how Honeycomb helps Honeycomb build, develop, and maintain software, stay reliable and resilient, and keep our customers happy. So that’s really why “Raw and Real” exists. We show our dogfood and our thinking and hope we give you a little tidbit that will help you in your process. It’s 10:02. There are captions for everybody. You can ask questions at any time, and they will pop through. I’m here today with Pierre. You know what? I think I like this slide better. 

2:13

We’re talking today about building better builds: how Honeycomb uses observability in the CI/CD process to understand code behavior, know when things were deployed, see how users are impacted, have pain-free releases, and build better code from the start.

Pierre Tessier, welcome to “Raw and Real.” How are you feeling this morning? 

Pierre Tessier: 

I’m feeling energized. I’m good. I’m ready to show people some cool stuff, really. Is that coffee good? 

Kelly Gallamore: 

Yep. My coffee is good. I’m catching up with you a little bit. Give me the 30-second version: what’s your background in software, and how did you get to Honeycomb?

Pierre Tessier: 

I have been in software and the data intelligence and monitoring space since 2000. I have been doing this for a while. I have been around those trying to do monitoring and observability. I have worked for many other competitors, and I joined Honeycomb earlier this year, because looking at Honeycomb, I realized that’s where I want to be. We do it right here. At least I feel like we try to do it right. But what I like most about us is how we are just so raw about things. We really just tell it how it is. That’s why I’m here. 

Kelly Gallamore: 

I really appreciate that, too. What I understand is that the build pipeline can be a tedious monster of a place. What is it like to build and deploy software without observability?

Pierre Tessier: 

It’s like pushing a button and closing your eyes and hoping it works. It’s like maybe doing… it’s being hopeful, right? In builds, a lot of things happen. Get dependencies, make sure tests work, deploy over here, grab other artifacts, and push. If something goes wrong, how do you know? If it’s taking too long, how do you know? Is a two-hour build okay? Is a two-minute build okay? Those are things that you don’t know if you’re not looking at your builds. As we get to microservices and continue the journey down CI/CD and down Agile methodologies, understanding your builds is very important, and very few organizations, if any, do it.

Kelly Gallamore: 

Gotcha. Let’s get right into it. You’ve got a story for me today about something that we improved recently. Could you tell me what’s going on here? 

Pierre Tessier: 

Let me share my screen first and show you what this is. 

Kelly Gallamore: 

Okay. 

Pierre Tessier: 

There we go. So we’re looking inside of Honeycomb’s dogfood environment right now. For those who don’t know, dogfood is the Honeycomb platform that Honeycomb uses to look at the Honeycomb service. And part of the Honeycomb service is a page known as the Admin page that Honeycomb employees use. If you go on our little chat bubble inside of Honeycomb, and you chat up Molly from customer success, she will likely go to the Admin page to know what’s going on with you inside of Honeycomb. The Admin page is something that everybody cringes to go to because, as you can see here, 25-, 30-, even 60-second load times are not uncommon for the Admin page.

Kelly Gallamore: 

Okay. 

Pierre Tessier: 

It would happen, and people would be like, great, I’ve got to go to the Admin page. Nobody wants to do it. We’re looking at ten days’ worth of data. I want to turn on deployment markers. All of these vertical lines mark every time we deploy code for Honeycomb. 

Kelly Gallamore: 

Oh my gosh. 

Pierre Tessier: 

We do a lot of deploys. 

Kelly Gallamore: 

That’s several times a day? 

Pierre Tessier: 

Right. And here’s the proof in the pudding. 

Kelly Gallamore:

So the markers are every time a deploy has happened in this time series, in this amount of time? 

Pierre Tessier: 

Yep. 

Kelly Gallamore: 

That makes sense. 

Pierre Tessier: 

We have some Thursday deploys going on. We deploy on Fridays. Sometimes even on Sundays. 

Kelly Gallamore: 

Right… haha! 

7:04

Pierre Tessier: 

The point is you should not be scared to deploy. And if you are observing your deployments and your build pipeline, you’re less scared to do so because you know what’s going on and you can be alerted when things are happening. I want to zoom in on the time when we went from really slow to really fast and focus in on that. 

Now you will see that those markers became windows, and those are windows of time that it took us to refresh our fleet with new software. You can see that this one right here was the last time we saw a high load time. After that, they all got really, really low. It was this deploy, 201534. We can even link back to the actual build pipeline here. In this case, it’s CircleCI. And you can see some steps in what’s going on. It’s kind of handy and useful.

Kelly Gallamore: 

It looks like Ian owns this, but anybody can go here and look and see what happened. You’re not dependent on just one person. Do I have that right? 

Pierre Tessier: 

You have got that exactly right. So Ian owns this deploy, but it was Nathan who wrote the code. I think Ian committed the changes into master there for us. 

Kelly Gallamore: 

Okay. 

Pierre Tessier: 

And we can see that Nathan, my colleague, opted to take this on. He clicked on the trace and said, why is this slow? It probably didn’t take him more than 3 seconds to say, oh, this is why it’s slow. Right? This is what observability in your application gives you. It allows you to see these things very quickly. He went from a heatmap to a trace and saw that we’re spending 16.5 seconds just doing this select statement alone. He took a look at the select statement, which is down here. You know, we’re a start-up, Kelly, but we have tech debt. The moment you commit that software, it becomes tech debt. That happens. 

He fixed this by removing a lot of stuff on that page that we no longer needed because we had other tools. We had other ways to get the information in a more efficient manner. He removed stuff and got rid of the tech debt and turned 20-second load times into 4- or 5-second load times. 

Kelly Gallamore: 

Did he have to ask 15 people before he removed stuff? 

Pierre Tessier: 

He made a post and took a screenshot of the page and shared it with us and everybody gave him thumbs up because we didn’t use it anymore. There is one area where we noted where to find other information. I think it was a single kind of Slack message that lasted about 30 minutes before he went off and did his thing. 

Kelly Gallamore: 

Okay. 

Pierre Tessier: 

But yeah. It’s great because it’s one of those stories where we used observability to understand the app. We talk about tech debt a lot, and we removed it. And everybody got happy. This is about observing your app. And the only thing I’m doing with builds here is showing the build windows. I’ve got cooler stuff to talk to you about in this area. I want to talk to you about our journey toward building better builds. 

Kelly Gallamore: 

Yes, please. 

Pierre Tessier: 

A while ago, we noticed our builds were taking time. And we decided to throw some instrumentation in there and look at them. Before that, you would run a build, the build would spit out a log in a console, and you’d scroll through it wondering if that was right. And at the end of the day, you’re wondering if that’s appropriate, but you don’t have any context to answer that. So we decided to instrument our builds. If you think about what a build is, you kick something off, it does a thing, and it could kick off multiple things at once. Maybe you’re running your tests, some lint stuff, and then you’re doing some actual compiles and you’re going off and deploying the artifacts that you have compiled. You can actually take that model and represent it as a distributed trace, as we see right here. 
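To make that mental model concrete, here is a minimal sketch, not Honeycomb’s actual build code, of how one build could be represented as span-shaped data using Honeycomb’s standard tracing field names. The build number, step names, and durations are made up for illustration.

```go
// Hypothetical example: one build represented as a root span plus a child
// span per step, using Honeycomb's standard tracing field names.
package main

import (
	"encoding/json"
	"fmt"
)

// span holds the fields we'd send to Honeycomb for each part of the build.
type span struct {
	TraceID    string  `json:"trace.trace_id"`            // shared by every step in one build
	SpanID     string  `json:"trace.span_id"`             // unique per step
	ParentID   string  `json:"trace.parent_id,omitempty"` // empty for the root "build" span
	Name       string  `json:"name"`                      // e.g. "go test", "js build", "deploy"
	DurationMs float64 `json:"duration_ms"`
}

func main() {
	buildID := "build-201534" // hypothetical; in CI this would be the real build number
	spans := []span{
		{TraceID: buildID, SpanID: "root", Name: "build", DurationMs: 312000},
		{TraceID: buildID, SpanID: "s1", ParentID: "root", Name: "go test", DurationMs: 140000},
		{TraceID: buildID, SpanID: "s2", ParentID: "root", Name: "js build", DurationMs: 95000},
		{TraceID: buildID, SpanID: "s3", ParentID: "root", Name: "deploy", DurationMs: 60000},
	}
	for _, s := range spans {
		b, _ := json.Marshal(s)
		fmt.Println(string(b)) // in a real pipeline, each of these would be sent to Honeycomb
	}
}
```

Every step shares the build’s trace ID, and the root “build” span has no parent; that parent/child structure is what lets the steps render as the waterfall Pierre is showing.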

Kelly Gallamore: 

Right in the tool. You just click through and there it is. 

Pierre Tessier: 

There it is. I could click on other ones here and show that each build is different. This is vastly different than what we looked at before. So we did this. And we started looking at data. And we started asking, where are we spending the time in our builds? And we said, okay, here’s go test, here’s the build. This is our front end. This is our back end. And we’re doing tests on the back end here. We looked and said, yeah, okay, that’s normal. So we did nothing. And sometimes doing nothing is an okay decision to make. We decided our builds were normal. Now, however, we had telemetry about our builds. 

This actually took place a long, long time ago, back in 2018. A year goes by. I’m looking at 12 months of data right now. To be clear, it’s June 2019, so last year’s data. And you can see the overall build times are, generally speaking, increasing. Okay. 

Kelly Gallamore: 

Okay. 

13:00

Pierre Tessier: 

Now what? Well, because we’re instrumenting the builds, we can do stuff with this. Let’s take this information and break it down by the different steps in the build. And this is what we have here. And this is where it starts getting really interesting. So it’s the same time window, but we’re looking at it as different steps. When I hover over go test, I can see go test is getting longer. I don’t know what that is. This is JavaScript. Normally when we pick up JavaScript, I wave my hands in the air, and maybe I will curl up in a ball and hide somewhere. It’s JavaScript, right? 

But it allowed us to understand what was going on with our builds and where we were spending time in our builds. It allows us to go in there and say, can we make this better now? Right? We were looking at the traces. Things are getting slower; we’re at three, four, five minutes now. So we started to investigate what was happening in our builds. And it turned out to be the build system we were using. Nothing is wrong with it. It is a fine build system. But our needs were changing, and it built our apps very sequentially. So we did move to a different build system, and that build system looks more like this, right? So a lot more kind of variability and everything there. Each step is still taking the same time or longer. But this build system uses containers instead. So this is actually that first year of data that stops right here. And this is the following year of data. So now I’m coming up to June 1 of this year. Please don’t mind this. Somebody slipped up and nobody paid attention and we had a small boo-boo there. But why not?

Kelly Gallamore: 

That happens everywhere.

Pierre Tessier: 

But our builds kept on climbing, and clearly we changed build systems and we have crazy variability. And that is because the new build system uses containers, and it allows us to run a lot of tasks in parallel. With that, we don’t have to worry as much about individual things taking too long. Before, go test was taking forever and causing everything to slow down. So I will click on a random dot. Clicking on a heatmap in Honeycomb brings you to a trace. 

Kelly Gallamore: 

You don’t lose any time. You can stay in your thought, in your moment. 

Pierre Tessier: 

What I like about that is you don’t search for traces. Click on the one you want. You can see which one you want in a chart. Click on that one. 

Kelly Gallamore: 

Okay. 

Pierre Tessier: 

So here, like this one here is our go test. Our go test is run in parallel now. So we’re doing a lot more simultaneous things at once. And it can become four times longer before we would have to be concerned with it pushing over our entire build.

Kelly Gallamore: 

Okay. 

Pierre Tessier: 

And this is part of what we got and captured by instrumenting our build. So in every step of our build pipeline now, we send a span to Honeycomb. We now get this and we can watch this and we actually have SLOs now on this as well. We can sit here and say do we need to work on an aspect of our build? Do we need to do something else or look at it? This is what I love about all of this. We took these different things. We took what we knew about distributed tracing. We took that mental model, and we applied it to a delivery build pipeline, and it works really well. And with that, you now have rich information. You can use Honeycomb with this rich information and do what you need to do.

Deploy is the only step that depends on everything else to complete. Once everything else is completed, let me collapse this up a little bit, once we have these other steps completed, we can go ahead and do the deploy. We can run the other steps together. Right? We’ve taken that whole build pipeline and put it inside of observability. We went into a little more detail on how we did this. We blogged about it, tweeted about it, and made a presentation last year at QCon.
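To illustrate that dependency structure, here is a hypothetical sketch, not Honeycomb’s actual pipeline code, in which the independent steps run concurrently and deploy only starts once they have all finished:

```go
// Hypothetical sketch of the pipeline shape Pierre describes: test, lint,
// and compile run in parallel; deploy waits for all of them.
package main

import (
	"fmt"
	"sync"
	"time"
)

func runStep(name string, d time.Duration, wg *sync.WaitGroup) {
	defer wg.Done()
	start := time.Now()
	time.Sleep(d) // stand-in for the real work (go test, lint, compile...)
	fmt.Printf("step %-8s finished in %v\n", name, time.Since(start))
	// In the instrumented pipeline, this is where a span for the step
	// would be sent to Honeycomb.
}

func main() {
	var wg sync.WaitGroup
	steps := map[string]time.Duration{
		"test":    300 * time.Millisecond,
		"lint":    100 * time.Millisecond,
		"compile": 200 * time.Millisecond,
	}
	for name, d := range steps {
		wg.Add(1)
		go runStep(name, d, &wg)
	}
	wg.Wait() // deploy depends on everything else completing
	fmt.Println("deploy starting")
}
```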

Kelly Gallamore: 

Attachments come with these shows, and I have a link to Ben Hartshorne’s talk at QCon last year, so you can get more of the story. 

18:13

Pierre Tessier: 

To cover the highlight, builds are effectively… I don’t want to dumb them down, but they’re running a lot of BASH commands. This is what they do. We took that concept and we said, well, you could kind of wrap every step with a BASH wrapper. We wrote a Golang wrapper. And as we executed each step, we were starting a trace, I’m sorry, starting a span into Honeycomb, and when that step was finished, we would close off the span with whatever information we needed. And from that, we now have this beautiful picture. It is genius, really. How do we make this better? Distributed tracing. 
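As a rough sketch of that wrapper idea (hypothetical, not the actual code from Ben’s talk; the wrapper name, argument layout, dataset name, and environment variable are assumptions), each build step could be launched through a small Go program that runs the command, times it, and closes off the span by posting one event to Honeycomb’s Events API:

```go
// Hypothetical build-step wrapper: run a command, time it, and send one
// span-shaped event per step to Honeycomb's Events API.
// Assumed usage: stepwrap <trace-id> <step-name> -- <command> [args...]
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"os/exec"
	"time"
)

func main() {
	traceID, name, cmdArgs := os.Args[1], os.Args[2], os.Args[4:] // args after "--"

	start := time.Now()
	cmd := exec.Command(cmdArgs[0], cmdArgs[1:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	err := cmd.Run()

	// Close off the span with whatever information we need.
	// A real wrapper would also set trace.parent_id to tie steps under a root span.
	fields := map[string]interface{}{
		"trace.trace_id": traceID,
		"trace.span_id":  fmt.Sprintf("%s-%d", name, start.UnixNano()),
		"name":           name,
		"duration_ms":    float64(time.Since(start).Milliseconds()),
		"success":        err == nil,
	}
	body, _ := json.Marshal(fields)

	// The dataset name "build-pipeline" is made up for this example.
	req, _ := http.NewRequest("POST",
		"https://api.honeycomb.io/1/events/build-pipeline", bytes.NewReader(body))
	req.Header.Set("X-Honeycomb-Team", os.Getenv("HONEYCOMB_WRITE_KEY"))
	req.Header.Set("Content-Type", "application/json")
	if resp, postErr := http.DefaultClient.Do(req); postErr == nil {
		resp.Body.Close()
	}

	if err != nil {
		os.Exit(1) // preserve the step's failure for the CI system
	}
}
```

In a CI config, each step would then invoke this wrapper instead of calling go test or the lint command directly, so every step emits its own span into the same build trace.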

That’s what I want to show and focus on and highlight and chat about here. All of these things, when we talk about building better systems and being better at observing, it’s not just your application that we have to care about. Sometimes it’s observing the systems and tooling that support our application as well. And certainly, we continue to do more deploys. You can see it in Honeycomb, for example, when I come back to this screen over here and look at a day prior: we do a lot of deploys. And because of this, we needed to understand our builds. Organizations are all going down that path as well. Pretty cool. 

Kelly Gallamore:

Since you’re in here, let me ask you about this. This change on our admin page, what was the effect that we saw as a result of it working better? 

Pierre Tessier: 

Happy employees. 

Kelly Gallamore: 

I know it helped Molly and Irving on the customer success team. Can you tell me how that might help them? 

Pierre Tessier: 

Say you call in. We did a recent pricing change, and a lot of people were concerned about whether they were overusing and how they could reduce usage. We have to see what their usage is day to day, and sometimes we have to adjust those limits as well. Maybe you’re saying, hey, we want to send more data to you. We want to up our commitment to Honeycomb or reduce it. That’s an Admin page function, and it’s the only place we can do it. Or you could send us a note on Honeycomb Pollinators. We have your email address, but that doesn’t tell us much more. We need to know what team you belong to. That’s the Admin page. It’s looking up and associating emails to teams, or vice versa. And it’s understanding the limits that are imposed on all of these teams and whether we need to adjust those limits based on usage. 

Kelly Gallamore: 

Gotcha. And I can see how really being able to dig into what’s happening to users and what’s happening to specific users can also help the engineering team overall make improvements based on what they are going through because we can see it. 

Let me ask you this question, and you can use Honeycomb to show me this or not. I miss seeing your face, so you’re welcome to come back on camera. What can you tell me about getting observability into your build pipeline? The instrumentation process for that, is that hard? 

Pierre Tessier: 

It’s a BASH script. There are two things you should be doing, first off, when we’re talking about getting observability into your pipeline. The first is to put the deploy markers down on your screens. These are important, right? You need to know when you do a deploy, and so do your other users. So definitely putting these deploy markers onto your main area inside of Honeycomb is important. We have a CLI to do this; we call it honeymarker. It’s easy to use inside of a build pipeline. We’re going to talk about that at a different webinar tomorrow. 
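As a minimal sketch of that first part (placeholders throughout: the dataset name, message, and URL are made up, and the honeymarker CLI wraps this same Markers API for you), a deploy marker could be dropped like this:

```go
// Hypothetical sketch: drop a deploy marker via Honeycomb's Markers API.
// Dataset name, message, and URL are placeholders.
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"os"
)

func main() {
	marker := map[string]interface{}{
		"message": "deploy 201534",            // e.g. the build number
		"type":    "deploy",                   // groups markers of the same kind
		"url":     "https://circleci.com/...", // link back to the build (placeholder)
		// Adding "start_time" and "end_time" (Unix seconds) should render the
		// marker as a window, like the build windows Pierre zoomed into earlier.
	}
	body, _ := json.Marshal(marker)

	req, _ := http.NewRequest("POST",
		"https://api.honeycomb.io/1/markers/my-dataset", bytes.NewReader(body))
	req.Header.Set("X-Honeycomb-Team", os.Getenv("HONEYCOMB_WRITE_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("marker created:", resp.Status)
}
```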

Kelly Gallamore: 

Oh, great. 

22:37

Pierre Tessier: 

The second part is instrumenting the actual build pipeline itself to generate the data. That’s a little bit more work because you need to put a command before and after every step. You want to do this in a wrapper command. That’s what we did. If you watch Ben’s presentation, he will tell you what the code looks like and give it to you. It’s really not that difficult at all. But that would be how we would want you to approach it. So it takes a little bit more work. You’ve got to get in there and get into the mindset, and every time you build a new build pipeline, you have to add the instrumentation to it. And at the end of the day, you will be thankful you did, because if you’re spending six minutes a build, that’s compute money you’re spending when you could be doing three-minute builds. If you cut your compute cost in half, that’s a significant saving. 

Kelly Gallamore: 

It sounds like a really good best practice. It reminds me of professional restaurants that have to produce very high quality at scale for a lot of people in a short amount of time. And one of the easiest things they can do to help communication across the team is label everything in the fridge with what it is and the date, because not everybody sees everything the same.  So I can see how that best practice of leaving good information for yourself, where your teammates can find it, setting deploy markers. These breadcrumbs help the whole team come together. What did I learn today? I learned that using markers in your build pipeline can show where deploys happen, which is really important for understanding how changes can affect code behavior in your system. Did I get that right?

Pierre Tessier: 

That is absolutely true. 

Kelly Gallamore: 

Awesome. I also understand, in the scenario you taught me with the Admin page, that talking about tech debt on teams can mean hard conversations, awkward conversations. Who owns it? Who doesn’t? Why is it still here? Legacy things where you’re like, no, we’re just going to sweep those over here. And with observability, it seems like you can understand how it’s going to affect your systems, and it makes it much less scary and much easier to manage. Did I learn that right?

Pierre Tessier: 

You did learn that right.

Kelly Gallamore: 

Okay. 

Pierre Tessier: 

I think having a good culture on your team matters as well, where open communication is fine. Sometimes tech debt was something that somebody wrote, and it was their baby from a long time ago, and they’re really proud of it. But do we still need that thing that you’re really proud of?

Kelly Gallamore: 

And if you can see how it’s behaving, you can make a data-driven decision about whether you keep it as is, improve it now, improve it later, or decide that this part is behaving just fine for now. All of these situations are going on, and we can evaluate that when we’re making bigger changes. You can see it, so you can make better decisions. 

I can also see how you’re just showing us one little example, where the latency in the Admin page makes it kind of a pain in the butt for our team, but at scale, you’re talking about major costs to the business. And when you can have that insight and imagine all of these efficiencies and optimizations at scale, you’re talking about more reliable software and a better experience for your customers. So I can see how this is really, really important. 

Pierre Tessier: 

I want to hit back on a better experience for customers. If every time you hit up Molly or Irving in customer success for a question about what’s going on with your team, and it takes them an extra 30 seconds to a minute to get your answer versus getting it in 10 seconds, that’s our customers who have a better experience because we made this change, even though this change only affects internal employees. 

Kelly Gallamore: 

Gotcha. 

Pierre Tessier: 

It’s things like that for sure. 

Kelly Gallamore: 

So with observability in the build pipeline, we can build faster, we have more flexibility, and more people can know overall. So there’s more autonomy in the process.

Pierre Tessier: 

Uh-huh. 

Kelly Gallamore: 

Pierre, that’s our time for today. I was going to wrap it up. Do you have words of wisdom for the folks at home? 

Pierre Tessier: 

Instrument everything. Everything! I really mean it. I instrument the fans on my server at home. 

Kelly Gallamore: 

Label everything in the fridge! 

[ Laughter ] 

Makes everything easier. Put a sticky note on everything. That’s what this feels like. Breadcrumbs everywhere really just help to lift everybody up and bring us all together to make it all better. Cool! Let me just show you guys a little bit of a wrap-up here. Thank you so much for joining us today, everyone. Nothing awkward is happening here… 

[ Laughter ] 

If you have any questions, you know where to find us. Reach out to team@Honeycomb.io. For those of you listening now or any time, you can catch this on-demand later. You will get an email with a link to see this on-demand at any time. If this episode helps you, please pass it along to your colleagues and peers. And then you can fill out the survey that we have up and running. What really helps us is knowing what you would like to learn. A lot of the topics we have covered so far are things that people have asked us to dig into, or it’s questions that our engineers find themselves answering all the time. Our goal is to break it down and help everybody. If you write into the survey, there’s a box for what else you would like to learn and we’ll cover it. We’ll break it down into smaller chunks because I want everybody’s life in engineering to be better. 

If you’re still new to Honeycomb, you can check us out and play with us in our sandbox at play.Honeycomb.io. And if you haven’t yet signed up for a trial, you can do it that way. We also have a free plan. If you’re still wondering whether you’re ready to get on the train, come play around yourself, and we’ll see you that way. 

Pierre, thank you so much for joining us today. I really appreciate it. 

Pierre Tessier: 

This is fun. I love this.

Kelly Gallamore:

Will you come back and do another one? 

Pierre Tessier: 

Maybe. Of course, I will. 

Kelly Gallamore: 

I will see if I can talk you into it. Thank you so much. We’re so glad to have you and look forward to hearing from you. Have a great rest of your week. 

Pierre Tessier: 

Bye. 

Kelly Gallamore: 

Bye.

If you see any typos in this text or have any questions, reach out to marketing@honeycomb.io.