Raw & Real Ep 5
All Aboard
Bring Your Team Together

 

Transcript:

Kelly Gallamore [Manager, Demand Gen|Honeycomb]:

Hello, everyone. Welcome to Raw & Real. It’s good to have you here today. Irving, thank you for joining me this morning. 

Irving Popovetsky [Director of Customer Engineering|Honeycomb]: 

Good morning. 

Kelly Gallamore:

I’m going to give everybody a couple of moments to sign in. I am going to pour my tea really fast and not tell everybody that I spilled a whole bunch of water while I was making this last pot before coming.  

We will just give everyone a couple of minutes to finish signing in, so we’ll start the real deal promptly at 10:02, but you’re at Raw & Real, and we’re glad to have you here today. 

Irving Popovetsky: 

Good morning, West Coast folks. Good noon, central time. Good afternoon, East Coast. 

Kelly Gallamore: 

Good Morrow to the Australians and New Zealanders and anyone from that particular future who’s joining us today. 

Irving Popovetsky: 

Yeah. Good evening, India. 

Kelly Gallamore: 

Excellent. Excellent. If you’re just joining us, you’re at Raw & Real. We’ll start the show promptly soon, but I do want to let you know that you’re at Raw & Real. It’s our short and sweet product demo series, how Honeycomb uses Honeycomb. 

We do have live captions for this series. So if you’re interested in having live captions play along at the same time, this is the StreamText link to get you in. The ending point is Honeycomb.io captions, for those of you who are particular. I just want to thank Kimber of Breaking Barriers who is here today to do live captions alongside the show.

We’ll get started right now. You’re at Raw & Real. Make sure you’re in the right place. You’re definitely not in the wrong place. This is Episode 5, All Aboard: Bring Your Team Together. It’s our short and sweet product demo series, how Honeycomb uses Honeycomb. We’ll start today with a little hello and some explanations. I’m here with the Director of Customer Engineering, Irving Popovetsky. 

Irving Popovetsky: 

Hello. 

Kelly Gallamore: 

My teammate here at the Hive. Thank you for joining us here today. 

Irving Popovetsky: 

My pleasure. 

Kelly Gallamore: 

We’re going to talk for a few minutes. Here, let me open up and see if anyone is talking yet. There we go. Great. I don’t see anything yet. All right. Everything I read here is just fine. I want to let you know you’re welcome to ask questions at any time. The question box is below your screen. We can see it. If we don’t cover your specific curiosity in the presentation, we’ll get to them at the end, but you can ask them at any time. Irving, can you tell me a little bit about your background and why you’re passionate about observability?

Irving Popovetsky:

I started back in 1997. I’ve done a very wide variety of things around the industry from systems engineering, software, security, consulting, and more recently, customer success. Now, I started at Honeycomb in December as the head of Customer Success. What really gets me out of bed in the morning is seeing people run their applications in production smoothly, drama free, well understood, and with confidence. The truth is that I’m only successful when you’re successful in doing that. 

Kelly Gallamore: 

After my own heart. I really appreciate it. What I do know is that tool fatigue is real. I know that making decisions about solutions, making decisions about things that will, hopefully, make everything better for your engineering team so you can make a product that you care about so you can do work that’s valuable, making decisions, getting locked into a product, this is not easy. I can feel the despair that can come from particularly trying to figure out how to get started with a new tool, how to get started with a different solution. I kind of feel like one barrier can really be getting buy-in from the other teammates. Getting set up but working alone in a vacuum is not DevOps. DevOps is a team sport. 

Irving Popovetsky: 

That’s right. 

Kelly Gallamore: 

Irving, you’re going to show us a few things that will make it easier for people to get their teams on board, correct? 

Irving Popovetsky: 

That’s right. Before we dive into that, I would like to take a quick survey. You can raise your hand to yourself or you can put a question in the box. 

Kelly Gallamore:

Yeah, put it in the box. Let me see it. 

Irving Popovetsky: 

If you consider yourself an internal champion of Honeycomb. You checked out Honeycomb. You saw Charity’s tweets. You added a Beeline to your application to instrument it. You set up OpenTelemetry to forward to Honeycomb, and you got really, really excited about what you saw, and now you’re trying to figure out, how do you get all your teammates onboard? How do you justify observability as an effort and as a cultural idea within your company to your management? If that’s you, you are in the right place. 

(Laughter) 

Kelly Gallamore: 

I hear it. We hear this really, really often. Irving, what can we show to make this easier for people? 

Irving Popovetsky:

Absolutely. Let me share my screen and show you a few things. All right. Wonderful. So here we are. This is Honeycomb. You should all be familiar. This is kind of where it drops you off when you start in on Honeycomb. But I want to first talk to you about a couple of very quick things that I recommend every user of Honeycomb do just right out of the box right when you get started. The first one is setting up single sign-on. That’s right here in your team settings. In your team settings, don’t worry about the API key. I’m going to clean that up after we’re done here. This is just a demo. 

Right here is the single sign-on option. When you enable this, this is super easy to set up. I’m going to turn it on right now. I know that my company uses Google Authentication, which is just a really easy click through. I’m going to hit convert, and I’m going to sign in, which just double checks that it works. And now single sign-on is turned on. And so what that means is that anybody who logs into Honeycomb and clicks the Google Auth button and their domain matches my domain, they’re going to automatically be added to this team. And that’s wonderful because, as you’re trying to get everybody onboarded onto a new tool, the headache that always comes up right off the bat is: How do I get in? Do you need to invite me? Do you need a password? I forgot my password. Can you reset it? All these things. I don’t know anybody that doesn’t have some kind of single sign-on system today. This just makes it super easy to get started and get everyone there. 

The other way is you can share this link. And so there’s a special SSO sign up link that you can then drop to your team and say, Hey, everybody. Hit this link and get signed in. So, for example, here I am. This is my little stealth-mode startup that I work for, Yoyodyne Propulsion Systems. We’re dabbling in interdimensional travel, still working on our proof of concepts. I’ve got that link, and so I can paste it right here and say, you know, Hey, everybody, Hey, everyone, buddy.   

(Laughter) 

Kelly Gallamore:

The other person at the company right now. 

Irving Popovetsky: 

That’s right. We’re doing a demo of Honeycomb tomorrow. Can I type today? Awesome. That must mean a lot of people are watching. Hit send. And then I just drop that link, and everybody else who’s in my company can click that and get signed in. So that way maybe I send that out as an email as well or put that in a calendar reminder. Everybody gets signed in to Honeycomb. We don’t charge per user because we believe that you should invite everybody that you can into Honeycomb. That’s super important. 

Kelly Gallamore: 

Well, I think observability is most powerful when teams are using it. We’re a tool built for teams. Yeah. And what I really appreciate about this is, as I think about all the marketing tools that my teammates have to sign into, that we have to sign into all the time to try and make the best work happen, every bridge, every moment that we can take a little bit of friction away, I can get people to stop being frustrated about how to get in and start focusing on the problem. Those seconds, those feelings, those irritations, they add up over time. So I really appreciate how two or three different ways, you can just have people get right in and make it easier. This is a button that six months down the road if you had just known about it when you were setting things up, it would have made getting buy in a lot easier. So I really appreciate that feeling. 

Irving Popovetsky: 

Exactly. Also, if you have a security team, they will feel better knowing that SSO is enabled. SSO is a feature that we have for all of our pro and enterprise-level teams. Now, as my teammates start trickling in, I can see them here. So Kelly signed in. I can see her in my list of team members. This way, I can see who’s coming in, and I can choose to promote anybody to an owner. That means they will get additional notifications. For example, usage notifications, if there’s quota alarms or anything else. Although, you can change this and pare this email list down at any time. 

Kelly Gallamore: 

Okay. 

Irving Popovetsky: 

And now, the next thing, this is something I highly, highly recommend. Everybody uses Slack, right? Setting up our Honeycomb Slack integration, if you haven’t set it up, it’s going to change your life. It’s so good. It ties everything in so nicely. Now, I already have this set up. As you can see here, I’ve got this in my list of integrations. I can also add other integrations as well. For example, I could add an integration with PagerDuty, or we support a generic webhook. So you can go and fire an integration into anything that you need to. 

Now, why do I stress this point about integration so much? Because, in a past life, I did consulting around knowledge systems, and back when Wikis were pretty new, I helped install and set up and advocate for Wikis. You probably have Wikis in your org, probably lots of them, but do you remember how they got started? Somebody had to not just set them up but seed the content and continually bring the conversation back to that Wiki, right? The Wiki wasn’t super useful on day one when it had no articles. Right? But, as it started building up articles, and people knew that they could go to that Wiki to find knowledge or add their knowledge for others, that’s when it really started to pick up steam and really move under its own power. And so that meant that somebody had to actively integrate that Wiki into all the other conversations, right, and make sure to get the knowledge out of email and chats and et cetera and turn those things into Wiki pages and do that until it became second nature for your team. 

Now, Honeycomb is interesting because it’s also in some ways a knowledge system because there’s all this collective team knowledge about how your app works in production that you want to integrate with all of your other sources. So this is something that I recommend is connect Honeycomb with your other workflows and knowledge streams and conversations, and then it will become second nature for all of your teammates to go back to Honeycomb. 

So a great example of that with our Slack integration that all of our users and customers love is the Slack unfurling. So what you can do is you can take any graph, and just here’s one I’m picking at random. I can either hit this “share” button and choose to drop this into a channel. In this case, I’ll drop it into the Yoyodyne channel and say, Hey, this looks interesting. And just drop that in there. You will see right here in Slack, maybe, it will pop in there. There it goes. It just took a second. 

So this is one thing I can do or I can just take this link. These are our permalinks. So that means these queries that you run exist forever, even after the data has aged out. You can go back to those. I can just drop that in as well and say, you know, Hey, Kelly, did you see this? And then just drop that same permalink, and it will also unfurl that and show a picture of the graph as well as some useful information about what kind of query this was. 

Kelly Gallamore: 

I, again, really appreciate that. Like, any friction that you can relieve to get people directly to the conversation can reduce any kind of context switching. It reduces that overhead so that people can focus more clearly on the interesting thing that’s in front of them, be it an issue or an interesting behavior that’s not an issue but still shows you how something works in your particular system. 

And I also appreciate this because it really, like, you can help set best practices for other people in the team. This is the social side of the culture that we talk about, you know, not just having the right tool, but having some of the best practices that bring everybody together. Something like this can say, hey, let’s get right to the conversation. See how I think. See how we think over here, and it can help new teammates. I don’t just mean junior engineers. I’m talking about people who are new to the team who are coming into a group, as you build your culture together. This makes it so much easier. 

Irving Popovetsky: 

Yeah, absolutely. And, I mean, I can’t even count the times that inside of Honeycomb, we’re troubleshooting an incident, and there’s this beautiful thing that’s happening in the channel where the incident is being discussed where people are finding new information, making comments about it, and just dropping Honeycomb links right there; and then somebody else can just grab that link, and they can iterate on that very quickly and say, You know, this happens to me all the time. Or we’ll say, hey, did you see this? It’s super interesting. And then somebody else will go, they will iterate this and go just on the front end, and then they will group by some other value like error, right, and, let’s see, and then they will go, Yeah, but, hey, did you see this? And they’ll drop this link in, and then the conversation continues. 

And it’s not only useful right there in the moment of the incident, but you can actually go back in time and maybe six months down the line, the same kind of error, the same kind of situation is happening, and you got paged. It’s two in the morning. You can go back to that conversation in a couple of ways. One is you can just look for it in that Slack channel and see who did what, who queried what, and you can go right back to those. Or you can also find that in your team activity, and that also goes back to the beginning of time. 

You can keep loading more and more queries and see all the queries that everybody on your team has run. So maybe there’s an expert on this one thing, and you want to go see what queries they ran. So all of this is there, and it is kind of like a knowledge system in its own way. 

Kelly Gallamore: 

I actually really appreciate that you say that because when you put something in a channel where many people can see, you know, your focus might be about, is this customer happy, you know, what is going on with this individual user or across this individual dataset that we see? You and Molly are really focused on that. Whereas Megan and Danyel, who are focused on product and design, might go, oh, we might see something in here that we can make it just a little bit easier down the road. This isn’t, I’m bringing down fire, but that doesn’t mean we couldn’t do something that makes it a little different. Where, you know, the back end team might find something that’s like, ah, this is the thing that we tried that did or didn’t work. And we can either expand it or take it away. I love that you can bring about and have many different people’s perspectives based on what they care about for their job. 

Irving Popovetsky: 

Exactly. And that’s a great segue to talk about concerns for a minute because, yeah, as head of customer success, my concern is: Are all of our customers having a great experience with Honeycomb? If not, I want that called out to me. And that brings us to another thing which is very important, which is instead of me just dropping links into Slack and saying, hey, did you see what I found? We can have Honeycomb do that for us in a couple of ways. So one is with triggers. Triggers are an alerting mechanism inside of Honeycomb that lets you automatically send a message based on a certain condition. Now, what’s great about this is not only can you have a great name for this, but you can also add to the description. And one thing I advocate for is to make sure the description of every trigger not only has more details about what that trigger is and why it fired but also links you straight off to a runbook that tells you, hey, this is what you should do when you see this. And then you can even add additional context like, you know, If you’re stuck, ask Kelly what to do. That way, we can also make sure that you know who to talk to. Again, this is all about having empathy for that person who has been paged at 2:00 in the morning. Their memory is not going to be as great as it is at 9:00 a.m., and, you know, they’re going to need help. So we want to make sure that we give them as much help as possible. 
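The advice about trigger descriptions can be made concrete with a small sketch. It is written in Python and assumes nothing about Honeycomb’s own API: the TriggerNote class, its field names, and the example URL are all hypothetical. It only shows one way a team might template the description text (runbook link, why it fired, who to ask) and check that nothing was left blank before pasting it into a trigger.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriggerNote:
    """Human-facing context for an alert. All field names are illustrative."""
    name: str
    why_it_fires: str
    runbook_url: str
    contact: str

    def description(self) -> str:
        # Lead with the runbook link: it is what a sleepy responder needs first.
        return (
            f"Runbook: {self.runbook_url}\n"
            f"Why this fired: {self.why_it_fires}\n"
            f"If you're stuck, ask {self.contact}."
        )


def missing_context(note: TriggerNote) -> List[str]:
    """Return the names of any fields left blank so a review can catch them."""
    return [field for field, value in vars(note).items() if not value.strip()]


# Hypothetical example values, not taken from the demo.
note = TriggerNote(
    name="Checkout error rate is elevated",
    why_it_fires="More than 2% of checkout requests failed over 15 minutes.",
    runbook_url="https://wiki.example.com/runbooks/checkout-errors",
    contact="Kelly",
)
print(note.description())
print(missing_context(note))
```

A check like missing_context could run in code review or CI so that no alert ships without a runbook link and a named contact.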

Kelly Gallamore:

I mean, if we’re talking about instrumenting your code so that you can understand behavior, this is an excellent type of, you know, empathetic breadcrumb that you can leave for your teammates so that you’re not the only hero, so that you’re not the only one that has context. And that means different experiences and different energies can come to the table to help solve issues and make things improve over time. I love how easy it is to just iterate and make this better. 

Irving Popovetsky: 

Totally. Totally. And there’s one more thing here that I absolutely love. Not only can I set up recipients that are Slack and email and PagerDuty, and you want these to go to all of the places where people look for alerts, right, because they’re going to have special filters on emails and on channels and whatnot to look for that. But we can also automatically have a trigger add a marker to our dataset, which, I think, is really neat because as you’re scrolling through past data, we can decorate that data with additional bits of information, like deployments that happened and markers for past incidents. And that’s very, very valuable when you’re looking at that and scratching your head and going, why is that spike there? Can anything explain to me what’s there? So I think that part is really useful. Now, one more thing on top of triggers that I think is super useful. Kelly, let me ask you, are you happy with the number of alerts you get from day to day with monitoring systems? 

Kelly Gallamore: 

Let me put it this way. We are a very busy team, the whole Honeycomb team is but especially our marketing team. Anything that helps me focus on the issues at hand, I try to say no to a lot of the other stuff. Any alert that takes me out of it, I want to make sure that it’s worth it so that I’m focusing on the things that are most impactful. 

Irving Popovetsky: 

Yeah, absolutely. I agree. And that’s where our SLOs are also super, super valuable. SLOs are Service Level Objectives. It’s a technical and business agreement between engineering teams and the rest of the organization. I think what’s really amazing about our SLO feature is not only do you get all of this great information about your error budget and how it’s burning down, but, of course, it’s integrated with our BubbleUp. So we can not only see here that we’re inside of SLO compliance, but we can also see, hey, there’s these two customers that are actually having a really bad experience right now even though all of our metrics are super high. System uptime is five nines, everything is amazing. But, like, somebody is not having a great time, and we should look at that and understand it. And this is where I would challenge you all to try an experiment, which is maybe just a thought experiment or it could be a real experiment, which is what if you disabled all of your alerts for a week and started from scratch just based on SLOs? 

Kelly Gallamore: 

I mean, you’re talking about burning it all down. 

Irving Popovetsky: 

Burn it all down, right, exactly. 

(Laughter) 

Kelly Gallamore:

My heart goes in two ways. I hear that, and I’m like (gasp) oh, freedom. That sounds like such a good idea. And then another part of me goes, I want to hold onto the control I have because this is like, this is what we know. This is how we do it. So I’m kind of of two minds there. 

Irving Popovetsky: 

What’s amazing about SLOs versus other alerting mechanisms is two things. Number one is what I just called out, that you can have high uptime, all the stats are green, but you can have individual customers or groups of customers who are having a terrible experience that you wouldn’t know about otherwise, and that’s super valuable, especially from a support and customer success perspective. But you can also go the other way where you can have alerts firing on all kinds of things. MySQL database CPU is high, and blah, blah, blah, and disk latency, but, in fact, all your customers are having a great experience. So is that really signal or just noise at that point? 
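A tiny worked example of the first point, that a healthy aggregate can hide an unhappy customer. This is a sketch in plain Python over made-up data. The customer names and counts are invented for illustration, and this is not how BubbleUp or the SLO feature is implemented.

```python
from collections import defaultdict

# Hypothetical request events as (customer_id, succeeded) pairs.
events = (
    [("acme", True)] * 9900
    + [("globex", True)] * 80
    + [("globex", False)] * 20
)


def success_rate_by_customer(events):
    """Group events by customer and compute each customer's success rate."""
    totals = defaultdict(lambda: [0, 0])  # customer -> [good, total]
    for customer, ok in events:
        totals[customer][0] += ok
        totals[customer][1] += 1
    return {c: good / total for c, (good, total) in totals.items()}


overall = sum(ok for _, ok in events) / len(events)
per_customer = success_rate_by_customer(events)

print(f"overall: {overall:.1%}")
for customer, rate in sorted(per_customer.items()):
    print(f"{customer}: {rate:.1%}")
```

With these numbers the overall success rate is 99.8%, yet one in five of globex’s requests fails. An alert on the aggregate alone would stay quiet.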

To bring it back to sending alerts, within our SLO feature, we can define various exhaustion times and windows. These are exhaustion times for your error budget. So what we normally set up is a 16 or 24 hour exhaustion time which says, hey, if you see this and it’s the middle of the night, or you don’t see it, that’s fine. Don’t page anyone on it. Take a look at it in the morning. That’s completely okay. And then a 4-hour SLO exhaustion time means this should page somebody because, for some reason, your error rate or your latency or whatever combination of factors that you choose to believe are critical for user experience, your error budget for those is burning down so quickly that you need to wake someone up to investigate that. And that is the most meaningful alert that you could wake someone up for. SLOs, have them go into Slack. Just to show you what these look like, I have a couple of these as examples. Here are triggers, which are firing for something very specific. And then here are some SLOs that I just tickled and made fire, in which case it warned me that both my 4-hour and 16-hour error budgets were going to burn. 
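The routing idea described here (page on a short exhaustion time, post to Slack on a longer one) can be sketched with simple arithmetic. The Python below assumes a straight-line projection of the current burn rate. The function names and the 4-hour and 24-hour defaults are illustrative, and Honeycomb’s own burn alert calculation may differ.

```python
def hours_until_exhausted(budget_remaining: float, burn_per_hour: float) -> float:
    """Project when the error budget reaches zero at the current burn rate.

    budget_remaining and burn_per_hour share a unit, for example failed requests.
    """
    if burn_per_hour <= 0:
        return float("inf")  # not burning, so the budget never runs out
    return budget_remaining / burn_per_hour


def route_alert(hours_left: float, page_within: float = 4, notify_within: float = 24) -> str:
    """Decide how loudly to alert based on projected time to exhaustion."""
    if hours_left <= page_within:
        return "page"   # wake someone up
    if hours_left <= notify_within:
        return "slack"  # look at it in the morning
    return "none"


# 1,000 failures left in the budget, burning 400 per hour: 2.5 hours left.
print(route_alert(hours_until_exhausted(1000, 400)))
# Same budget, burning 50 per hour: 20 hours left.
print(route_alert(hours_until_exhausted(1000, 50)))
```

The first case pages because the budget would be gone in 2.5 hours. The second only posts to Slack because about 20 hours remain.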

Kelly Gallamore: 

Again, these are triggers on your Service Level Objectives, which is what you agree, as a team, to hit, hopefully even before your customers even have any issues so that you have a proactive observation. This is kind of what we mean by observability is understanding the behavior of your system based on the outputs, based on the outcomes, and measuring this way kind of allows you to see it from your customer’s perspective in each individual case without really slowing anything down. We do talk about SLOs in a couple of episodes. I will have a link to the SLO series that will come out after this webcast so we can dig into that a little bit more in other series. That’s awesome, Irving. Thank you. 

Irving Popovetsky: 

My pleasure. Okay. A few other quick things I will show you. In the interest of time, I will kind of breeze through these, but if you want to see in more detail, absolutely ask about it. The first one is Boards. So a lot of companies have dashboards, and one thing that Charity talks about is how dashboards can become artifacts of past incidents. Right? We had a thing that happened, and we don’t want that to ever happen again, so we’re going to stick it on a dashboard. And we have a very different philosophy of boards and what they should mean, which is if you take all that away and let’s say that you work for an organization where you believe in continuous improvement, and so that past incident, you don’t need to maintain an artifact for that on your dashboard until your dashboard looks like some kind of nuclear control room. Instead, we believe boards should be starting off points for areas of concern. I put a couple here just as an example. I like to think of these as starter queries for a particular service or a group of services or, like, what a person cares about. So in our Honeycomb internal, I will have a “what Irving cares about” board, which will show a few critical things I’m always watching out for. 

(Overlapping speakers) 

Kelly Gallamore: 

You go ahead. 

Irving Popovetsky: 

I was just going to say if I pop into these, I can see the specific charts and go and mess with their time range right here, or I can click into any one of these and then start iterating on that and see. But this should give anybody a springboard to see, what does this data team care about? You know, they care about things like MySQL latency or the error rate of Redis and you can pop right in here and see a fun drill-down of, hey, like, something is up with these two SQL queries. They’re much, much slower than everything else. 

Kelly Gallamore: 

That’s a really pretty chart. I’m trying not to get excited about all the colors. 

Irving Popovetsky: 

It’s a colorful one. Yeah. And then in here, I’ve got markers in here to help illustrate certain points. For example, here was a deployment that happened, and right after that deployment, MySQL latency alert fired. That’s really interesting. We should look at that. These exist, and they exist as links that you can then click on, and it will link you off to a CI system or something like that. 

Kelly Gallamore: 

Gotcha. I actually have a question in the box. Maybe this is a good time to ask it because I see you talking about markers here.  Someone asked: Can I create a marker to indicate I’ve started a chaos experiment? 

Irving Popovetsky: 

Absolutely.

Kelly Gallamore: 

Yeah. Is that right?

Irving Popovetsky: 

That’s right. So we’ve got a command-line tool called honeymarker, which you can use to automatically add markers to any dataset you want from any tool that you want. So if you have an automated chaos experiment tool or your deployment tool or incident management tool, all of those can be configured to add markers automatically with both a start and endpoint or you can just click and drag anywhere on any chart in a dataset. And instead of hitting this zoom button, there’s another button below that which took me a while to figure out what it is, but this is our add marker option here. So I can say, like, that time Kelly logged in as root, I can drop a URL here, if I wanted, as well, but then I can just add this as a marker, and it’s going to exist there, and it’s going to exist on every query in our dataset that looks at this time range. 
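For tools that would rather call an HTTP API than shell out to honeymarker, a chaos experiment marker could be built along these lines. This is a minimal Python sketch. It assumes the Markers API endpoint shape (/1/markers/<dataset>), the X-Honeycomb-Team header, and the message, type, start_time, end_time, and url fields, so check those against the current Honeycomb API documentation before relying on them. The dataset name, marker type, and API key are placeholders, and the code only builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint shape; verify against the current API docs.
API_URL = "https://api.honeycomb.io/1/markers/{dataset}"


def build_marker_request(api_key, dataset, message, marker_type,
                         start_time=None, end_time=None, url=None):
    """Build (but do not send) a POST request that would create a marker."""
    body = {"message": message, "type": marker_type}
    if start_time is not None:
        body["start_time"] = int(start_time)  # Unix seconds
    if end_time is not None:
        body["end_time"] = int(end_time)      # Unix seconds
    if url is not None:
        body["url"] = url                     # e.g. a link to the experiment run
    return urllib.request.Request(
        API_URL.format(dataset=urllib.parse.quote(dataset, safe="")),
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Honeycomb-Team": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_marker_request(
    api_key="YOUR_API_KEY",          # placeholder
    dataset="my-service",            # placeholder dataset name
    message="Chaos experiment: kill one cache node",
    marker_type="chaos-experiment",  # free-form label chosen for this example
    start_time=1700000000,
    end_time=1700000600,
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it. Setting both a start and an end time is what lets the marker cover the whole experiment window instead of a single point.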

Kelly Gallamore: 

So this behavior right here is kind of what I call, this is how we have nice things. I mean, taking time to go ask permission to do stuff can make it hard to move quickly. And it is good to get buy-in from your teammates, especially about, like, things that are going to happen, but if you leave a note here, it’s like a flag in the sand that says, this happened. So I’m helping the next person who comes along either learn something that’s valuable for this dataset or shows them a pattern or anything that can last forever. It can be good for two days down the road or six months or three years down the road, however much time you have. Okay. We have about five minutes here, Irving. 

Irving Popovetsky: 

Last little thing I wanna show you, and again this is a team sport thing. Within the details tab of any dataset, you can see all of the definitions of all of the columns. So there’s plenty of columns that maybe they’re not clear what they mean. So one thing I’m a big fan of, here’s this drop off column, right? I didn’t know what this meant. So I could go look in the code or ask an expert, Hey, what does this mean?  And then after I do that, I can actually add this right here as a description and save it. Now we also show some recent values for that as well. That’s the thing that just landed. But this way, everybody should be going and adding descriptions to their dataset fields so that it’s clear what it is. I mean, some of them, customer ID, obviously super self-explanatory, but some of these aren’t, and so this is where when you’re filtering for data, it’s very, very valuable to have these right here. 

Kelly Gallamore: 

I can see how that really brings everybody together. 

Irving Popovetsky: 

Yeah. 

Kelly Gallamore: 

Gotcha. Irving, do you have anything else to show us?  

Irving Popovetsky: 

That’s it. Unless you would like to ask some questions. 

Kelly Gallamore: 

Well, we do have one. We got one already, but I have another one. Can you stop sharing and come and have some tea with me?

Irving Popovetsky: 

You bet. 

Kelly Gallamore: 

I have a question from the audience. How would you include security in the conversation while using Honeycomb? 

Irving Popovetsky: 

There are so many different ways to do that. I mean, at the end of the day, security also needs to know what’s going on with your production applications and your production systems. And so if we can provide them with a knowledge source that not only could they see exactly what application teams care about but they could also add their own concerns as trace fields to an application or, you know, other security concerns as well, that’s incredibly valuable. And then suddenly your security team is looking at the same tools as operations and app dev, and that’s a beautiful place to be. Also, security folks will point out things to you that you might not think about as you’re like, I’m going to go instrument every single field. Security folks are amazing because they will go, You know, this field right here may actually be a violation of the privacy laws that we have in our country or state or anything, and we should actually mask this field or do something with it such that it’s not showing up quite this way. So I always appreciate that kind of feedback. 
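One way a team might act on that kind of security feedback is to mask sensitive fields in application code before an event is ever sent. The Python sketch below is an illustration under stated assumptions: the field names in SENSITIVE_FIELDS are hypothetical, mask_event is not part of any Honeycomb library, and the secret would need to come from real secret storage. It uses a keyed hash so that masked values still group consistently in queries without exposing the original value.

```python
import hashlib
import hmac

# Hypothetical field names that a security review flagged as sensitive.
SENSITIVE_FIELDS = frozenset({"user.email", "user.ip"})


def mask_event(event: dict, secret: bytes, sensitive=SENSITIVE_FIELDS) -> dict:
    """Return a copy of the event with sensitive values replaced by a keyed hash."""
    masked = {}
    for key, value in event.items():
        if key in sensitive and value is not None:
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            masked[key] = digest.hexdigest()[:16]  # shortened for readability
        else:
            masked[key] = value
    return masked


event = {"user.email": "kelly@example.com", "endpoint": "/checkout", "duration_ms": 87}
safe = mask_event(event, secret=b"rotate-me")  # placeholder secret
print(safe)
```

The same input always maps to the same token, so a group-by on user.email still works, while the raw address never leaves the application.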

Kelly Gallamore:

Thank you for talking about that. I also really appreciate the idea that building and maintaining software is a team sport, and the idea behind how Honeycomb can work with the systems that you have is that it helps everyone work better together. Okay. That’s actually all of the questions that I see at this time. Irving, thank you so much. It’s been so nice having you here. 

Irving Popovetsky: 

My pleasure. Thanks for having me. 

Kelly Gallamore: 

For the folks at home, if you’re joining us today live or on-demand for this broadcast, you should get an email at the end that has a few attachments to kind of dig into the Service Level Objective webcast series, if you want more of a deep dive on that. If you’re still new to Honeycomb and you haven’t signed up yet, you can play with our public dataset. There’s a link in the email as well. Actually, you know what? I’m not going to show the slide because it doesn’t really work anyway. For those of you who have not got started yet, we have a free tier, so there’s just no excuse not to come in and see how it could work for you, and down that process, if you have any questions, you can figure it out. No matter how many training videos you have, no matter how many volumes of docs that you have, no matter how much information that we can create to help make things for our customers, any Band-Aid, any bridge that we can build to help make it easier, talking about what fear people go through, that’s what this was all about today. Irving, I really appreciate you highlighting what you’ve learned for our team and all of our customers. 

Irving Popovetsky: 

My pleasure.

Kelly Gallamore: 

All right, everybody. Thanks for coming. Thanks for asking questions.  We’ll see you next time. Have a great week.

Irving Popovetsky: 

Bye.

Kelly Gallamore: 

Bye.

If you see any typos in this text or have any questions, reach out to marketing@honeycomb.io.