Conference Talk

Atlassian Panel Discussion: SRE in the Spotlight

April 29, 2020

 

Transcript

Adrian Ludwig [Chief Information Security Officer|Atlassian]:

All right. It looks like we have gone green. So that means we are in broadcast mode. Welcome to everyone who has joined us today. We’re going to have a little bit of a conversation about the world as we’ve seen it over the last month or two. My name is Adrian Ludwig. I am the Chief Information Security Officer at Atlassian, and I’ll be moderating the conversation, with a great set of panelists. We have Holly Allen, who’s Director of Engineering at Slack. Liz Fong-Jones, a Developer Advocate at Honeycomb. And Patrick Hill, an SRE lead here at Atlassian. We’re going to be talking about site reliability and how teams are doing operating services that, at this point in a global pandemic that we’re all fighting through, have become critical to the way that not only we’re doing business, but companies around the world and people around the world are doing business.

We’ve seen the world, at least the outside part of the world, pause. But online we’ve seen an incredible increase in activity. Cloud services have become a lifeline for communication, for schools, for families, for businesses, and with that have come increased expectations around reliability. So we thought it’d be helpful to bring together some of the experts running some of the world’s most important services, to help talk through some of the things that they’ve been experiencing, and also have an opportunity to answer questions that you might have in the world that you’ve been experiencing as well. On the format, we’ll do a little bit of an intro and then talk through a handful of questions that the panelists have discussed before and thought would be interesting and relevant to all of you. And from there we will go into open questions and answers.

We are using Zoom just like hundreds of millions of other folks around the world. You’ve probably gotten familiar with it by now, but if not, down at the bottom of the screen, you should see a little Q&A, and you can go ahead and type in your question. I can see those questions, the panelists can see those questions. We may just weave them into the conversation or towards the end, we’ll certainly make sure that we get in there and directly address the questions that you ask. So feel free to ask questions at any time. So with no further ado, I will ask Holly to give us a minute or two about your background and some of the experiences that you’ve been seeing over the last, let’s say last month or so.

Holly Allen [Engineer Director-Reliability|Slack]:

I’m the head of reliability at Slack, and the last month has been pretty interesting of course. We were not a remote-first office at all. We were definitely the company where everyone comes into the office and works together. And so the shift to everyone working from home was pretty dramatic. And at the same time, we saw about a 30 or 40% increase in our traffic. The good news is that the technology for the most part scaled cleanly. There were some fire drills of course, but it’s really been the people aspect that’s taken a lot more care and feeding and evolution to make sure that everyone is able to be productive at home in a way that works for them.

Adrian Ludwig:

Cool. We’ve definitely seen that and are big users of Slack. We’ve seen it scale perfectly. I haven’t seen any issues. I know that what I see is not the reality in some cases. Liz Fong-Jones.

Liz Fong-Jones [Developer Advocate|Honeycomb]:

You all at Slack didn’t just scale up. You scaled up and also launched the new Slack sidebar, which was super exciting, and my team was cheering when it happened. At Honeycomb, we support a large number of unicorn companies, public companies, and of course, Fortune 50 companies. So we’re very, very critical in terms of helping people maintain that quality of service, both for their own employees, as well as for their customers. But interestingly enough, our traffic in terms of incoming telemetry has been more or less flat, because people dynamically sample and take other measures to make sure that the cost of their telemetry doesn’t go up too much as their traffic increases.
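To make the dynamic sampling idea concrete, here is a minimal sketch, not Honeycomb’s implementation. It assumes a hypothetical target of kept events per second: as observed traffic grows, the sampler keeps 1 in N events with a larger N, so the telemetry volume sent stays roughly flat. Each kept event records its sample rate so totals can still be estimated. The function names and the `sample_rate` field are illustrative assumptions.

```python
import random
from typing import Optional


def sample_rate_for(observed_eps: float, target_eps: float) -> int:
    """Return N so that keeping 1 in N events yields roughly target_eps."""
    if observed_eps <= target_eps:
        return 1  # traffic is under the target: keep everything
    return max(1, round(observed_eps / target_eps))


def maybe_keep(event: dict, rate: int, rng: random.Random) -> Optional[dict]:
    """Keep about 1 in `rate` events; tag kept events with the rate used."""
    if rng.random() < 1.0 / rate:
        return {**event, "sample_rate": rate}
    return None


def estimated_total(kept: list) -> int:
    """Estimate the original event count by weighting each kept event."""
    return sum(e["sample_rate"] for e in kept)


if __name__ == "__main__":
    # Hypothetical numbers: traffic grows 40%, kept volume stays the same.
    for eps in (1000, 1400):
        rate = sample_rate_for(eps, target_eps=100)
        print(f"{eps} events/s -> keep 1 in {rate} -> ~{eps / rate:.0f} kept/s")
```

Because each kept event carries its own rate, counts computed downstream can be re-weighted even though the rate changed as traffic grew.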

Adrian Ludwig:

That’s an interesting shift that people brought into, that they were thinking about. That’s great to hear.

Liz Fong-Jones:

In terms of the remote work thing, Honeycomb was already 50% remote employees. So the other 50%, our San Francisco based employees, started working from home. It’s obviously a thing where they’ve gotten better and better at it, but also we were doing dry runs for this scenario. We actually were prepared to lose our office back in August or September and not be able to get a new lease immediately, before we raised our most recent funding round. So we had already been doing dry runs of working from home one week out of every eight. So this was kind of, okay, let’s operationalize this.

Adrian Ludwig:

Interesting. That’s the first time I’ve heard someone say they’d started thinking through that aggressively. So that’s interesting, we’ll get into that more, I bet. Patrick, do you want to give us a little bit of your background and how things have been going here in Atlassian?

Patrick Hill [Head of Reliability Process|Atlassian]:

Thanks, Adrian. I’ve been managing incidents at Atlassian for what feels like ages. It’s been seven years. Over the past couple of months, starting in March, we saw a triple-digit increase in the amount of free signups we have. We now offer free products to our customers in the cloud, so go check that out. We have also been scaling out and trying to help organizations and companies and teams that have been impacted by COVID and the pandemic. So if you do need help, we do have some help online available for that, be it remote work practices, or even help with Jira or Confluence or any of our other products.

On the actual, I guess, technical front, we have seen several services that had to scale up quite dramatically. We brought forward some work to help with scaling for our customers’ growth. We’ve been hard at work instrumenting those services and making sure the user still gets a great experience. For us, I think it’s still in a pretty good state right now. I think we’re still progressing as normal. We haven’t seen too many rocky moments along the way, but what we’re really focusing on is the people aspect of it. I just want to call that out. To everyone who’s listening in, make sure that your teams are healthy, make sure your teams are safe, and make sure your teams are running optimally. Start by maybe asking them how they are in the morning. That may be changing work patterns, but I know Atlassian has always been strong on that.

6:36

Adrian Ludwig:

It’s interesting that there are three very different experiences, right? One, quite a bit of growth, one fairly stable, but all three of you immediately latched onto the people side, which I think is a bit of a surprise. Are there specific elements that you would say are unexpected? I’m going to start with Liz here. Because it sounds like they were already planning ahead. And so I’m curious if there are things that were still surprising, even though you’d been thinking about this in terms of the impact it’s had on your teams.

Liz Fong-Jones:

I think definitely. We had just invested in a bunch of video conferencing hardware for our San Francisco office. And suddenly realizing that we were in the scramble along with every single other company to get people nice microphones, to get people nice webcams, so that we could actually hear each other on calls, that I think was definitely a surprising thing. Because there’s a huge difference between what you can tolerate when it’s one or two people who are on a not so great connection or not so great microphone, versus when everyone is spending all of their days in meetings with people who all have setups that need tweaking.

Adrian Ludwig:

That’s interesting. I know that at Atlassian we set up a stipend basically for people who are going to be working from home. I don’t know that we’d expressly said you should spend it on a high-quality microphone. So if my microphone sucks, I apologize.

Liz Fong-Jones:

There’s a GitLab published guide to “here are some things you may find helpful for remote work.” They’re an all-remote company, and they explicitly say, you should pick at least five things out of this list. Talk to your manager if you’re not sure, right? It implicitly gives you that permission.

Patrick Hill:

It was funny when Adrian mentioned that, because there was a bit of a mad scramble internally for everyone to get their work from home setup. So in Slack and in Confluence, we saw people spin up pages. One of them started off as a nice little page on what chair you should get for your body type, or what you might find interesting. Where’s a good deal? Where is stuff even sold? And then that started expanding into other accessories. So whenever certain suppliers like Dell, for example, were able to get monitors and stuff, people were able to get their work from home setup done. And I think you saw a really good team effort and team collaboration from the get-go, which is always something that I find really reassuring personally.

I know that everyone immediately went “oh crap, I have to work from home now.” That “what is that going to mean” moment. And then for our whole community to come together and say, it’s okay, here’s what you should be looking at, has been a really good thing for us.

Adrian Ludwig:

Holly, you’ve had the biggest shift. You were jumping in. Sorry, go ahead.

Holly Allen:

We have a fun channel as well. We have a couple for work from home equipment, but we have this really fun channel that I love, which is people’s work from home setups and you get to see how they evolve as they get more stuff. You get to have some inspiration from folks and it also just brought us a little bit back together. Right. Just being able to see where everyone’s working, helped you feel closer to everybody. I think back to what was a surprise, one of the surprises for me as a manager was that the number of people who were less productive at home for a variety of reasons are almost exactly balanced out by the number of people who are more productive.

We’ve of course been doing these internal surveys, and it’s almost a perfect curve. Most people are more in the “I’m almost as productive” range, right? 80, 90% as productive, but there’s a little bit going on there. But I was really surprised at how that all balanced out, and we just went through a quarterly planning process and we could basically get almost as much done as before if you look at it in a large enough sample.

Adrian Ludwig:

I’m curious there, you feel you could get as much done. Are there changes that you’ve been thinking about introducing to your planning process to help with that, from a reliability standpoint?

Holly Allen:

Oh yeah, for sure. I mean the number one thing for any group trying to get work done, reliability or not, is just, okay, but we should really just focus, right? Any moment that you can take to focus your efforts is good. And so we definitely did that this time and said, okay, let’s make sure that we’re putting more wood behind fewer arrows as they say. And then the other thing is to be even more crisp and clear about what your priorities within those commitments are so that you know… I have no idea how the next three months are going to unfold, given the last month. And so how do we make sure that when those shocks happen, we can absorb that and still get the most important work done and have cleared with ourselves what that work looks like beforehand.

11:17

Liz Fong-Jones:

I think another interesting element here is thinking about the team as the unit of planning, right? Not the individual. So having larger teams. Maybe instead of having one person on call, you put two people on call, so they can cover for each other when they’re parenting. Right. Make sure that we have some slack in the system, in the tense-or-loose sense, not the product, in terms of making sure that we have the ability to say no to work that is just going to fill the time and not necessarily actually get done.

Patrick Hill:

Definitely. One thing we’ve called out, also, I guess is that biosystem internally. One thing we have also suggested people do is to report in on the temperature, right? I don’t want to be, my uptake is saying, SLOs for your team, but in some cases you do need people to wake up and say, hey, sorry, I’m not operating at normal capacity today. Can you kick the can down the road in a couple of meetings? Can we delay stuff? Can we be a bit more, I guess, mindful and caring about how we’re running our teams? One other thing-

Liz Fong-Jones:

There’s a technical term for that, right? It’s load shedding. Load shedding, except instead of for computer systems, it’s for people.

Patrick Hill:

Definitely. Definitely. The other thing I want to call out, and I’m not sure how many of you do this, or how many people out in the wild do, is always calling out that keeping the lights on is always the number one priority and everything else comes second. I have had personal experiences with that, with someone saying, no, this deliverable should be shipping today, and then someone else going, but your service is down. So what are you doing?

Holly Allen:

Something that’s been really helpful to me as a leader, is having our executives always saying that actually the number one is your health and safety and that of your family. Then number two is keeping the lights on and keeping everything reliable and then everything else comes after that. And of course, I believe in that, but it just makes it easier when everyone at the company is really echoing that. And making that safe space right? For, hey, I just need to take the day off. I don’t need a reason. I can’t do it today. Just making that be safe.

Liz Fong-Jones:

And that’s not just for parents. But also for people’s health conditions, people’s mental health issues, or just in general, right? We’re all working from home during a pandemic.

Adrian Ludwig:

It’s been amazing to find out. I enjoy getting a glimpse of people’s homes, a glimpse into their life. And I’ve had several conversations with folks that I meet with once a month to talk about finance or something specific like that. But when you start the conversation with, hey, how are you doing today? And then 15 minutes into that conversation, you realize you’ve learned more about this person than you had in the last three years of working with them. There’s a lot of shared humanity and compassion that’s going around, which has been great.

Are there particular techniques that you found that have been helpful for encouraging, that sense of compassion? Patrick mentioned doing a basic check-in. Liz was talking about load shedding and making sure you’ve got some redundancy in the system. Are there things that teams should be thinking about from a people standpoint, that you’ve found that are effective?

Patrick Hill:

One thing that I think the Atlassian team has done for a long while, is to have a lot of social time in the calendars. And this means talking to people in other teams you don’t interact with normally, going from a town hall-style meeting and then breaking up into random groups sometimes. Putting in time for coffee in the day, right? I think it’s very easy for some of us to be able to get up and just go get a coffee, and then sit back down. But I think it’s much more engaging to then sit there and have that coffee with someone else. Right. And chat about what’s going on. One of my team members is actually looking at buying a new house and we were talking about a little bit of Austin, and he’s talking about, hey, which part of Austin should we be looking at? Where should we be thinking about? My response was always, hey, maybe not now, but it’s an important thing to understand where we are, outside of work and outside of their day-to-day responsibilities.

Adrian Ludwig:

Holly, I think you were going to jump in there as well.

Holly Allen:

I was going to say the exact same thing. We have team level afternoon coffee times. I do group level happy hours from four to six, every couple of weeks. Recreating those hallway moments for us to connect as human beings really helps, not only to make a space where people can actually say what’s on their minds, but also to help us help each other when those moments come, and to understand, say, when somebody takes a day off, and not have it affect team morale.

Liz Fong-Jones:

I think there’s this really interesting Slack bot called Donut that you can pay a certain amount for per member, per month, and it lets you basically set up these breakout sessions, where three people from your company, selected at random from a channel, get clustered together to all meet with each other and get coffee.

Adrian Ludwig:

That explains why I keep ending up in these chats. We have Donut, we have been using it and I do keep getting dropped in. We have a once a month lunch thing, where we just get randomly selected. I didn’t realize that was a service that we were paying for. All right, there we go. I’m super curious if there are things that you’ve tried that have flamed out completely. There’s no requirement that you respond, and if there haven’t been any, that’s okay. There’s one thing that-

Holly Allen:

I could tell you.

Adrian Ludwig:

Okay, go for it.

16:41

Holly Allen:

The first big push was, let’s have fewer meetings. Great. But one of the things that we tried that totally failed was moving some meetings to completely asynchronous, and it was just no good. I’m not going to say everything has to be a meeting, but some of these bigger meetings just could not move to an asynchronous-only format. And we ended up dropping that after about a week. And then in my own staff meeting, I moved half the agenda into the channel and kept half in the meeting, and cut the meeting to half its length. But I found that after about a month, I still wanted to bring those topics back into the room just to resettle synchronously, and then I’ll pop them back out. So that’s one failed experiment and one half-success.

Liz Fong-Jones:

There’s definitely an art to knowing, right? When is it time to take the comments off the PR and into having people talk face to face virtually?

Adrian Ludwig:

Nice.

Patrick Hill:

The other thing I would say is, I work in a remote team, where we work around the world, and we are usually dialing in from home at some point. Just because you used to have an asynchronous standup at the end of your sunlight hours, doesn’t mean that that’s going to work moving forward. Holly has mentioned async, and Liz has mentioned when to escalate things from asynchronous to synchronous formats. I’d be mindful of that. I know in my team, we’ve had people who have actually wanted to have a proper conversation to start the day during standup, which is fine. Just not the ritual time, but potentially. So how can we then include that in our practices moving forward?

Liz Fong-Jones:

For us, it was interesting in that we moved away from a video standup because our teams were getting too big. We are actually growing slightly during a pandemic, which is interesting. This means that we have new hires starting and suddenly our standup was, I think, 12 or 14 people. And it was like, you know what, this isn’t working anymore. What we’re saying isn’t relevant to each other. Let’s make it async or let’s break into smaller groups.

Adrian Ludwig:

We’ve had a bunch of new people start as well. I was talking with one of them this morning who has never seen the office, has never actually met any of the Atlassians in person. It’s a very different experience from when I joined, that’s for sure. It’ll be interesting to see how that tracks over time and what the effects are. That might be a good webinar discussion, to bring in a bunch of people who have gone through that experience and see how that shapes out.

Liz Fong-Jones:

And also relevant to our SRE theme: we actually had someone do his first on-call shifts during this transition. And it was really exciting to see how we helped each other through it.

Adrian Ludwig:

Nice. That’s good stuff. Changing topics a little bit. Holly mentioned that Slack has seen an increase. I’m wondering if there are other different changes in demand that you’ve been seeing. I was reading an article earlier today about electricity consumption in New York. There’s amazing data on that. Overall it’s down, but it’s shifted dramatically from enterprise/company paid bills to being consumer paid bills. So there’s also a financial impact in terms of the way that shifted around, and different times of the day and all kinds of complexity in the way that that system is being forced to adapt. I’m wondering what kinds of things you’ve been seeing in your services as well.

Holly Allen:

We had a number of very large global hackathons join Slack really rapidly. And I think that’s just a very fast cycle. The scale of those teams coming in might normally come through a sales engagement and you’d see them coming from a mile away, but instead, it’s last minute and you have to make sure that that group is going to be successful. I don’t think that we’ve seen-

Adrian Ludwig:

People related to COVID-19?

Holly Allen:

Exactly. COVID-19 hackathons that are getting spun up, obviously very rapidly. But in terms of self-service teams getting started and our enterprise business, those I think are about the same.

21:02

Liz Fong-Jones:

We’ve also, in the DevRel business, we’ve seen a lot of conferences move to online and trying to reproduce the hallway track by having Slack workspaces for it. So I suddenly find myself in five times as many Slack workspaces, each of which has several hundred or several thousand or even tens of thousands of members. It’s really fascinating.

Patrick Hill:

We have seen a couple of our customers come to us and go, actually, we want 4X or 5X our set license usage for certain things. I think that we’ve had too many go, Oh, okay. To hold these points. Usually, get a bit more time on that. The response from them sometimes has been, actually, we want it next week. Can you get the ball rolling on that? And to what Liz just said there as well, someone brought this up to me the other day: there’s an Animal Crossing DevOps conference coming up. Which-

Liz Fong-Jones:

Oh yes. Desert Island DevOps.

Patrick Hill:

Yes. I couldn’t remember the name. Sorry, but that just sounds like an amazing experience, to do something a bit different.

Adrian Ludwig:

Oh, man. I think we should definitely discuss the video games at some point here. That’s a critical part of the way that the people are getting stuff done, for sure.

Liz Fong-Jones:

Exactly. Right. And we’re seeing a lot of interest from video game companies. We’re seeing a lot of interest from healthcare companies, obviously, as well as communication service companies. Lots of industries are going up at the same time. Right. And it’s very much making up for the declining business from other sources.

Adrian Ludwig:

We saw one of the companies that is involved in home delivery increase by, I think, an order of magnitude in the span of two weeks. There were public reports on Amazon adding 100,000 people. So we’re definitely seeing shifts in both directions. That’s for sure. Are those things that your teams have had to make special accommodations for or are those the types of things that you’re ready to go? And it was just exciting that things are moving more quickly.

Patrick Hill:

We haven’t noticed any major faults in scaling up, but we have brought forward some pieces of work. So there is a capacity project where, basically, we took the first steps during a hackathon, and that revealed most of the value, but now the team is working on how we can make this a reality. I don’t want to encourage people to ship large pieces of work in 24 hours though. That’s pretty much an antipattern for what you should be doing now. What I would encourage teams to do, though, is to think about: cool, what’s changed? What’s the new reality? Right? I think for quite a few of us, your growth curve probably already looked like a hockey stick, and that hockey stick has maybe gotten a bit sharper recently. So look at those patterns, look at those assumptions you had in place. See if you do need to redesign things, because we all know the lead time on rebuilding or redesigning systems can be really long or really short. It depends on what it is, but you don’t want to find out about it when it’s broken.

Holly Allen:

Like Patrick, we got lucky that one of the systems that really needed to be changed because of the scale, we already had the design in our pocket. It was going to be a quarter-long project and we got it done in just a couple of days. And so that was very fortunate. And luckily also it was a smooth piece of work. But for the most part, the systems just scaled, as they were supposed to automatically, which was also really heartening, to see, Oh, look, it worked.

Liz Fong-Jones:

When you have the right automation in place. Every time you add a customer, that adds 10% to your workload. It’s just adding conversant to your workload. You don’t want to end up in this situation of overnight doubling your workload. That can create a lot of strain, but if you’re used to steady growth and it’s just happening more often, that feels fine.

Adrian Ludwig:

Well, it’s cool to hear that, at least on the cloud services side, the shifts in demand are not being nearly as disruptive as they are in the real world, where supply chains don’t have that flexibility. We’re still looking for toilet paper.

Liz Fong-Jones:

And also Azure ran out of servers briefly. So it’s not that-

Adrian Ludwig:

Interesting.

Liz Fong-Jones:

All of these things are ultimately backed by real computers that someone has to go to a data center to maintain.

Adrian Ludwig:

We haven’t had that problem. I will flag one thing that we have run into, which is, there are certain parts of the world right now where you just can’t get laptops. So we’ve had new employees onboarding, and it’s like, so you’re going to be working from home, and you’re going to be working on whatever computer you can find. And once couriers become a legal thing, and once the laptop supply chain has sorted itself out, we’ll equip you properly. But in the meantime, 10%, 20%, do what you can.

Liz Fong-Jones:

InfoSec people must be having an interesting day. I bet the folks working on endpoint security are doing a thriving business.

Adrian Ludwig:

I’m not going to talk about that. My team was not super excited to hear that though. You’re right. But that gets at a good point. Are there any long term trends that you see right now that you think are going to dramatically change the way either your team needs to work or the way you’re thinking about infrastructure?

26:12

Liz Fong-Jones:

I think that a lot of this validates our decision at Honeycomb, given that we were a 50% remote organization, to build this functionality into our products, so that you could share your Honeycomb queries in Slack, or look through someone else’s query history to build on their queries. That ability to virtually look over someone’s shoulder without having to physically sit and look over their shoulder, that’s something we’d already invested in. And now it’s a thing where it’s like, this is table stakes, right? This is not a nice-to-have, this is a must-have for today’s era.

Holly Allen:

At Slack, obviously we put all of our work and communication in Slack. And so that has continued to work. I would say it’s been pretty seamless. So the jury is still out on whether or not that’s going to change anything after all of this is over, but I think it’s proven just how effective we can be even in this circumstance, which is heartening.

Adrian Ludwig:

It would be nice when we think about an office as a luxury, rather than a necessity. Sorry, apparently something is being printed on my printer, that’s next to me.

Liz Fong-Jones:

Speaking of which, an office is a necessity. People need working spaces. It could be at their home. It could be in a physical office, but it needs to be a place where you can sit and go heads down. Right. That’s what I discovered in my first month of working from home, I was like, Oh goodness, there are now four people trying to all have meetings at the same time, in the same room upstairs. So I was like, okay, I’m getting my own working space. I’m sorry, I love you dearly, but we need more capacity.

Patrick Hill:

On a similar note, I’ve got friends all around the world, and a lot of them travel for work, basically. So they might live in their apartment one week a quarter, and now they’re like, cool, I haven’t got any furniture, because I’m never here. And now they’re adjusting to that reality. I have other friends who simply go, hey, living in New York, maybe not the greatest decision right now. But I think no matter where you’re at, to what Liz pointed out there, you need a space where you feel you’ve got enough equipment, you’ve got enough space, where you can think freely and be able to, I think, work with your team.

Adrian Ludwig:

Kindergarten is learning how to use rulers today. So worksheet has-

Liz Fong-Jones:

Speaking of glimpses into people’s personal lives.

Adrian Ludwig:

Here you go. Thinking beyond the team dynamics and even the interactions with your apps. Are there things that you’re seeing at the core infrastructure level? Network congestion is something that people have flagged.

Liz Fong-Jones:

There’s some really interesting data out of Catchpoint. Catchpoint is similar to a system for monitoring your CDN. And they have reported something like 50% of reported speeds in certain parts of Chicago. So that’s definitely pretty congested. The other capacity planning thing is that a lot of people’s internet connections are set up for one video link, not multiple video links. My household is very fortunate in that we have a 50 megabit up connection. Other people do not have that luxury. Right. And they’re having to contend over very poor bandwidth.

Patrick Hill:

Liz, I’m on my phone connection right now. I know people who have battled with that and had switched between their, I’m not going to name the brand, but their cable internet, tether their cell phone and then go back and forth depending on what the weather is outside, at certain points during the day as well. The one thing that I have found interesting with it, is people have become very sensitive to services going down, particularly things like Slack, right? As soon as there is a network interruption, Slack is not working, and then you get someone calling you on your phone, saying, hey, I couldn’t send you a message, or I couldn’t call you up or something like that.

And you are going, no, it’s your internet. It’s okay. It’s going to be fine. Everything’s working normally. So I think people have been really sensitive to network interruptions. I know there was a provider that had two fiber lines cut at the same time last week. That caused us internally to go, hey, is the VPN down? No, it’s just routing. We’ll figure out a way around it.

Liz Fong-Jones:

Definitely, there is this idea of service level objectives, right? Where we say, we’re aiming for three nines or four nines. And it turns out that a lot of that error budget winds up being spent on things outside of your control. Right?
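For readers unfamiliar with the arithmetic behind “three nines or four nines,” here is a small sketch of how an availability SLO translates into an error budget. The 30-day window and the 20-minute ISP outage are illustrative assumptions, not figures from the panel. Three nines over 30 days allows about 43 minutes of unavailability, and four nines about 4.3 minutes, which is why a single network outage outside your control can consume most of the budget.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, minutes_lost: float, window_days: int = 30) -> float:
    """Error budget left after subtracting downtime, whatever its cause."""
    return error_budget_minutes(slo, window_days) - minutes_lost


if __name__ == "__main__":
    for label, slo in (("three nines", 0.999), ("four nines", 0.9999)):
        print(f"{label}: {error_budget_minutes(slo):.2f} minutes per 30 days")

    # Hypothetical: a 20-minute outage at a user's ISP counted against the SLO.
    print(f"left after outage: {budget_remaining(0.999, 20):.2f} minutes")
```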

Patrick Hill:

Correct. Correct. The other thing is, I guess there were a few missed expectations, right? A lot of us for the longest time, as Liz has mentioned, expected the network to just be there and just be available all the time. In a home environment, we haven’t planned for that kind of thing. I’m not sure, Liz, if you have, but I know for me, I just got the biggest connection I could. That doesn’t actually mean that it’s what I need, but I haven’t actually done capacity planning on my home network. Some of our SREs have, and they’re telling me, hey, I’ve had to start limiting certain things on some of the networks. I would encourage you all to look at that, maybe, if you are living with contention. There are some great options; I think one of the big ones we recommend is the new Ubiquiti setup. Go check that out. But again, having capacity planning and things like that inside your own home network has been a really interesting experience for some people.
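As a rough illustration of the home-network capacity planning described above, the sketch below estimates how many simultaneous video calls an uplink can carry. The 3,000 kbps per-call upload figure and the 20% headroom are assumptions chosen for the example; real video bitrates vary by application and quality setting, so measure your own.

```python
def max_concurrent_calls(uplink_kbps: int, per_call_kbps: int,
                         headroom_pct: int = 20) -> int:
    """Video calls that fit in the uplink after reserving some headroom."""
    usable_kbps = uplink_kbps * (100 - headroom_pct) // 100
    return usable_kbps // per_call_kbps


if __name__ == "__main__":
    PER_CALL_KBPS = 3000  # assumed upload rate for one HD video call

    # A 50 Mbps uplink, like the one mentioned above, versus a 5 Mbps uplink.
    for uplink_mbps in (50, 5):
        calls = max_concurrent_calls(uplink_mbps * 1000, PER_CALL_KBPS)
        print(f"{uplink_mbps} Mbps up -> about {calls} concurrent video calls")
```

Under these assumptions a connection sized for one video link has no room for a second, which matches the contention the panelists describe in multi-person households.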

31:34

Holly Allen:

I had my internet down for four days last week. And it was incredible when I was on the phone with the provider and they said it was going to be four or five days before the tech could come out. I’m like, that’s not possible. That’s telling me I can’t have water or that my air is going to be shut off. But I was extremely pleased by how good my cell phone tethering was. I was actually able to get almost everything done except there was no video happening.

Adrian Ludwig:

I remember the debate in my household, a year and a half ago, about whether to spend the extra 50 bucks a month or whatever it is to have business class, as opposed to. I don’t know if it’s good or not, but it’s working, and it turns out we do have five video streams now at a time. Because kindergarten apparently has to have Zoom. I’m noticing now that we’ve got one question that came in. For folks that are listening in, we absolutely would love to hear more of your experiences or any questions that you have. So as a reminder, there is the little Q&A button down at the bottom, and we’ll be getting into those questions here shortly. One of the things that I know folks were interested in talking about was some of the experiences they had, helping out. I don’t know if there are any of those in particular that you could share or things that you can-

Liz Fong-Jones:

I’m reading the question. And I would say the number one reason having like, hi, I am on litigation hold due to multiple lawsuits involving Google. You do not want people doing eDiscovery on WhatsApp groups. You’ll inevitably get sued. You will inevitably have to do an eDiscovery. Please centralize your communications on something your organization controls. Otherwise, your employees literally are turning over their cell phones, including all of their WhatsApp and not just their work WhatsApp. And it will be dead. That is what I would recommend to answer that question. I’m not shilling for Slack. We are customers, but that is my painful hard-won experience.

Patrick Hill:

Definitely don’t use consumer stuff for enterprise or business-related activities. There’s probably one thing I actually do want to call out. Although we are in our homes, and I think Adrian mentioned this before, when it comes to using your own devices and stuff, please don’t use personal ones for work. Please use what your company provides you, as much as possible. I just wanted to call that out.

Holly Allen:

Absolutely. No matter what communication platforms you choose, make sure that they’re enterprise and built for business and not built for consumers, and get your corporate lawyers involved. That’s what I would say.

Adrian Ludwig:

It’s so nice that someone not on the security team makes those arguments. It’s great.

Patrick Hill:

Back to Adrian’s question around helping out. We had mentioned it before, but it’s not just, I think, that people need to take time off on call, it’s also saying, hey, I need to go complete this chore or activity, right? To go help someone else out. Just yesterday, one of my on-call engineers went, hey, my car has finally been fixed. Now I can do my shopping. Can you please swap with me for an hour or two, so I can go and get supplies? And everyone’s like, yes, yes, of course. You don’t need to ask twice.

Liz Fong-Jones:

There are all kinds of funny stories. One of my coworkers literally drove to a flour mill to distribute flour in industrial quantities to his neighbors. And that was what he needed on-call cover for two hours for.

Holly Allen:

I’ve got two stories. I took half a day the other day to do volunteering at a food bank. And that was great. And I know other people who are able to start doing things like that. And then Slack itself did something that I thought was really great. Doctors and medical professionals were having trouble in other spaces, knowing that they were talking to other doctors about COVID related things. And so we partnered with ID.me, which does identity verification, to create a Slack workspace where only medical professionals can join it. So that they know, okay, I’m only talking to other medical professionals about the problems at hospitals or the problems in research or whatever the case may be.

Adrian Ludwig:

That’s an interesting challenge actually. Because we need to have effective global sharing of information, but the reliability of that information doesn’t necessarily seem to be going in the right direction at a global level. So that’s great to hear. That’s a really cool suggestion. One of the things that I’ve seen that Atlassian has done, that I thought was really interesting. Our workplace experience team, which normally would be making sure that our offices are working, people are having communication, they’re keeping busy. One of the things that they’ve done, they went out and they bought gift certificates and all kinds of different things from the various different office areas. So each of the different office locations.

And then they’re doing a charity auction inside the company. So they’ve already from the get-go, made a contribution locally in each of the different areas. And then in addition to that, the proceeds from that auction are going to go to other nonprofits that need help. So I just thought that was a really elegant approach, for them to be able to help out. Two different directions at the same time. And the employees were excited to see it. It’s like, I forgot about that place down the street that I can order cheese from, apparently to deliver to my house.

Liz Fong-Jones:

We did a similar thing except we are getting, I think a $50 per week stipend for ordering out so that people can support their local businesses.

Adrian Ludwig:

That’s cool. That’s cool.

37:26

Patrick Hill:

To that point, we have had people in SRE, I know, take their lunch hour together, of sorts, if they are in the same time zone. And then all get their breaks or something like that, and all expense it together. And that recreates the team lunch environment. Now, not everyone can do that, but something to think about, again, to support local businesses that aren’t running right now, where possible.

Adrian Ludwig:

I’m not seeing a ton of additional questions here. Are there additional things that the folks on the panel wanted to make sure that we got some insight into, or that you wanted to share with listeners when you have a chance?

Liz Fong-Jones:

One of the things I wanted to share that I’ve been working on recently is, I’ve been working on opening up office hours to replace the conference hallway conversations I’d otherwise be having. So it’s been a really exciting thing to be like, hey, if you’re working in the medical industry, if you’re working in the banking industry, or if you’re working in whatever industry, please come on by, anyone can book 30 minutes on my calendar, those kinds of spontaneous connections I found to be super, super interesting and super, super cool. I’d really encourage people to think about how I can share my expertise outside of my company, as well as within my company.

Adrian Ludwig:

I’m doing something similar. The security team at Atlassian is about 85 people at this point, globally distributed. We’ve got people in Austin, Texas, Sydney, Australia, and then strewn about everywhere in between. It used to be that I started my day with an hour of quiet time to myself to get through email and then concluded the end of the day the same way. What I’ve done is, I’ve declared that to be office hours. I’ll just sit here, I’ll open up a Zoom, just throw it over to the side. Just sucking up bandwidth, because that’s what people like me do. And then it’s on my calendar and anybody can just drop in. And pretty much one or two people a day drop in. They’re working on a technical thing, they’re working on a personal thing. They just want to say hi, whatever. And it works out pretty well. It’s a neat approach to get some of those discussions going as well.

We’re starting to get some questions. Any other suggestions that people have seen or experiments they’ve tried around flexing of hours? I know I now have a very different schedule than I used to because I have to help set up Zoom meetings for children in the morning. Other things that people have been seeing?

Holly Allen:

We’re heavily supporting flexible hours and flexible schedules. And of course, I’m flexing my hours as well. The thing that has to go along with that, is the assurance to folks that this isn’t in any way, going to negatively reflect on people’s performance. Right? It’s still all about, we’re a team. To Liz’s point, the unit of planning, the unit of delivery is the team, and that we’re all in this together. And being flexible with each other just helps the team do that.

Patrick Hill:

I’ve had a couple of people in our team at Atlassian either change their hours, shift their hours, or block the middle of the day. Because I work in Austin, and then I have people that I have to work with in the Bay Area and then Sydney and in India as well, I tend to take off lots of time in the middle of my day. I do want to encourage anyone who needs to do the same. Don’t feel like you are chained up to your desk for 14 hours. Block the time and try to reduce down your direct working hours and then have some time to go do other things. You can even consider that. For those that have been in a relationship before, who need the time to take care of the people they live with, their family, whatever, take that time, right? I would seriously challenge anyone that says, no, you have to stay at your desk for your whole shift or anything like that.

Adrian Ludwig:

I was on mute. How many times have you not heard that in the last month? There’s another good question here, which actually I’m super intrigued by, which is around balancing feature development and reliability. I’m curious whether… the person asking the question here, I’ll extend the question a little bit. Have you seen changes in the things that are coming out of dev teams in terms of the impact on reliability? Have you proactively or reactively made changes in terms of ship schedules or changes to quality or anything along those lines?

42:10

Liz Fong-Jones:

You should be shipping more often, not less often, in this time. It was amazing to see people pull together and ship things. I think we heard from both Atlassian and Slack, where they’re pulling together and shipping things in a 24-hour, 48-hour window because they were suddenly really important. So if you don’t have that hygiene and infrastructure set up to be able to ship quickly and respond to needs, now is a good time to invest in it, I would say.

Holly Allen:

Yeah, absolutely. Go ahead.

Patrick Hill:

To Liz’s point, it’s really a trust mechanism and the tools you have in place. Previously I know, in offices we would probably sit there and just lean over and say, hey, can you check out this PR for me? And someone would go, I haven’t had my coffee yet, so I’m probably not paying attention, tick, move on. Whereas now, we don’t have that direct social interaction. So you really have to trust what tools, mechanisms you have in place to tackle those. You won’t have the opportunity to do a face to face review potentially for everything. So just keep that in mind. You really have to lean on the automation that you have in place, and to Liz’s point, if you haven’t got it, then start investing, because there’s definitely not a better time.

Holly Allen:

I agree. It’s a false dichotomy of reliability versus shipping features. One thing that we have in place at Slack is, any project that’s going out that is sufficiently large, which isn’t that big, has to go through a final review process. And that’s where we just do a double check on quality, on your monitoring setup, on your team’s alerting and paging, things like that. And I run that for the reliability portion. And I will definitely say that I have a little bit of heightened scrutiny now. But that isn’t systematic. It’s just, I’m very well aware that everyone is relying on Slack and we have these big features going out and I want them to hit well.

Liz Fong-Jones:

I think it’s down to this idea of what is your service level objective? Was your previous SLO correct? Or was it too lenient or was it too strict? Right? Now is the time to revisit that and then continue to make data-driven decisions based on that SLO.

Patrick Hill:

We have the same process here, Holly, so let’s compare notes after this. We would have, I think, actually more services and more teams coming to SRE and asking for advice on how to ship something now. I just wonder, is that digitizing previous physical interactions or is this something new? Are people sitting at home going, I don’t have the psychological safety of my team, to then say this is bound to ship. Right. And then now they’re actually taking a real honest assessment of where they’re at. I find that really interesting. We’ve definitely seen the same mental changes roll through, compared to normal back in the office days. That sounds like a weird thing to say, doesn’t it? Back in the office days. But that’s because I think we’ve invested quite a bit in making sure that work plans work correctly.

Adrian Ludwig:

I’m just imagining now that people are like, if I break the build, they’re going to come to my house. There’s another great question in terms of the flow of information. And now you’ve got a lot more online meetings, no whiteboards, things like that. Are there tooling changes or process changes that you’ve put in place around documenting just how decisions are being made and how that stuff is coming together? Are people actually typing while they’re in Zoom or is that no longer happening?

Holly Allen:

I’m doing that. I’ll usually keep the minutes for the meetings that I’m running. But it is maddening because at least when you were in the office, you could say, please close your laptops. Right? I felt a little bit like a school teacher having to tell people, but I can’t do that now, and I know they’re multitasking. And I would say that the documentation is fine and hasn’t changed. It’s the, how do you keep everyone engaged in the meeting? But I’ll take half of that on myself and say, I have to present an engaging meeting that people want to not multitask during.

Liz Fong-Jones:

I 100% agree. Right. Not everyone is neuro-typical, not everyone has ADHD. Right? I think that we’re definitely seeing an environment where people are going to pay the amount of attention that they’re going to pay, and we can all help each other out through this.

Patrick Hill:

We’ve had to then, we use Confluence and Trello, Trello for many general items and usually Confluence for collaborative notes. We’ve had to then realize that you won’t be able to get everything done in that meeting. And then get more familiar with doing async stuff. It’s very easy, I think, to sit there and to require people to leave their beds and join the meeting. Sometimes you won’t and sometimes you just have to sit there and go, cool. We’ll get back to that during the rest of the week. Some of that’s been helping, some of that hasn’t. Honestly, not everything you can delay, but that’s life.

Adrian Ludwig:

Cool. Well, I think we managed to get through those questions. I want to thank all of you for spending a little bit of time with us here today and giving us a little bit of visibility into the experience that you’ve been having. Undoubtedly, these are memorable times and I think it’s also a watershed moment, where we’re seeing cloud services, online services become critical infrastructure, not to use that phrase too lightly. And the value that you all are providing in terms of both keeping the services working, but also figuring out how to have your teams be effective and be more productive and keep the lights on. I think I speak on behalf of lots and lots of folks, in saying thank you. I really appreciate it and keep up the good work. With that, we’ll call it a wrap and hope you all have a good rest of your day.
