
Experiments in Daily Work

By Jessica Kerr | Last modified on October 2, 2023

TL;DR: Sometimes I get hung up on the scientific definition of "experiment." In daily work, take inspiration from it. Mostly, remember to look at the results.

Experiments in science have a specific definition

In high school chemistry and then college physics labs, we learned a strong definition of "experiment." Experiments are tied to the Scientific Method, responsible for the advancement of human knowledge.

A proper scientific experiment has:

  1. a hypothesis of what you expect to happen, which is based on a wider theory of how the world works.
  2. a procedure to execute in a controlled environment. Hold everything constant except the thing you’re testing.
  3. a measurement of the results, so you can compare them to expectations.

For instance, at a “Meet the Teacher” event the other day, my kid's chemistry teacher, Mr. Lockus, set up an experiment for the parents. The procedure was to mix 50 mL of water with 50 mL of isopropyl alcohol and see what the resulting volume would be.

The parents' hypothesis was that we'd have 100 mL of liquid, because basic math: 50 + 50 = 100. He poured the liquids from their clean volumetric cylinders into another one. We read the resulting volume, and it was about 97 mL. Expectations were not quite met. Was it an error, or did we learn something?

Experiments at work try to follow this archetype

Honeycomb has this value: "Everything is an experiment." This applies to the work we do and the processes we do it with. If a scientific experiment is the gold standard, how is our work like it?

  1. When undertaking a project, we state beforehand what the expected outcome will be. This is based on our expertise.
  2. We do the thing, in the real world. There is no controlled environment, but sometimes we approximate it with A/B tests.
  3. We check on the outcome. We compare this measurement to our expectations, and decide whether the experiment succeeded or failed.

A few projects look a lot like this. The growth team wanted to get more new Honeycomb users to run their own query. They noticed that most new users click on Datasets and get stuck on that settings page. Can we change that?

  1. They hypothesized that if we rearranged the navigation bar, moving Datasets down and renaming it to Data Settings, then more people would click on "Query" first, and run more queries. This was based on our designers' expertise in our product, including interviews with new users and ideas from the whole team.
  2. They implemented this change for half of new free users for a month; one way to make that kind of split is sketched after this list. The other half of users was the control group: everything else held constant.
  3. Based on data in Amplitude, a user-journey observability tool, fewer users with an updated nav bar visited the settings page early, and more ran queries. Experiment successful!
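
As a rough sketch of what a split like that can look like (hypothetical code, not Honeycomb’s actual experiment mechanism), the key property is that each user lands in the same group every time:

```python
import hashlib

def nav_bar_variant(user_id: str) -> str:
    """Deterministically bucket a user into the experiment or the control group.

    Hypothetical sketch: hashing the user ID means the same user sees the
    same navigation bar for the whole month, with roughly a 50/50 split.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "data-settings-nav" if bucket < 50 else "control"
```

Holding the assignment stable is what makes the control group meaningful: everything else stays constant, and only the nav bar differs between the two groups.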

Ideally, experiments are reversible

After this experiment, the growth team updated the navigation bar for everyone (before that, the experiment was reversible). We aren’t going to change it again any time soon, because swapping UI elements all the time is rude.

In science, experiments don’t change the world, only study it. In the human world, everything we do is a part of history, and therefore a part of the future.

Some changes are internal only, private to the team. If we decide to move our team meeting to another time, nobody outside needs to know. At Honeycomb, process changes like this fall under “everything is an experiment.” A change like that is reversible if it doesn’t work out; just don’t forget to ask in a few weeks whether it is working for everyone.

Most of our work doesn't fit so closely

Most changes to our product don’t fit this archetype. We make changes that aren’t reversible (especially if they affect data). For instance, Honeycomb recently added a feature called “is_root” to make it clearer how to count only the top event representing each request, instead of every little event (the existing alternative, “trace.parent_id does not exist,” is not obvious). Once people get used to it, taking it away will be confusing.
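
To make the difference concrete, here’s an illustrative sketch in plain Python over made-up span events (not Honeycomb’s query language) of the two ways to pick out the top event of each request:

```python
# Hypothetical span events for one request, represented as plain dicts.
events = [
    {"name": "GET /home", "is_root": True},
    {"name": "db.query", "is_root": False, "trace.parent_id": "abc123"},
    {"name": "render", "is_root": False, "trace.parent_id": "abc123"},
]

# The new, obvious filter: keep only the root event of the request.
roots_new = [e for e in events if e.get("is_root")]

# The old, non-obvious equivalent: the root is the event with no parent.
roots_old = [e for e in events if "trace.parent_id" not in e]

assert roots_new == roots_old  # either way, one event per request
```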

  1. The product team suggested that this would help people write queries, based on their expertise in the product.
  2. The devs implemented it.
  3. Do we know whether it’s working?

Most development teams I’ve worked in stopped after step 2. Honeycomb’s “everything is an experiment” shows up in step 3: we instrument the code with each change, so that we can go back and check whether people are using the feature.
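
A minimal sketch of what that instrumentation might look like, assuming OpenTelemetry’s Python API (the function and attribute name here are made up for illustration):

```python
from opentelemetry import trace

def record_feature_use(query_filters: list) -> None:
    """Annotate the current span with whether the new field was used.

    Hypothetical attribute name; the point is that every change ships with
    a way to ask later, "is anyone actually using this?"
    """
    span = trace.get_current_span()
    span.set_attribute("app.query.uses_is_root", "is_root" in query_filters)
```

Then step 3 is just another query: group by that attribute and watch how it trends.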

Our work can’t all be reversible, because we aren’t working in a controlled environment. We change the world as we go—that is the whole point of the work. But we can look at the result.

Seeing the result is often the hardest part

Well, we can look at the result if we have a way to measure it. Our development teams can, because they use Honeycomb and add instrumentation to each feature.

Mr. Lockus used volumetric cylinders to hold the water and alcohol. If he’d used plain water glasses, or a 2-cup Pyrex, we never would have noticed the 3% volume discrepancy.

Setting up systems for measurement lets us bring our work closer to experimentation: we can learn from it. 

Some work is really hard to measure. Right now in marketing, getting data and measurements in place is as much of the work as actually doing stuff.

OKRs are sorta like experiments

Still, our OKRs (Objectives and Key Results) try to follow this experiment model.

  1. Specify the outcomes we really want. For example, “Build community around state-of-the-art observability.” This is our Objective, and the hypothesis is that we can do something toward it this quarter.
  2. Some of our KRs count activity, like “send three newsletters.” These aren’t technically results, but sometimes you settle for what you can see.
  3. Other KRs are a proxy for outcomes, like “Get 100 subscribers.” Is a newsletter subscriber community? 🤷 It’s an opportunity for more community, at least.

Through the quarter we check in: have we moved this forward? If not, what will we change? Sometimes what we change is the OKR, if we know better now what is most important.

There are two key parts of our process that resemble experiments (the middle step is just doing the work):

  1. Think about what we want. Have expectations for our outcomes.
  2. Do the stuff (this is not special).
  3. Look at what happened.

Fortunately, Honeycomb doesn’t do stupid things with OKRs, like tying performance evaluations to whether the numbers were met. Instead, we get to learn from what happened.

When we look at the results of what we did, we don’t look only at the numbers we laid out. We look at everything else. Did someone post about our newsletter on social media? Did we learn a lot from bringing it together? Was it more work than we thought? The impact the work had on our team is an important result too.

Similarly, development teams don’t look only at “How many people typed ‘is_root’?” They notice anomalies like a change in query run statistics. They look at performance: did this feature impact our important flows?

Don’t just look for the expected results. Look for surprises.

When scientists run an experiment and find something completely unexpected, that’s when we get world-changing discoveries and paradigm shifts. Personally, I’ll settle for incremental insights into how Kubernetes really works.

Debugging is a lot of tiny experiments

One place the scientific method has helped me, for my whole career, is in debugging. When I make any code change:

  1. Say what I expect to happen now: “This is gonna fail the same way, but print a stack trace this time” (a tiny sketch of such a change follows this list). This is based on my mental model of how the runtime works.
  2. Run the code.
  3. Check the result. Look for any surprises. Read the whole error message.
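
For example, here’s a tiny, hypothetical version of that kind of change, where the expectation is “same failure, but now with a stack trace”:

```python
import traceback

def process(payload):
    # Stand-in for the code under investigation; it fails the same way every time.
    return payload["missing_key"]

def handle_request(payload):
    try:
        return process(payload)
    except Exception:
        # The change under test: still fails, but print a stack trace this time.
        traceback.print_exc()
        raise

# Expectation: a KeyError again, but now with the full stack trace in the output.
handle_request({})
```

If the error turns out to be different from the one predicted, that’s a surprise worth reading closely.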

Work is not science, but it can take some cues from it

There’s an element of accountability in treating every code change, process change, and debugging step as a sort of experiment. The accountability is: know why you’re doing it, and look at the results.

Caveat: sometimes, “why I’m doing it” is to see if I can, and sometimes, “look at the results” means “did anybody laugh?” Like making heatmaps out of pictures. Don’t forget that extra change-in-the-world result of “I know more about how Honeycomb constructs heatmaps.”

Here is how the useful parts of the scientific method translate to real life:

  1. Have expectations. Have a theory of how this stuff works.
  2. Do stuff you think will work.
  3. Look at the results. Look for surprises. Update your theories, increase your expertise.

Mr. Lockus, the chemistry teacher, explained to us that the water molecules and the alcohol molecules kinda fit together, like an open box fitting partly inside a slightly bigger open box. This lets their combination compact into 97 mL of space instead of the full 100 mL they took up separately. The parents learned something mostly useless about molecules, and some very useful things about the scientific method.

When everything is an experiment, then I get to stop and reflect each day, and find out how the work has changed my knowledge and the world.

 
