“Why Are My Tests So Slow?” A List of Likely Suspects, Anti-Patterns, and Unresolved Personal Trauma


“Lead time to deploy” means the interval from when the code gets written to when it’s been deployed to production. It has also been described as “how long it takes you to run CI/CD.”

How important is it? Fucking critical.

It’s nigh-on impossible to have a high-performing team if you have a long lead time, and shortening your lead time makes your team perform better, both directly and indirectly. That’s why I think your lead time should clock in at under fifteen minutes, all the way from “merged” to “deployed.”

Some people are already nodding and agreeing vigorously at this, but others will freak out. “FIFTEEN MINUTES?” they’ll squall, and accuse me of making things up or only working for small companies. Nope, and nnnnope. There are no magic tricks here: just high standards, good engineering, and the commitment to maintain your goals quarter by quarter.

If you get CI/CD right, a lot of other critical functions, behaviors, and intuitions align to be comfortably successful and correct with minimal effort. If you get it wrong, you will spend countless cycles chasing pathologies. It’s like choosing to eat your vegetables every day vs choosing a diet of cake and soda for fifty years, then playing whack-a-mole with all the symptoms manifesting on your poor, moldering body.

Is this ideal achievable for every team, on every stack, product, customer, and regulatory environment in the world? No. I’m not being stupid or willfully blind. But I suggest pouring your time and creative energy into figuring out how closely you can approximate the ideal given what you have, instead of compiling all the reasons why you can’t achieve it.

Most of the people who tell me they can’t do this are quite wrong, as it turns out. And even if you can’t go down to 15 minutes, any reduction in lead time will pay out massive, compounding benefits to your team and adjacent teams forever and ever. Let’s get you started!

Generally good advice

  • Instrument your build pipeline with spans and traces so you can see where all your time is going (there's a tracing sketch after this list).
  • Order tests by time to execute and likelihood of failure, so the fast, failure-prone ones run first and bad merges fail early.
  • Don't run all the tests, only the tests affected by your change (there's a test-selection sketch after this list, too).
  • Similarly, reduce build scope; if you only change front-end code, only build/test/deploy the front end, and for heaven’s sake, don’t fuss with all the static asset generation.
  • Don’t hop regions or zones any more than you absolutely must.
  • Prune and expire tests regularly. Don’t wait for it to get Really Bad™.
  • Combine functionality of tests where possible. Tests need regular massages and refactors.
  • Pipeline, pipeline, pipeline tests… with care and intention.
  • You do not need multiple non-production environments in your CI/CD process. Push your artifacts to S3 and pull them down from production. Fight me on this.
  • Pull is preferable to push.
  • Set an elapsed-time target for your team, and give the pipeline some maintenance any time it slips by more than 25%.
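
To make that first bullet concrete: here's a minimal sketch of wrapping build steps in spans, assuming a Python driver script and the opentelemetry-sdk / opentelemetry-exporter-otlp packages. The step names and commands are placeholders for whatever your pipeline actually does.

```python
# Sketch: wrap each build/test step in a span so you can see where the time goes.
# Assumes an OTLP endpoint (e.g. Honeycomb) is configured via the standard
# OTEL_EXPORTER_OTLP_* environment variables.
import subprocess

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("ci-pipeline")

BUILD_STEPS = [  # placeholder steps -- substitute your own
    ("npm_install", ["npm", "ci"]),
    ("unit_tests", ["npm", "test"]),
    ("build", ["npm", "run", "build"]),
]

with tracer.start_as_current_span("build"):
    for name, cmd in BUILD_STEPS:
        with tracer.start_as_current_span(name) as span:
            result = subprocess.run(cmd)
            span.set_attribute("exit_code", result.returncode)
            if result.returncode != 0:
                break
```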
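
And for running only the affected tests: the heavyweight answer is a build system that understands your dependency graph (Bazel, Nx, and friends), but even a crude mapping from changed paths to test suites buys you a lot. A rough sketch; the path-to-suite mapping here is entirely made up, and yours should come from your actual layout.

```python
# Sketch: pick test suites based on what actually changed on the branch.
# The SUITES mapping is hypothetical -- derive yours from your build graph,
# or at least keep it next to the code it describes.
import subprocess

SUITES = {
    "frontend/": ["tests/frontend"],
    "api/": ["tests/api", "tests/integration"],
    "docs/": [],  # docs-only changes skip the test run entirely
}
ALL_SUITES = {s for suites in SUITES.values() for s in suites}

def changed_files(base: str = "origin/main") -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def suites_to_run(files: list[str]) -> set[str]:
    selected: set[str] = set()
    for path in files:
        matches = [v for prefix, v in SUITES.items() if path.startswith(prefix)]
        if not matches:
            return set(ALL_SUITES)   # unfamiliar path: run everything, stay safe
        for suites in matches:
            selected.update(suites)
    return selected

if __name__ == "__main__":
    targets = suites_to_run(changed_files())
    if targets:
        subprocess.run(["pytest", *sorted(targets)], check=True)
    else:
        print("No affected test suites; skipping.")
```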

The usual suspects

  • Tests that take several seconds to init.
  • Setup/teardown of databases (HINT: try ramdisks).
  • Importing test data and seeding databases, sometimes multiple times per run (there's a seed-once sketch after this list).
  • rsyncing sequentially.
  • rsyncing in parallel, all pulling from a single underprovisioned source.
  • Long Git pulls (e.g. cloning the whole repo, full history and all, every time; a shallow-clone sketch follows this list).
  • CI rot (e.g. large historical build logs).
  • Poor teardown (e.g. prior stuck builds still running, chewing CPU, or artifacts bloating over time).
  • Integration tests that spin up entire services (e.g. Elasticsearch).
  • npm install that takes 2-3 minutes.
  • bundle install that takes 5 minutes.
  • Resource starvation of the CI/CD system.
  • Not using a containerized build pipeline.
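
On the "seeding databases multiple times" point: if you're in pytest land, a session-scoped fixture gets you one seed per run instead of one per test file. A minimal sketch, assuming hypothetical connect_to_test_db / seed_test_data helpers and a connection object with SQLAlchemy-style begin()/rollback():

```python
# Sketch: seed once per test session, then hand each test a transaction that
# rolls back, so tests stay isolated without re-importing the data every time.
import pytest

from tests.support import connect_to_test_db, seed_test_data  # hypothetical helpers

@pytest.fixture(scope="session")
def seeded_db():
    conn = connect_to_test_db()
    seed_test_data(conn)     # the expensive part: runs exactly once per session
    yield conn
    conn.close()

@pytest.fixture
def db(seeded_db):
    tx = seeded_db.begin()   # cheap per-test isolation
    yield seeded_db
    tx.rollback()
```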
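
And the Git one is often a single flag: do a shallow, single-branch clone instead of dragging the repo's entire history down on every build. Sketched here with subprocess; the repo URL, branch, and destination are placeholders.

```python
# Sketch: shallow, single-branch clone for CI -- no full history, no other branches.
import subprocess

REPO_URL = "git@example.com:yourorg/yourrepo.git"   # placeholder
BRANCH = "main"                                     # placeholder

subprocess.run(
    ["git", "clone", "--depth", "1", "--single-branch", "--branch", BRANCH,
     REPO_URL, "build/src"],
    check=True,
)
```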

“Our Software” (constantly changes) vs “infrastructure” (rarely changes)

  • Running CloudFormation to set up new load balancers, dbs, etc. for an entire acceptance environment.
  • Docker pulls, image builds, Docker pushes, container spin up for tests.

“Does this really go here?”

  • Packaging large build artifacts into a different format for distribution.
  • Slow static source code analysis tools.
  • Trying to clone production data back to staging, or reset dbs between runs.
  • Launching temp infra of sibling services for end-to-end tests, running canaries.
  • Selenium and other UX tests, transpiling and bundling assets.

“Have a seat and think about your life choices.”

  • Excessive number of dependencies.
  • Extreme legacy dependencies (things from the 90s).
  • Tests with “sleep” in them (see the polling sketch after this list).
  • Entirely too large front ends that should be broken up into modules.
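
About those sleeps: every one of them is a guess at how long something takes, padded for the worst case, and you pay the padding on every single run. Polling for the condition you actually care about is both faster and less flaky. A sketch; start_background_job and job_is_finished are hypothetical stand-ins for whatever your test is really waiting on.

```python
# Sketch: poll for a condition instead of sleeping for a fixed, padded interval.
import time

from tests.support import start_background_job, job_is_finished  # hypothetical helpers

def wait_for(condition, timeout=10.0, interval=0.05):
    """Return as soon as condition() is true; fail loudly if it never is."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

def test_job_completes():
    start_background_job("job-1")
    wait_for(lambda: job_is_finished("job-1"))   # instead of time.sleep(5)
```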

“We regret to remind you that most AWS calls operate at the pace of infrastructure, not software.”

  • Provisioning AWS CodeBuild. You can suffer through 15 minutes of waiting before CodeBuild does any actual work.
  • Building a new AMI.
  • Using EBS.
  • Spinning up EC2 nodes sequentially (if you must spin them up at all, batch the requests; see the sketch after this list).
  • Basically, cool it with the AWS calls.
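
If you really do need fresh EC2 capacity in the pipeline, at least ask for all of it in one request and wait on the whole batch together, rather than run-and-wait in a loop. A rough boto3 sketch; the AMI ID, instance type, and count are placeholders.

```python
# Sketch: request the whole batch of instances in one call, then wait once for
# all of them, instead of launching and waiting one node at a time.
import boto3

ec2 = boto3.client("ec2")

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="c5.large",           # placeholder type
    MinCount=4,
    MaxCount=4,                        # the whole batch in a single request
)
instance_ids = [inst["InstanceId"] for inst in resp["Instances"]]

waiter = ec2.get_waiter("instance_running")
waiter.wait(InstanceIds=instance_ids)  # one wait for everything
```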

Natural Born Opponents: “Just cache it” and “From the top!”

  • Builds install the correct version of the toolchain from scratch each time.
  • All builds rebuild the entire project from source.
  • Failing to cache dependencies across runs (e.g. the npm cache isn’t configured properly; there’s a lockfile-keyed cache sketch after this list).
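
Most CI systems have a cache primitive for exactly this; the part that matters is keying the cache on a hash of the lockfile, so it only invalidates when your dependencies actually change. If you have to roll it yourself, the shape is roughly this (the cache volume path is made up):

```python
# Sketch: reuse installed dependencies whenever the lockfile hasn't changed.
import hashlib
import pathlib
import shutil
import subprocess

LOCKFILE = pathlib.Path("package-lock.json")
CACHE_ROOT = pathlib.Path("/ci-cache/node_modules")   # hypothetical cache volume

key = hashlib.sha256(LOCKFILE.read_bytes()).hexdigest()
cached = CACHE_ROOT / key

if cached.exists():
    shutil.copytree(cached, "node_modules", dirs_exist_ok=True)   # cache hit
else:
    subprocess.run(["npm", "ci"], check=True)                     # cache miss
    shutil.copytree("node_modules", cached)                       # save for next run
```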

“Parallelization: the cause of, and solution to, all CI problems”

  • Shared test state, which makes parallel runs flaky and non-deterministic, so nobody trusts them enough to parallelize at all (there's an isolation sketch after this list).
  • Not parallelizing tests.
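
The shared-state problem usually boils down to two tests fighting over the same database, tmpdir, or port. If you're on pytest with pytest-xdist, the worker_id fixture makes it easy to give each worker its own copy of whatever they were fighting over; create_database and drop_database below are hypothetical helpers. Run it with pytest -n auto and every worker gets its own sandbox.

```python
# Sketch: give each pytest-xdist worker its own database so tests can run in
# parallel without stepping on shared state.
import pytest

from tests.support import create_database, drop_database  # hypothetical helpers

@pytest.fixture(scope="session")
def test_db_name(worker_id):
    # worker_id comes from pytest-xdist: "gw0", "gw1", ... (or "master" when
    # the suite runs without -n).
    name = f"app_test_{worker_id}"
    create_database(name)
    yield name
    drop_database(name)
```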

Fifteen minutes

Hopefully these lists gave you a good starting point for where you can reduce lead time to deploy. Even if you don’t get to fifteen minutes, there’s always room for improvement—and like I said earlier, any improvement is better than none.

P.S. What did I say about instrumenting your build pipeline? For more on Honeycomb + instrumentation, see this thread. Our free tier is incredibly generous, btw.

P.P.S. This blog post is the best thing I’ve ever read about reducing your build time.

 

Charity

 
