Tracing  

Tracing Makes Concurrency Bugs Easy to Spot

By Jessica Kerr  |   Last modified on October 3, 2022

Today, I found a concurrency bug before I noticed it. Like, it was subtle, and so I wasn’t quite sure I saw it—maybe I hadn’t hit refresh yet? Later, I looked at the trace of my function and, boom, there was a clear bug.

What is a concurrency bug?

Some pieces of code can run at the same time–concurrently–while other operations need to run one and then the other–sequentially. When concurrent operations lead to unpredictable results, that’s a concurrency bug. For instance, when we apply changes and save at the same time, which changes get saved? It could be all of them, or some, or none, depending on when the save happened. Concurrency bugs are tricky, because maybe the tests pass sometimes. Maybe the code behaves differently in production than locally. They’re hard to pin down, and therefore hard to fix.

Concurrency bug example

Here’s the function with the bug. It responds to a request to /win by saving a record of the win and returning the total of my winnings so far. Can you spot the problem in the TypeScript?

public respondToWin = async (req: express.Request, res: express.Response) => {

  const username = "jessitron";
  const record = new Winnings(username, 1);
  this.winRepository.save(record);

  const { total_winnings } = await this.winRepository
    .createQueryBuilder("wins")
    .select("SUM(winnings)", "total_winnings")
    .where("wins.username = :username", { username })
    .getRawOne();

  res.send("Total so far:" + total_winnings);
}

It’s subtle.

Now here’s a trace in Honeycomb:

example of troubleshooting a concurrency bug

Now do you see the bug?

Hint: The problem is that the total_winnings returned does not include the winnings represented by this call.

In the trace, I can see that the select ran concurrently with the save.

Ohhh, so the result of the select definitely is not including the record I just inserted.

In the code, the bug is an absence before this line:

this.winRepository.save(record);

I need to await the promise returned by save. Here’s the fix:

Now I remember that the first time I tested /win, I got back zero total winnings the first time. But then on refresh, it went up and kept going up. I thought maybe I saw that first one wrong—nope, it was a sneaky bug.

This one is sneaky because it’s a problem with something that is not there. It takes an expert to notice a dangerous absence. Meanwhile, in the trace, I saw something that was there: the select running at the same time as the save. If you want to get better at concurrent programming, this is the kind of tool that builds your expertise.

(For the record, this trace is created by auto-instrumentation for Express, TypeORM, and Postgres. It’s generic configuration that works in any Node app with these libraries.)

Tracing with Honeycomb: try it yourself

Spans in a trace are great for showing you when stuff happened in relation to other stuff. Use them to check your asynchronous code. Include tracing in your development process!

You can try out tracing for yourself in Honeycomb—sign up for a free account!


This post was first published on Jessitron’s (aka Jessica Kerr) blog https://jessitron.com/blog/.

 

Related Posts

Tracing   Customer Stories  

Start with Traces, not with Logs: How Honeycomb Helped Massdriver Reduce Alert Fatigue

Before Massdriver, Dave worked in product engineering where he was constantly bogged down with DevOps toil. He spent his time doing everything except what he...

Tracing   OpenTelemetry  

Infinite Retention with OpenTelemetry and Honeycomb 

Honeycomb is massively powerful at delivering detailed answers from the last several weeks of system telemetry within seconds. It keeps you in the flow state...

Tracing  

Trace Propagation and Public API Endpoints in .NET: Part 1 (Disable All)

One of the issues with the W3C trace context is that it doesn’t define any standards for how far a trace is to propagate. If...