Golang Observability: Using the New Pprof Web UI to Debug Memory Usage

Intro

Go, for good reason, has become an incredibly popular language for everything from web applications to DevOps tools. We lean on Golang heavily here at Honeycomb, and naturally we are observability nerds, so I got ridiculously excited to use the new pprof web UI to gain more insight into the performance of our Golang programs, both internal and open source.


Let’s talk about using the new web UI to solve an issue in one of our Go
programs, honeyelb.

Using the New pprof UI

Recently we announced the new Honeycomb AWS Elastic Load Balancer integration. This is a tool to process the otherwise opaque access logs Amazon gives users, so they can understand the request patterns hitting their load balancers. You can now use our open source tool to send this information to Honeycomb for later querying, allowing you to answer questions like “Which servers are performing poorly?” or “When are my applications returning a lot of errors?”

Lucky for us, when we announced the tool our lovely customers jumped in and started trying it out right away. In the process, one of them identified a memory usage issue: the program kept running out of memory and getting killed!

Rather than guessing and guessing until I identified the source of the issue, I opted to use the new pprof web UI mentioned above to try to track down hotspots. The command line interface to pprof can be a bit unfriendly, but the team is making great strides towards better usability. Let’s take a look at using it to solve a real issue.

Getting Started

To install the pprof tool with support for the web UI:

$ go get github.com/google/pprof

This provides you with a pprof binary that supports the -http flag, which opens an interactive program profile in your web browser. More on that in a second. You may also need to install graphviz, which would be something like sudo apt-get install -y graphviz on Debian-based Linuxes or brew install graphviz on macOS.

After installing pprof, I needed to actually get a profile to work with. Luckily, the Go runtime makes this fairly straightforward with the runtime/pprof package in the standard library. The honeyelb program is long-running (intended to eventually be used as a set-and-forget daemon), so I added this code to an init function in the main source file:

// Requires "os", "time", and "runtime/pprof" in the imports (logrus is used for logging).
go func() {
  // Give the program time to start up and accumulate some memory usage.
  time.Sleep(120 * time.Second)
  memprof, err := os.Create("mem.pprof")
  if err != nil {
    logrus.Fatal(err)
  }
  // Write a snapshot of the heap to mem.pprof in the working directory.
  if err := pprof.WriteHeapProfile(memprof); err != nil {
    logrus.Fatal(err)
  }
  memprof.Close()
}()

This code spins off a background goroutine (go func()…) which sleeps
for two minutes (to allow the program to begin running and accumulate some
memory usage for us to later take a look at), then writes a profile of that
memory to a file mem.pprof in the working directory where the program is
invoked.
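
With mem.pprof in hand, we can open it in the web UI. The invocation looks something like this (you can also pass the honeyelb binary before the profile path if pprof needs it to resolve symbols):

$ pprof -http=: mem.pprof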

Because of the solo colon (no port given), pprof picks an arbitrary port on localhost to serve the web UI and should open a web browser pointing at it automatically (for instance, it chose :38687 on my machine; you can also specify a port if desired).
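
For example, pinning the UI to a specific port (8080 here is an arbitrary choice) would look like:

$ pprof -http=localhost:8080 mem.pprof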

Once booted up I saw something like this:

Generally speaking, the larger a box is, the more memory that part of the program is using. You can click around in this graph, zoom in and out, etc.

You can see that one box in the graph above jumps out as larger than the others: it references the NewEvent method in libhoney-go, which is used to create a new “event” (eventually rendered to a JSON blob) to send to Honeycomb. OK, interesting, but how about a numerical / tabular view? We can click View > Top in the upper left-hand corner to see this.

This shows us a list of the most memory-intensive functions in the program,
sorted by flat usage.

Looking at the numbers in the Flat % column, we can see that runtime.hashGrow and runtime.mapassign (internal Golang runtime functions related to map usage) are clearly the most memory-intensive.
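
For intuition about why those particular functions appear, here's a tiny standalone sketch (not honeyelb code) whose heap profile would be dominated by map allocations: runtime.mapassign runs on every insert, and runtime.hashGrow kicks in as the map outgrows its buckets.

package main

import "fmt"

func main() {
  // Inserting far more keys than the map starts with forces the runtime to
  // repeatedly grow the hash table (runtime.hashGrow) and assign into fresh
  // buckets (runtime.mapassign); those bucket allocations show up in the
  // heap profile alongside the values themselves.
  m := make(map[int]string)
  for i := 0; i < 1000000; i++ {
    m[i] = fmt.Sprintf("value-%d", i)
  }
  fmt.Println(len(m))
}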

So, we are clearly eating up memory in maps, and libhoney.NewEvent seems a likely suspect for why. How can we verify this? One way is to highlight the greedy methods as shown above and click Refine > Focus to narrow down our list of functions to the ones in the code path for this map usage (i.e., for the highlighted methods).
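
As context for why NewEvent would sit so close to map allocations: each libhoney event keeps its fields in a map under the hood until the event is marshaled to JSON and sent. Here's a minimal sketch of typical libhoney-go usage (the write key, dataset, and fields are placeholders, not honeyelb's actual values):

package main

import (
  libhoney "github.com/honeycombio/libhoney-go"
)

func main() {
  // Placeholder credentials; use your own write key and dataset.
  libhoney.Init(libhoney.Config{
    WriteKey: "YOUR_WRITE_KEY",
    Dataset:  "elb-access-logs",
  })
  defer libhoney.Close()

  // NewEvent allocates a fresh event whose fields accumulate in a map until
  // the event is marshaled to JSON and sent to Honeycomb.
  ev := libhoney.NewEvent()
  ev.AddField("status", 200)
  ev.AddField("backend", "app-server-1")
  ev.Send()
}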

Once I dug down into this level of detail, I did indeed confirm that
libhoney.NewEvent was a noteworthy source of increased memory usage. Knowing
that this was the greedy method, I started to reason about which code paths
would lead up to it. In doing so I realized that we were spinning up an
excessive number of Publisher structs (used to “publish” the events parsed
from the ELB access logs to Honeycomb) when we only needed one. Therefore I made this change to only spin up one. I hadn’t noticed the memory inflation in my local testing, but putting the program through its paces made the issue clear.
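
The shape of the fix is a familiar one: share a single long-lived object instead of constructing one per unit of work. Here's a rough, self-contained sketch of the pattern (the Publisher type and log file names are illustrative, not honeyelb's actual code):

package main

import "fmt"

// Publisher is a stand-in for honeyelb's publisher, which sends events
// parsed from ELB access logs to Honeycomb.
type Publisher struct{}

func NewPublisher() *Publisher { return &Publisher{} }

func (p *Publisher) Publish(logFile string) {
  fmt.Println("publishing events parsed from", logFile)
}

func main() {
  logFiles := []string{"elb-log-1.gz", "elb-log-2.gz", "elb-log-3.gz"}

  // Before the fix (sketch): a fresh Publisher per access log, multiplying
  // whatever state each publisher carries.
  for _, lf := range logFiles {
    NewPublisher().Publish(lf)
  }

  // After the fix (sketch): one Publisher shared across every log file.
  p := NewPublisher()
  for _, lf := range logFiles {
    p.Publish(lf)
  }
}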

A new pass through the profile reveals that memory usage is now saner, and
runtime.makechan and bytes.makeSlice are our new foes in that regard.
However, pprof can help us understand that this seems related to JSON
marshaling and is something that’s likely a bigger task to fix (if it becomes a
problem that needs fixing at all).

Conclusion

Tools like this new pprof web UI really highlight the importance of an
“observability mindset”. It’s becoming increasingly difficult to reason about
what’s going on inside our systems, and tools like Honeycomb are emerging to
help combat the fog of war. By reducing our unknown unknowns to knowns (or at
least known unknowns), we can be prepared to fix issues proactively rather than
when things are on fire. And we can, of course, be a lot more prepared for
unforeseeable problems when they inevitably come up.

Many thanks to rakyll and the rest of the Golang
team for their great work! I’m looking forward to seeing more in this area such
as a built-in flame graph implementation.

Until next time, observe all the things! And please do sign up to try out Honeycomb if you haven’t already. 🙂