We’re excited to release honeycomb-tcpagent, an efficient way to get query-level visibility into your MongoDB deployment.
honeycomb-tcpagent parses TCP traffic between MongoDB clients and servers, and reconstructs queries in a friendly JSON format. Honeycomb helps you explore this data to quickly uncover anomalies.
Are you running a database that’s not MongoDB? Let us know! Support for MySQL is already in the works.
Database Observability Means Lots of Questions About Lots of Data
For any serious database performance work, the ability to fully capture a workload is invaluable. The power to slice and dice by any criteria you want, doubly so. What’s the read vs write breakdown? Okay, how about for a specific collection? What’s the 99th percentile latency for a particular query family? Maybe network throughput seems high compared to query volume: are some queries returning too much data? Is a given server handling an abnormally large number of queries? Is a particular client sending too many queries?
And so on.
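To make a couple of these questions concrete, here is a minimal Python sketch that computes a read vs write breakdown and a per-collection 99th percentile latency from a stream of JSON events, one per line. The field names (`op`, `collection`, `duration_ms`), the sample events, and the set of operations counted as reads are all assumptions made for illustration; they are not taken from honeycomb-tcpagent's actual output.

```python
import json
import math
from collections import Counter, defaultdict

# Hypothetical events, one JSON object per line. The field names are
# assumptions for this sketch, not honeycomb-tcpagent's real schema.
SAMPLE_LINES = [
    '{"op": "find", "collection": "users", "duration_ms": 1.2}',
    '{"op": "find", "collection": "users", "duration_ms": 0.8}',
    '{"op": "update", "collection": "users", "duration_ms": 3.5}',
    '{"op": "insert", "collection": "orders", "duration_ms": 2.0}',
    '{"op": "find", "collection": "orders", "duration_ms": 40.0}',
]

# Assumed classification: anything not listed here is counted as a write.
READ_OPS = {"find", "getmore", "count", "aggregate"}


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(lines):
    """Return (read/write counts, p99 latency per collection)."""
    kinds = Counter()
    latencies = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        kinds["read" if event["op"] in READ_OPS else "write"] += 1
        latencies[event["collection"]].append(event["duration_ms"])
    p99 = {coll: percentile(vals, 99) for coll, vals in latencies.items()}
    return dict(kinds), p99


if __name__ == "__main__":
    breakdown, p99_by_collection = summarize(SAMPLE_LINES)
    print(breakdown)
    print(p99_by_collection)
```

The point of the sketch is that every one of these questions is a different grouping or filter over the same raw events, which is why capturing the full workload matters more than any single precomputed aggregate.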
Unfortunately, the aggregate statistics exposed by db.serverStatus() and the like can only take you so far in answering these types of complex questions. Slow query logs are very useful, but often hide half the story: performance problems can be caused by many relatively fast queries, rather than a few slow ones. Full query logging would capture those too, but on an I/O-bound database it tends to limit throughput; enabling it on an already-struggling database is akin to trying to put out a grease fire with water.
There Must Be Another Way
Another approach is to analyze actual network traffic between database clients and servers. By using a packet capturing mechanism to passively inspect MongoDB TCP traffic, it’s possible to reconstruct every request and response in real time. This strategy is very efficient, and doesn’t require any database reconfiguration or instrumentation in the application layer.
honeycomb-tcpagent does exactly this, and pipes structured JSON to stdout for our honeytail connector to forward on:
honeycomb-tcpagent | honeytail -d "MongoDB" -p json -k $WRITEKEY -f -
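To illustrate why passive reconstruction is possible at all, here is a small Python sketch of the first step. Every MongoDB wire protocol message starts with a 16-byte header of four little-endian int32 fields (messageLength, requestID, responseTo, opCode), and a response carries the originating request's ID in responseTo. The `LatencyTracker` class, its method names, and the rule that a header with responseTo equal to 0 is a request are assumptions made for this sketch; it is not honeycomb-tcpagent's implementation, and it skips TCP stream reassembly and message body parsing entirely.

```python
import struct
from typing import NamedTuple, Optional

# Standard message header: four little-endian int32 values, 16 bytes total.
HEADER = struct.Struct("<iiii")

# A few well-known opcodes from the wire protocol.
OP_NAMES = {1: "OP_REPLY", 2004: "OP_QUERY", 2013: "OP_MSG"}


class MsgHeader(NamedTuple):
    message_length: int
    request_id: int
    response_to: int
    op_code: int


def parse_header(data: bytes) -> MsgHeader:
    """Decode the header at the start of a reassembled message."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 16 bytes for a message header")
    return MsgHeader(*HEADER.unpack_from(data))


class LatencyTracker:
    """Hypothetical helper: pairs responses with requests by ID.

    Assumes requests have response_to == 0. A real agent would also key
    pending requests by TCP connection, since request IDs are chosen
    independently by each client.
    """

    def __init__(self) -> None:
        self._pending = {}

    def observe(self, header: MsgHeader, timestamp: float) -> Optional[float]:
        """Return the latency in seconds when a pair completes, else None."""
        if header.response_to == 0:
            self._pending[header.request_id] = timestamp
            return None
        started = self._pending.pop(header.response_to, None)
        return None if started is None else timestamp - started


if __name__ == "__main__":
    tracker = LatencyTracker()
    request = parse_header(struct.pack("<iiii", 16, 7, 0, 2004))
    reply = parse_header(struct.pack("<iiii", 16, 99, 7, 1))
    tracker.observe(request, 10.0)
    print(OP_NAMES[request.op_code], tracker.observe(reply, 10.25))
```

Because the timestamps come from packet capture rather than from the database, the measured latency includes nothing the server has to do extra, which is what makes this approach cheap compared with full query logging.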
Zero to Root Cause in 60,000 Milliseconds
For an example of what you might do with this data, imagine that MongoDB CPU usage is creeping upward, as shown in the graph below. Let’s find out why before the database runs into serious trouble.
We hypothesize that we’re handling more queries, or that individual queries are becoming more expensive — perhaps because of a missing index. So let’s look at overall query count and latency.
There are no clear trends in these aggregates. But it could be that while the bulk of our query workload is stable, a few queries are behaving badly — yet not enough to skew the aggregated metrics. Let’s break down latency by collection and normalized family:
There’s a lot of data here — can we discern any trends in individual series?
There we go. Looks like we have a new query pattern that’s slow and getting slower. Now we can go add the right index.
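The breakdown by collection and normalized family used in this walkthrough relies on grouping queries that share a shape but differ in their literal values. Here is one way that normalization might look in Python. The `normalize` and `family` functions, the "?" placeholder, and the event fields (`collection`, `query`, `duration_ms`) are hypothetical choices for this sketch, not a description of how honeycomb-tcpagent or Honeycomb normalize queries.

```python
import json
from collections import defaultdict


def normalize(value):
    """Replace every literal with "?" while keeping keys and operators."""
    if isinstance(value, dict):
        return {key: normalize(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Collapse repeated shapes so {"$in": [1, 2]} and {"$in": [1, 2, 3]}
        # land in the same family.
        shapes = []
        for item in value:
            shape = normalize(item)
            if shape not in shapes:
                shapes.append(shape)
        return shapes
    return "?"


def family(query):
    """Stable string key identifying a query's shape."""
    return json.dumps(normalize(query), sort_keys=True)


def mean_latency_by_family(events):
    """Group events by (collection, family) and average their latency."""
    buckets = defaultdict(list)
    for event in events:
        key = (event["collection"], family(event["query"]))
        buckets[key].append(event["duration_ms"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


if __name__ == "__main__":
    # Hypothetical events; the field names are assumptions for this sketch.
    events = [
        {"collection": "users", "query": {"name": "bob"}, "duration_ms": 2.0},
        {"collection": "users", "query": {"name": "al"}, "duration_ms": 4.0},
        {"collection": "users", "query": {"age": {"$gt": 30}}, "duration_ms": 90.0},
    ]
    for key, mean in mean_latency_by_family(events).items():
        print(key, mean)
```

Grouping this way is what lets a single slow query pattern stand out as its own series instead of being averaged away with the rest of the collection's traffic.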
But Don’t Forget About Logs and Metrics Just Yet
TCP-level analysis is not a panacea. You still need system statistics and slow query logs for high-level instance metrics, and database-internal data — such as lock retention — that can’t be extracted from the TCP stream. Honeycomb helps you ingest server logs and statistics to get a complete picture of what’s going on.