Developing with OpenAI and Observability

By Jessica Kerr  |   Last modified on May 23, 2023

Honeycomb recently released our Query Assistant, which uses ChatGPT behind the scenes to build queries based on your natural language question. It's pretty cool.

While developing this feature, our team (including Tanya Romankova and Craig Atkinson) built tracing in from the start, and used it to get the feature working smoothly.

Here's an example. This trace shows a Query Assistant call that took 14 seconds. Is ChatGPT that slow? Our traces can tell us!

The API call to ChatGPT is called "openai.ChatCompletion" and it took about 840ms. What happened during the 12+ seconds right before that? We can't tell!

Our Query Assistant API call took 14.3s, almost all inside GenerateQueryFromPrompt. The call to ChatGPT, represented by openai.ChatCompletion, took only 842ms. Before that, there’s a big gap with no spans, maybe 12s of unattributed time.

So Craig and Tanya added some instrumentation. They created spans representing important units of work: constructing the prompt, and as part of that, truncating the list of available fields we send as part of the prompt. Now we can see what’s happening!
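Here is a rough sketch of what that kind of instrumentation can look like. It is not the Query Assistant code: it uses a tiny in-memory span recorder built on the Python standard library, where a real service would use a tracing SDK such as OpenTelemetry. The span name `BuildChatPrompt` and all of the function bodies are hypothetical; only `GenerateQueryFromPrompt`, `TruncateColumnList`, and `openai.ChatCompletion` come from the traces in this post.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory recorder. A real service would use a tracing SDK
# and export spans to a backend instead of appending to a list.
finished_spans = []
_active = []  # stack of currently open span names


@contextmanager
def span(name):
    """Record the wall-clock duration of one unit of work as a named span."""
    parent = _active[-1] if _active else None
    _active.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        _active.pop()
        finished_spans.append(
            {"name": name, "parent": parent, "duration_ms": duration_ms}
        )


def truncate_column_list(columns, limit):
    # Placeholder logic: the real truncation is based on token counts.
    with span("TruncateColumnList"):
        return columns[:limit]


def build_prompt(question, columns):
    # "BuildChatPrompt" is an assumed span name for prompt construction.
    with span("BuildChatPrompt"):
        kept = truncate_column_list(columns, limit=3)
        return f"Columns: {', '.join(kept)}\nQuestion: {question}"


def generate_query_from_prompt(question, columns):
    with span("GenerateQueryFromPrompt"):
        prompt = build_prompt(question, columns)
        with span("openai.ChatCompletion"):
            pass  # the call to the model would go here
        return prompt
```

Because each unit of work gets its own span with a parent, the time inside `GenerateQueryFromPrompt` no longer shows up as one unexplained gap: it is attributed to prompt construction, truncation, or the model call.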

The spans reveal that creating the chat prompt takes 15.4s, and 14.8s of that is spent on TruncateColumnList. For comparison, openai.ChatCompletion shows that the call to ChatGPT took only 836ms. Most of the Query Assistant latency is due to TruncateColumnList!

To truncate the column list, we call a library that counts tokens in the prompt. With traces that show how long it’s taking, Tanya and Craig tried various optimizations until they landed on one that was close enough on the token count—and much, much faster. Here’s the trace from a query I ran today:
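This post doesn't say which optimization won, so the sketch below is only an illustration of the general trade-off, not what Tanya and Craig shipped. It assumes the slow path re-counts tokens for the whole list on every step, and that a character-length estimate with a running total is "close enough." The `count_tokens` function is a stand-in for a real tokenizer library, and the 4-characters-per-token ratio is a rule of thumb, not a measured value.

```python
import re


def count_tokens(text):
    """Stand-in for a tokenizer library call (assumption, not a real tokenizer).

    Counts runs of word characters and individual punctuation marks.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def truncate_exact(columns, budget):
    """Re-tokenize the whole candidate list after each addition.

    Accurate against count_tokens, but the work grows quadratically
    with the number of columns.
    """
    kept = []
    for col in columns:
        if count_tokens(", ".join(kept + [col])) > budget:
            break
        kept.append(col)
    return kept


def truncate_estimated(columns, budget, chars_per_token=4):
    """Estimate each column's cost from its length and keep a running total.

    Approximate, but a single pass with no tokenizer calls.
    """
    kept, used = [], 0
    for col in columns:
        cost = len(col) // chars_per_token + 1  # +1 roughly covers the separator
        if used + cost > budget:
            break
        used += cost
        kept.append(col)
    return kept
```

Wrapping each variant in the same `TruncateColumnList` span is what makes the comparison easy: the trace shows how long each one takes, and you can check separately whether the estimate stays close enough to the real token count.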

The TruncateColumnList span is now only 6ms! The ChatCompletion takes 3.3s. Now this column list calculation is insignificant to request latency.

This is observability during development: see what you’re doing, make the feature better, and keep that same visibility in production.

Want to learn more? Read the announcement on Query Assistant, and try it yourself by signing up for Honeycomb today.

 
