Observability   Logging   Instrumentation   Events  

How Are Structured Logs Different From Events?

By Ben Hartshorne  |   Last modified on September 8, 2023

We're all collectively trying to define observability ("o11y," pronounced "olly") these days, and, as Honeycomb is sometimes described as an event-based observability product, trying to define all the other words that go around o11y at the same time.

hex-shaped log storage bins

One of the questions I've seen come up several times in different forums recently is around the relationship between a structured log and an event. There are many definitions of an event (and many people trying to create a standard definition), so in the tradition of developing standards, here's one more.

So what is the difference between an event and a structured log?

All events can be represented as structured logs, but not all structured logs are events.

Ok, before we get in to that statement in more detail, let's do a few more easy definitions.

  • Log: a collection of messages. Often written to disk, sometimes streamed somewhere else before getting written to disk. Usually line-delimited, unless you're in Java land. Usually come with a timestamp attached. Individual messages may or may not be related to each other.
    • example message: Jun 20 09:51:47 cobbler com.apple.xpc.launchd[1] (com.apple.preference.displays.MirrorDisplays): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
  • Structured Log: Has all the characteristics of Log above. The content of the message is represented using a structured format of keys and values instead of natural language. May be structured in a variety of ways: positional fields (eg apache access log), JSON, key=val, key:val, bson, etc. etc.
    • example, translating the above message to a structured log using JSON: {"time":"Jun 20 09:51:47","hostname":"cobbler","process":"com.apple.xpc.launchd","pid":1,"service":
      "com.apple.preference.displays.MirrorDisplays","action":
      "respawn","respawn_delay_sec":10,"reason":"service stopped early","service_runtime_sec":0}

Now on to the tricky one - what is an event? My favorite definition is pretty squishy:

An event contains information about what it took for a service to perform a unit of work.

What's a unit of work?

Let's start with the “unit of work” part. Treating an event as a unit of work lets you adjust what it means depending on the goals of the observer. Sometimes a unit of work is downloading a single file, parsing it, and extracting specific pieces of information, but sometimes it's getting an answer out of dozens of files. Sometimes a unit of work is accepting an HTTP request and doing everything necessary to hand back a response, but sometimes one HTTP request can generate many events. As an operator of a service, my definition of unit of work has changed as I have needed to understand different parts of the service, and the instrumentation I have in that service has changed to accommodate those needs. It is a fluid relationship, zooming in on troubled areas, and back out again to understand the overall behavior of the entire service.

Ok, how about the “contains information” part? I would like to see an event have everything about what it took to perform that unit of work. This means it should record the input necessary to perform the work, attributes computed or resolved or discovered along the way, the conditions of the service as it was performing the work, and finally some details on the result of that work. Most of the time we can't include all that all the time for everything (eg I wouldn't expect the entire rendered web page to be sent along to my event-observability framework) because it is usually not in the interests of the business to maintain copies of all that data. (Sometimes it is! Needs vary.) Using an HTTP request / response as an example, I would include

  • information about the requestor: client IP address, request headers like user-agent, etc.
  • information about the request: URL or path, query parameters, POST body size, etc.
  • information about the service: current load, memory used, number of in-flight requests, etc.
  • information gleaned while handling the request: users successfully authenticated, timers around dependent services, etc.
  • information about the response: status code, response size, duration to transmit, etc.

Web server access logs are actually one of my favorite examples of events that are written as structured logs:

  • log: usually weblogs wind up as files on disk, but sometimes are pipes or streamed elsewhere
  • structured: though they're not in JSON there is a strict structure defined by positional fields with known delimiters.
  • event: each line includes a slew of information about what it took to handle that request. The unit of work is obvious: one request / response transaction, and there is information about all parts of that transaction present.

So I think we've now covered the “All events can be represented as structured logs” portion of that quote up at the top. What about the other half?

Why aren't structured logs the same as events?

Long ago when the only tool we had was printf debugging, we established some patterns of code. When you get to an interesting part of the codebase, emit a line describing something about where you are and (mostly) the fact that you got there. “Pushing respawn out by 10 seconds” is a great example. The launch daemon is letting you know that it reached a part of the code where it's rescheduling the next launch of this service to 10 seconds in the future. Does this represent a unit of work? I suppose at the lowest level of every function call being work, yes, but as an operator of a service, is it a usefully sized unit of work?

The questions I usually ask of my launch daemons are slightly higher level. “How long did it take to launch this service?” “How many times did it fail before you gave up?” Neither of those are answered by a single respawn, but they would be answered by a single “launch attempt”. I think, were I trying to build a launcher service, that my approach would be to treat a launch attempt as the unit of work, knowing that it has retries and a timeout. The event might be something like “tried to launch foo and failed after 2 minutes (30 attempts); last exit code was 126 and it sent 'permission denied' to STDERR” (of course, it would be a collection of key/val pairs, but right now my audience is you, blog post reader, not a machine, so consider it reformatted.)

Logs are often just portions of events

The key here is that many logs are only portions of events, regardless of whether those logs are structured. It's not at all uncommon to see 5-30 logs which, together, represent what could usefully be considered one unit of work. I have seen my fair share of logs representing a single HTTP transaction that go something like

6:01:00 accepted connection on port 80 from 10.0.0.3:63349
6:01:03 basic authentication accepted for user foo
6:01:15 processing request for /super/slow/server
6:01:18 request succeeded, sent response code 200
6:01:19 closed connection to 10.0.0.3:63349

The information contained in that block of lines, that collection of log messages, is all related to a single event: the handling of that one connection. But rather than being helpfully grouped, it's spread out into many log messages. (For the sake of argument, feel free to rewrite each of those lines as a structured log.) Sometimes there's a request ID that lets you put them back together. More often there isn't, and you have to sort through PIDs and remote ports, and sometimes there's just no way to put all the pieces back together.

An event is a conceptual abstraction and a structured log is one possible representation of that abstraction. The interesting part of the conversation is where to draw the lines around that abstraction; the technical implementation part is how to represent that event.

Thanks for listening! Consider describing your own definitions for these terms to help us, as a community, get a better handle on how to talk about these things!

And if you want to try out a fantastic service for visualizing events (regardless of whether they're ever written as structured logs), sign up for a free Honeycomb account and give it a try.

 

Related Posts

Observability  

Frontend Debugging Is Bad and it Should Feel Bad

There’s a sentence that strikes fear into the heart of every frontend developer I've ever met: Users are reporting issues, and we don't know how...

OpenTelemetry   Instrumentation  

OpenTelemetry Best Practices #2 Agents, Sidecars, Collectors, Coded Instrumentation

For years, we’ve been installing what vendors have referred to as “agents” that reach into our applications and pull out useful telemetry information from them....

Observability   News & Announcements  

Focused Labs & Honeycomb: Better Together

We're excited to unveil a new collaboration with Focused Labs, a leap forward in our shared commitment to advancing modern observability practices and enhancing the...