Sense and Signals

By Nick Travaglini | Last modified on September 27, 2022
“What characterizes [complex dynamical systems] is that averages don’t cut it. It’s the individual path that this particular dynamic has traversed that produces the unique properties of that particular [...] whatever.”
Complex, distributed software systems are chatty things. Because many components interoperate with one another and with things outside their bounds, like users, those components and the systems themselves emit many information signals. It’s the goal of monitoring, logging, and observability (o11y) tools to help the systems’ “stewards,” the developers and operators tasked with maintaining and supporting them, make sense of those signals.
We at Honeycomb advocate that those stewards treat the signals emitted by their systems as structured events. This is a key differentiator from historic approaches to understanding the systems’ state, such as creating pre-aggregated metrics or outputting unstructured logs. In this post, I’ll argue that utilizing events instead of those other methods is preferable because they provide more information, and can therefore enable better stewardship of those software systems.
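To make the contrast concrete, here is a minimal sketch of the same request recorded as an unstructured log line versus a structured event. The field names are illustrative, not any particular schema:

```python
import json

# An unstructured log line: the attributes are fused into prose,
# so tools must parse them back out (lossily) before analysis.
log_line = "2022-09-27T12:00:01Z frontend responded 200 to GET /cart in 37ms"

# The same request as a structured event: each attribute keeps its
# own name and value, and they all stay together as one record.
event = {
    "timestamp": "2022-09-27T12:00:01Z",
    "service": "frontend",
    "http.method": "GET",
    "http.route": "/cart",
    "http.status_code": 200,
    "duration_ms": 37,
}

print(json.dumps(event))
```

The log line and the event describe the same moment, but only the event can be queried attribute by attribute without guessing at a parsing rule first.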
What are signals?
First, we need to define “information signal.” For our purposes, an information signal is a formal structure consisting of potentially many attributes, where each of those attributes may have many values. The more values that an attribute can have, the greater the “cardinality” of that attribute. In fact, each signal becomes more unique and distinguished as more attributes are added and the potential set of values per attribute increases. In other words, each signal becomes more informative.
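As a rough illustration (the attribute names and values are invented for the example), the cardinality of an attribute is simply the number of distinct values it takes across a set of signals:

```python
# A hypothetical batch of signals, each a set of attribute/value pairs.
signals = [
    {"http.status_code": 200, "user_id": 4521},
    {"http.status_code": 200, "user_id": 8190},
    {"http.status_code": 500, "user_id": 4521},
]

# Cardinality per attribute: how many distinct values it takes here.
cardinality = {
    attr: len({s[attr] for s in signals})
    for attr in signals[0]
}

print(cardinality)  # {'http.status_code': 2, 'user_id': 2}
```

In a real system, an attribute like `user_id` could take millions of distinct values, which is what makes it high-cardinality and, by the argument above, highly informative.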
To start, we’ll consider one of the historical methods for understanding a system’s state: creating pre-aggregated metrics. The production of pre-aggregated metrics means using a tool to define a set of important attributes, and then programming the tool to aggregate them in some meaningful way. An example of this would be reporting the number of HTTP response codes of the type 200 produced by a frontend server over a given time period.
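A minimal sketch of that pre-aggregation, assuming a simple in-process counter rather than any particular metrics library:

```python
from collections import Counter

# Status codes observed over some time window. Note that everything
# else about each request has already been discarded at this point.
responses = [200, 200, 404, 200, 500, 200]

counts = Counter(responses)
print(f"HTTP 200s this window: {counts[200]}")  # HTTP 200s this window: 4
```

The count is compact and cheap to store, but it is all that survives the window: which users, routes, or builds produced those responses is gone.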
This approach has the advantage of producing a relatively compressed and simple description of a behavior of the system: over X period of time, the system did Y about Z times. It also assumes that this compression is unproblematic: that decomposing the signal into predetermined attributes, analyzing each attribute independently, and then piecing together the results of those analyses should produce at least an equivalent amount of information. My colleague Jessitron wrote about the trouble with the decomposition and analysis portions. My focus is on the final part: the assumption that an equivalent amount of information remains once that process is done.
Observability tools, on the other hand, treat each signal as a coherent, individual structure. Treating each signal as a single structure keeps all of its attributes together, and this aggregate of attributes is indeed informationally equivalent to the decomposed pile that serves as raw material for the metrics described above. But something crucial gets lost if a signal is treated as just an aggregate: the relationships between the attributes.
The fact that all of those attributes and their values stand in relation to one another is itself informative. The weave of those relations constitutes what I call “information density,” and that density tells the steward that the signal was produced under particular circumstances “below the line.” In other words: these machines, programmed by those people in such and such a way, interoperated in a particular order and emitted this signal at that time. Information density is a symbol of the context that produced that one unique signal.
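For instance, keeping whole events lets a steward ask questions that relate attributes to one another, questions that separate per-attribute aggregates cannot answer. A hedged sketch, with invented field names:

```python
# Hypothetical events: each keeps all attributes of one request together.
events = [
    {"http.status_code": 500, "user_id": 4521, "build_id": "abc123"},
    {"http.status_code": 200, "user_id": 8190, "build_id": "abc123"},
    {"http.status_code": 500, "user_id": 4521, "build_id": "def456"},
]

# Because attributes travel together, we can ask relational questions:
# which builds produced server errors, and for which users?
errors = [(e["build_id"], e["user_id"])
          for e in events if e["http.status_code"] >= 500]

print(errors)  # [('abc123', 4521), ('def456', 4521)]
```

A counter of status codes alone would report two errors; it could never reveal that both errors belong to the same user across two different builds.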
What’s information density?
Information density is what gives a signal its ‘heft,’ and as a symbol can serve as a channel for the steward to work backwards from the signal to empirically investigate the functions of the technical components which produced it. Conducting that investigation is the practice of observability.
Decomposing a signal destroys the internal relations that constitute its information density, and with them, something distinctly informative for the steward. Therefore, to get the maximal utility from each signal, it’s better to treat each one as an individual, or, as we at Honeycomb say, an event.
Part of understanding a complex, distributed software system as a socio-technical system means taking seriously that the signals the stewards receive aren’t just chatter. People understand what their system is doing by learning about its activity, and observability tools like Honeycomb help them understand that activity and its originating context in the way that is most effective for those responsible for sustaining the system.
If you want to give Honeycomb a try, sign up for free to get started.