Honeycomb Data Concepts

 

Transcript:

Alayshia Knighten [Sr. Implementation Engineer]:

Hello everyone. My name’s Alayshia Knighten, and today we will be discussing Honeycomb Data Concepts. Now let’s beeline in.

Okay, let’s start with the big picture of how data flows. On the left side of the diagram, there’s your application. In your application, you add the language-specific Beeline, OpenTelemetry, or any other telemetry library. In the case of Honeycomb’s Beeline, you need to provide a write key and a dataset. Generally, once deployed, it immediately starts running and auto-instruments everything it can to send that data back to Honeycomb. There are a few things to remember. If you don’t provide a write key and a dataset in the configuration, the Beeline goes into no-op mode. In no-op mode, it still captures all of the events that it normally would, but it sends those events into the bit bucket, meaning they are simply discarded. This is useful in a testing context, for example in CI, where you want to verify that Honeycomb is being called but don’t necessarily want to send your traces from CI.
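To make no-op mode concrete, here is a minimal sketch. It assumes a hypothetical TelemetryClient class, which is not the real Beeline API. The client counts every event it captures, but it only "sends" an event (here, appends it to a local list) when both a write key and a dataset are configured.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TelemetryClient:
    """Hypothetical client illustrating no-op mode (not the real Beeline API)."""
    write_key: Optional[str] = None
    dataset: Optional[str] = None
    sent: List[Dict[str, Any]] = field(default_factory=list)
    captured: int = 0

    @property
    def noop(self) -> bool:
        # Missing write key or missing dataset means no-op mode.
        return not (self.write_key and self.dataset)

    def send(self, event: Dict[str, Any]) -> None:
        # Events are always captured, exactly as in normal mode...
        self.captured += 1
        if self.noop:
            # ...but in no-op mode they go to the bit bucket.
            return
        # Stand-in for a real network send: record the event locally.
        self.sent.append({"dataset": self.dataset, **event})


ci_client = TelemetryClient()  # no configuration, as in a CI run
ci_client.send({"name": "GET /health", "duration_ms": 3.2})

prod_client = TelemetryClient(write_key="YOUR_WRITE_KEY", dataset="my-service")
prod_client.send({"name": "GET /health", "duration_ms": 3.2})
```

Your instrumentation code runs identically in both cases. Only the final send is skipped, which is why tests can still exercise the calls.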

Beelines, and tracing libraries in general, are designed never to block your application from serving responses. We run a background thread that asynchronously sends data in batches back to Honeycomb. Although this does add a small amount of processing overhead, we go to great lengths to ensure that virtually no latency is added to your app. If you ever do see an increase in latency from a tracing library connecting to Honeycomb, that is definitely something to investigate and to talk to us about, as we want to minimize the impact on your application as much as possible. Now, about auto-instrumentation. It’ll tell you a ton about what your application is doing. Typically, the first time someone sees that trace waterfall, it really blows their mind, because the feeling is more like, “My app is doing what? Oh my goodness.” Auto-instrumentation provides a skeleton of what your application is doing, based on well-known libraries that we know about and can watch for events, like HTTP handlers, database clients, and things like that.
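The background-thread design can be sketched as follows. The BackgroundSender class and its transmit callback are hypothetical and are not Honeycomb’s transmission code. The request path only does a non-blocking enqueue. A daemon thread drains the queue, groups events into batches, and hands each batch to transmit. If the queue is full, the event is dropped rather than stalling the app.

```python
import queue
import threading
from typing import Any, Callable, Dict, List


class BackgroundSender:
    """Hypothetical sketch: the app thread enqueues, a daemon thread batches and sends."""

    _STOP = object()  # sentinel telling the worker to flush and exit

    def __init__(self, transmit: Callable[[List[Dict[str, Any]]], None], batch_size: int = 50) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=10_000)
        self._transmit = transmit
        self._batch_size = batch_size
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, event: Dict[str, Any]) -> None:
        # Never block the request path: if the queue is full, drop the event.
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self.dropped += 1

    def close(self) -> None:
        # Ask the worker to flush whatever it holds, then wait for it to finish.
        self._queue.put(self._STOP)
        self._worker.join()

    def _run(self) -> None:
        batch: List[Dict[str, Any]] = []
        while True:
            try:
                item = self._queue.get(timeout=0.1)
            except queue.Empty:
                # Idle: ship any partial batch so events don't sit around.
                if batch:
                    self._transmit(batch)
                    batch = []
                continue
            if item is self._STOP:
                if batch:
                    self._transmit(batch)
                return
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._transmit(batch)
                batch = []


batches: List[List[Dict[str, Any]]] = []
sender = BackgroundSender(transmit=batches.append, batch_size=3)
for i in range(7):
    sender.enqueue({"name": "handler", "i": i})
sender.close()
```

Exactly how the events split into batches depends on timing. However, every event arrives in order, and no batch exceeds the configured size.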

The real business value comes when you begin adding explicit instrumentation to your application. You can do this by providing additional context to your spans. This can be business context, using business metadata. For example, what is the user ID? What is the user name? What company does the user work for, or what is their geographic location, if that’s your thing? Additional context can also come from technical metadata. If you’re using AWS, for example, you may be interested in which AWS region the instance is running in, its instance type, and any other pieces of information you find interesting. Last but not least, context can also come from the application itself. Useful application metadata includes the build ID, feature flags, and runtime dependencies. The reality is that you can completely go nuts with this and send whatever you need to send. There is no additional cost for sending more fields with each span, and you can send up to 2,000 fields per event without impacting performance.
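The three kinds of context might look like this on a single span. Everything here is an illustrative assumption: the Span class, its add_fields method, the dotted field names, and the values are hypothetical and do not come from a Honeycomb library. The only rule taken from the talk is the 2,000-field ceiling per event.

```python
from typing import Any, Dict

MAX_FIELDS_PER_EVENT = 2_000  # per-event limit mentioned in the talk


class Span:
    """Hypothetical span: a name plus a flat dict of fields."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fields: Dict[str, Any] = {}

    def add_fields(self, fields: Dict[str, Any]) -> None:
        # Count the distinct keys the span would hold after merging.
        if len(self.fields.keys() | fields.keys()) > MAX_FIELDS_PER_EVENT:
            raise ValueError("too many fields for one event")
        self.fields.update(fields)


span = Span("checkout")
# Business metadata
span.add_fields({"app.user_id": "u-123", "app.user_name": "ada", "app.company": "Acme"})
# Technical metadata (placeholder values)
span.add_fields({"aws.region": "us-east-1", "aws.instance_type": "m5.large"})
# Application metadata
span.add_fields({"app.build_id": "build-4521", "flag.new_checkout": True})
```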

The other value-add is instrumenting things that are not captured by auto-instrumentation. Traces are typically viewed in a waterfall. Sometimes you may see two spans with a tiny gap between them. If it’s nanoseconds, okay, cool. Who cares? But what if it’s many milliseconds, or you simply find yourself asking, “What’s going on here between these two things?” Well, my friend, something that may have been important was not captured by auto-instrumentation. So what is the missing thing? It could be a third-party API call, a CPU-intensive operation, or a call to a cache. It’s really anything that can add latency to your application. Those are the things to identify in your code and wrap in a span.
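Wrapping an uninstrumented operation could look roughly like this. Both custom_span and fetch_exchange_rate are hypothetical names, and finished_spans stands in for sending spans to Honeycomb. The context manager times the block, records an error if one is raised, and stores the finished span. As a result, the gap in the waterfall becomes a named span of its own.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List

finished_spans: List[Dict[str, Any]] = []  # stand-in for sending to Honeycomb


@contextmanager
def custom_span(name: str, **fields: Any) -> Iterator[Dict[str, Any]]:
    """Hypothetical wrapper that times a block auto-instrumentation can't see."""
    span: Dict[str, Any] = {"name": name, **fields}
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        span["error"] = repr(exc)
        raise
    finally:
        span["duration_ms"] = (time.perf_counter() - start) * 1000
        finished_spans.append(span)


def fetch_exchange_rate(currency: str) -> float:
    # Pretend this is a slow third-party API call.
    with custom_span("third_party.exchange_rate", currency=currency) as span:
        time.sleep(0.02)
        rate = 1.08
        span["rate"] = rate
        return rate
```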

As you can tell, there’s definitely more to discuss regarding instrumentation, so please check out the Honeycomb docs. If you ever need assistance instrumenting your application, we have an awesome community of pollinators buzzing around in Slack, with specific channels for the languages that we support. You can also book office hours with someone from Honeycomb. Our developer advocates enjoy assisting customers, walking through instrumentation examples, and working with you on your code. We also have a number of agents for integrating with things like AWS load balancers, RDS, Lambda, and Kubernetes. Last but not least, we have Honeytail. Honeytail is a boss Swiss Army knife of a tool that can ingest structured logs from any file and has parsers for common log formats. When all else fails, you can definitely use Honeytail. And yes, I have a tendency to capitalize on moments when I can rhyme.
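Here is the basic idea behind ingesting structured logs. This is a minimal sketch and not Honeytail’s implementation, since Honeytail is a standalone tool with its own parsers and options. It assumes one JSON object per line and turns each parseable line into an event dict. Lines that don’t parse are skipped.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Turn JSON-per-line log text into events, skipping lines that don't parse."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real tool would likely count or report these
        if isinstance(record, dict):
            yield record


log_text = """{"status": 200, "path": "/", "duration_ms": 12.5}
not json at all
{"status": 500, "path": "/checkout", "duration_ms": 340.0}
"""
events = list(parse_json_lines(log_text.splitlines()))
```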

05:08

Looking at the diagram, we have set up our app-specific instrumentation. When the data flows out of the application, it can optionally go through one of Honeycomb’s proxies. There are two proxies that I will discuss. Refinery, previously known as Samproxy, is a trace-aware dynamic sampling proxy. It helps decrease noise when you have a very high volume of traces and don’t necessarily need all of them, while still keeping the traces you definitely want, such as those that contain errors or take longer than some threshold. Then there is Secure Tenancy, which is available to all enterprise customers. Secure Tenancy encrypts and decrypts all string values inside your data center or VPC, so the unencrypted values are not stored in Honeycomb. That is why customers who deal with compliance requirements typically use Secure Tenancy.
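The core idea of trace-aware sampling is that each decision covers the whole trace, not individual spans. This sketch is deliberately simplified and is not Refinery’s algorithm or configuration. Refinery’s dynamic samplers adjust rates per key, while this example uses one fixed rate. SLOW_THRESHOLD_MS and SAMPLE_RATE are hypothetical values. The function always keeps traces that contain an error or a slow span. For other traces, it keeps roughly 1 in 10, using a hash of the trace ID so the decision is deterministic.

```python
import hashlib
from typing import Any, Dict, List

SLOW_THRESHOLD_MS = 1_000.0  # hypothetical "X amount of time"
SAMPLE_RATE = 10             # keep roughly 1 in 10 ordinary traces


def keep_trace(trace_id: str, spans: List[Dict[str, Any]]) -> bool:
    """Decide for a whole trace at once (trace-aware), not span by span."""
    if any(span.get("error") for span in spans):
        return True
    if any(span.get("duration_ms", 0.0) > SLOW_THRESHOLD_MS for span in spans):
        return True
    # Deterministic 1-in-N choice from the trace ID, so every span of a
    # trace gets the same decision.
    digest = hashlib.sha256(trace_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_RATE == 0
```

Deciding per trace keeps waterfalls complete. A span-by-span coin flip would leave holes in the traces you keep.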

Moving on from the optional proxy, data is transmitted to api.honeycomb.io, which is the ingestion service. The data is processed and stored in our data store. Finally, you go to the UI at ui.honeycomb.io to run all of your queries. So there you have it. From a quick overview perspective, you set up your application with app-specific instrumentation. If needed, the data then goes to a proxy, from which it is sent to the ingestion service. It is processed and stored, and is available for you to query from the UI.

All right, let’s get into datasets and events. In the last slide, we talked about how events flow into the system. “But Alayshia, what are events?” Well, Honeycomb events are JSON objects that, if you squint at them hard enough, look like structured logs. Still, they must contain a few key fields so that we can handle them properly.

The first field an event must contain is the timestamp. If you don’t provide a timestamp, we will add our own at the time we receive the event, but it is useful to have your own timestamp for your own event tracking. An event should also have a duration, a service name, and a sample rate to be useful inside of Honeycomb. Events can contain up to 2,000 key-value pairs, and the values can be strings, integers, floats, or booleans. A value can even be a JSON object, and we can automatically unpack that JSON on our side. Events represent one unit of work within your system. In a tracing context, each user-facing request is broken down into smaller events. For example, there is a unit of work for handling the HTTP connection, performing a database request, calling a third-party API, et cetera. But what counts as a unit of work is quite subjective. It really has to make sense for you in your context. The question at this point would be: should every single function in your application be treated as a unit of work that deserves its own span? Probably not; almost definitely not. But the things that are performance-intensive or potential bottlenecks are the things you may want to put spans around.
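As a rough illustration, here is a sketch of an event that carries the key fields described above. The helper make_event and the field names timestamp, duration_ms, service.name, and sample_rate are assumptions for this sketch, not Honeycomb’s wire format. The helper checks the value types the talk lists, allows nested JSON objects, and caps the event at 2,000 pairs. It leaves the timestamp out when none is given, mirroring Honeycomb stamping the event on receipt.

```python
import json
from typing import Any, Dict, Optional

ALLOWED_TYPES = (str, int, float, bool, dict)  # dict = nested JSON object
MAX_FIELDS = 2_000


def make_event(fields: Dict[str, Any], timestamp: Optional[float] = None) -> Dict[str, Any]:
    """Hypothetical helper: build an event dict with the key fields from the talk."""
    for key in ("duration_ms", "service.name"):
        if key not in fields:
            raise ValueError(f"missing recommended field: {key}")
    if len(fields) > MAX_FIELDS:
        raise ValueError("an event can hold at most 2,000 key-value pairs")
    for key, value in fields.items():
        if not isinstance(value, ALLOWED_TYPES):
            raise TypeError(f"{key}: unsupported value type {type(value).__name__}")
    event = dict(fields)
    if timestamp is not None:
        event["timestamp"] = timestamp  # otherwise Honeycomb stamps it on receipt
    event.setdefault("sample_rate", 1)  # 1 means "this event represents itself"
    return event


event = make_event(
    {
        "service.name": "checkout",
        "duration_ms": 42.0,
        "http.status_code": 200,
        "cart": {"items": 3, "total": 59.9},  # nested object, unpacked server-side
    },
    timestamp=1_700_000_000.0,
)
payload = json.dumps(event)
```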

Let’s talk about code for a second. When writing code, comments are important because they convey the intent of the code. Instrumentation works much the same way as a comment: you start a span here, add fields to the span that express the state of what is happening there, and send all of that to Honeycomb. Later, you can analyze it and do something cool with it, but that’s a lot of detail to manage. The good news is that auto-instrumentation takes care of the heavy lifting for you. We also encourage you to build your own wrappers around the Beeline library so that you can, for example, standardize certain fields instead of making developers do all of that work every time, for every app. Less work is the best work, right?

We organize all of our events into datasets. What’s cool about datasets is that they automatically add new fields and infer each field’s type when we first see it, so you don’t have to. Datasets have a limit of 10,000 unique fields (field names).
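A team-wide wrapper might look like the following sketch. TeamTracer, standard_fields, the environment variable names, and the field names are all hypothetical. A real wrapper would delegate to your tracing library instead of building plain dicts. Standard fields are captured once, stamped onto every span, and overridden by any field passed at the call site.

```python
import os
import socket
from typing import Any, Dict, List


def standard_fields() -> Dict[str, Any]:
    # Hypothetical team-wide defaults; env var and field names are assumptions.
    return {
        "service.name": os.environ.get("SERVICE_NAME", "unknown-service"),
        "deploy.environment": os.environ.get("DEPLOY_ENV", "development"),
        "app.build_id": os.environ.get("BUILD_ID", "dev"),
        "host.name": socket.gethostname(),
    }


class TeamTracer:
    """Hypothetical in-house wrapper that stamps standard fields on every span."""

    def __init__(self) -> None:
        self.defaults = standard_fields()
        self.spans: List[Dict[str, Any]] = []

    def start_span(self, name: str, **fields: Any) -> Dict[str, Any]:
        # Call-site fields win over defaults if the same key appears twice.
        span = {**self.defaults, "name": name, **fields}
        self.spans.append(span)
        return span


tracer = TeamTracer()
span = tracer.start_span("db.query", table="orders")
```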

I think I have you sweating at this point because we’re about to talk about dataset planning. Not to worry, though: dataset planning is not something you have to worry about as much, since it is much simpler than it used to be. All datasets have a fixed 60 days of retention. If you have a dataset with compliance requirements that call for less or more than 60 days, talk to us; we can help you with that. Datasets are also created automatically the first time we receive an event with a new dataset name. However, only certain kinds of API keys can create new datasets. You can modify the permissions of an API key so that it can’t create datasets, and if you send an event to a new dataset with that key anyway, you’ll get a 403 error from us.
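The dataset-creation rules above can be modeled as a small state machine. IngestSimulator and ApiKey are hypothetical, and this is a toy model, not Honeycomb’s ingestion code. The 403 comes from the talk, and 200 stands for success in this model. An unknown dataset is created on its first event, but only if the key has permission. Otherwise the event is rejected with 403.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ApiKey:
    name: str
    can_create_datasets: bool


@dataclass
class IngestSimulator:
    """Toy model of the behavior described in the talk, not Honeycomb's code."""
    datasets: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)
    retention_days: int = 60  # fixed retention per the talk

    def send(self, key: ApiKey, dataset: str, event: Dict[str, Any]) -> int:
        if dataset not in self.datasets:
            if not key.can_create_datasets:
                return 403  # unknown dataset and the key may not create one
            self.datasets[dataset] = []  # created on its first event
        self.datasets[dataset].append(event)
        return 200


sim = IngestSimulator()
ci_key = ApiKey("ci", can_create_datasets=False)
admin_key = ApiKey("admin", can_create_datasets=True)
```

Once a dataset exists, a restricted key can still send to it. The permission only governs creating new datasets.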

You can also create datasets in the UI, which is totally fine as well. Our general guidance is to create datasets per environment and schema. If you think of traces in general as a single schema, that’s completely fine. That is what we recommend for most folks, because if you split a trace’s spans across multiple datasets, you won’t be able to see them in a combined view today. As a result, most companies have one dataset for production traces and another for staging traces. It is definitely important to separate out development environments.
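The "per environment and schema" guidance amounts to a simple naming rule. In this sketch, dataset_for, route_span, the field name deploy.environment, and the naming pattern are hypothetical conventions. Every service sends its spans to the single traces dataset for its environment, so a trace that crosses services stays in one dataset.

```python
from typing import Any, Dict

KNOWN_ENVIRONMENTS = {"production", "staging", "development"}


def dataset_for(environment: str, schema: str) -> str:
    """Hypothetical naming rule: one dataset per (environment, schema) pair."""
    if environment not in KNOWN_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return f"{environment}-{schema}"


def route_span(span: Dict[str, Any]) -> str:
    # All services share the single "traces" schema, so spans are routed
    # by environment only, never by service name.
    return dataset_for(span["deploy.environment"], "traces")
```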

10:21

This lets you route alerts only to the people they matter to. No one wants to wake up in the middle of the night for a development alarm. Logging schemas, like RDS logs, NGINX logs, access logs, or logs from CloudFront or any other kind of CDN, should go into separate datasets.

In May 2020, we rolled out event-based usage plans and monitoring, which should be much more useful for keeping tabs on your limits. We reworked the Usage page under Team Settings to be significantly more useful. There’s also the Mood Ring, which gives you a quicker indicator of your monthly usage. Orange means that you were over your limit last month. Red means that you’ve been over your limit two months in a row. Both Free and Pro tier teams can get throttled if they’ve been over for two consecutive months plus 10 days.
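The Mood Ring rules can be restated as a tiny function. This is a toy model, not Honeycomb’s implementation. The name mood_ring and the "ok" label for the in-limit state are assumptions, since the talk names only orange and red. The function takes completed monthly event counts, oldest first, and returns red for two consecutive months over the limit or orange if only last month was over.

```python
from typing import List


def mood_ring(monthly_usage: List[int], monthly_limit: int) -> str:
    """Toy model of the talk's color rules; usage is oldest month first."""
    over = [used > monthly_limit for used in monthly_usage]
    if len(over) >= 2 and over[-1] and over[-2]:
        return "red"     # over the limit two months in a row
    if over and over[-1]:
        return "orange"  # over the limit last month
    return "ok"          # placeholder; the talk doesn't name this state
```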

Throttling means we force all of your events to a sample rate of 10, so only one out of every 10 events is kept. For enterprise teams, we’ve disabled all throttling, but we’ll still send you an email when you’re over your limit or when rate limiting has happened. Rate limiting is a mechanism used to protect Honeycomb’s infrastructure from overload. Its default rate is 7,000 events per second, allowing single-second bursts of up to 10 times that value, and we routinely raise this for our enterprise teams. The new graph shows you a daily event target, which is your monthly limit divided evenly across the days of the month. Finally, burst protection gives you a free pass on up to three days where you’re over your daily event target. Once again, I’m Alayshia Knighten with Honeycomb Data Concepts. As always, go beeline in.
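Here is the arithmetic behind these limits as a sketch. Modeling the rate limiter as a token bucket is an assumption; only the numbers (7,000 events per second, bursts up to 10 times that) come from the talk. The names TokenBucket, daily_event_target, and days_over_target are hypothetical. The bucket refills at the sustained rate and holds at most a 10x burst. The daily target is the monthly limit divided by the number of days in the month. Burst protection forgives up to three days over that target.

```python
import calendar
from typing import List


class TokenBucket:
    """Toy limiter: sustained `rate` events/sec, bursts up to rate * burst_multiplier."""

    def __init__(self, rate: float = 7_000, burst_multiplier: int = 10) -> None:
        self.rate = rate
        self.capacity = rate * burst_multiplier
        self.tokens = self.capacity
        self.last = 0.0

    def allow(self, n_events: int, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n_events <= self.tokens:
            self.tokens -= n_events
            return True
        return False


def daily_event_target(monthly_limit: int, year: int, month: int) -> float:
    """Monthly limit divided evenly over the days of that month."""
    days = calendar.monthrange(year, month)[1]
    return monthly_limit / days


def days_over_target(daily_counts: List[int], target: float, free_passes: int = 3) -> int:
    """Days over target that remain after burst protection forgives up to `free_passes`."""
    over = sum(1 for count in daily_counts if count > target)
    return max(0, over - free_passes)
```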

If you see any typos in this text or have any questions, reach out to marketing@honeycomb.io.