Getting Data in With Honeytail
Parsing structure out of logs

 

Transcript:

Nathan LeClaire [Sales Engineer|Honeycomb]:

Welcome to another episode of Honeycomb Training. I’m Nathan LeClaire and today we’re going to talk about Getting Data In With Honeytail. Honeytail is a very useful Go binary that can be used to ingest structured logs that you have sitting around as well as parse structure out of existing logs. And we’re going to take a look at what that looks like today. So before I get into the nitty-gritty of using Honeytail, I want to say Honeycomb is not a logging tool. It’s not a tool that’s meant to ingest a lot of logs, especially if they don’t have any structure, but often logs do have a useful structure that we can take advantage of and parse out to send to Honeycomb as structured events. So, for instance, you might be very familiar with seeing logs like this, that have a timestamp, and they do embed some structured information. Like they might say, “Oh, a particular endpoint was being accessed by a particular user.” But our end goal is we want to actually parse that out into a structured event as you see here.
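To make the idea concrete, here is a minimal Python sketch of turning one unstructured log line into a structured event. The log line, the field names, and the regular expression are all hypothetical, invented for illustration; this is not how Honeytail itself is implemented.

```python
import re

# Hypothetical log line: a timestamp followed by embedded key=value fields.
LINE = "2020-04-01T12:00:00Z endpoint=/api/users user=alice status=200 duration_ms=42"

# Hypothetical pattern with one named group per field we want to query on.
PATTERN = re.compile(
    r"(?P<timestamp>\S+) endpoint=(?P<endpoint>\S+) user=(?P<user>\S+) "
    r"status=(?P<status>\d+) duration_ms=(?P<duration_ms>\d+)"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert numeric fields so they can be aggregated later.
    event["status"] = int(event["status"])
    event["duration_ms"] = int(event["duration_ms"])
    return event


event = parse_line(LINE)
```

Each named group becomes a field on the event, which is what makes questions like "which endpoint is slow for which user" answerable later.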

So we’ll pull all of those fields out for querying later on. That’ll make it easier to answer questions like, why is this one endpoint slow for this particular user, which client IPs are accessing our system, and many, many more. So Honeytail is named Honeytail because of its similarity to the Unix tail command that you might be familiar with. Some process, like for instance, I just have NGINX illustrated here, will write to a log file, like the access log, and Honeytail will be watching that file and tailing it for changes. So as new lines come in, Honeytail, with a particular parser configured, will parse out the structure in that log and send things along to Honeycomb for us to query and access like you can see in the upper right-hand side there.

Honeytail has a variety of logging parsers available. We’ll go into a couple of examples of these here, but just for reference, these are the ones that it supports out of the box. And if you need something really flexible, sometimes you can lean on a parser like the Regex parser, or you might just want to dump a bunch of JSON that you have sitting around in using the JSON parser. Today, we’re going to take a look at two examples, and the first will be using the NGINX access logs to send along to Honeycomb. So you can see that there’s a little bit of setup work here. You have to define a log format in your NGINX file or use the existing one that you have and pass it along to Honeytail as a format that it knows how to parse out. So, behind the scenes, everything in Honeytail will be converted into a Regex and Honeytail will try to match that access log pattern that you’ve specified as an option to the log lines that it’s seeing for parsing.
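The format-to-regex idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Honeytail's actual implementation: the format string below is a hypothetical one shaped like an NGINX log_format directive, and every `$variable` is simply turned into a lazy named capture group.

```python
import re

# Hypothetical format string, shaped like an NGINX log_format directive.
LOG_FORMAT = '$remote_addr [$time_local] "$request" $status $request_time'


def format_to_regex(log_format):
    """Build a compiled regex with one named group per $variable."""
    parts = []
    pos = 0
    for var in re.finditer(r"\$(\w+)", log_format):
        # Escape the literal text that sits between variables.
        parts.append(re.escape(log_format[pos:var.start()]))
        # Match lazily, up to whatever literal text follows the variable.
        parts.append("(?P<%s>.*?)" % var.group(1))
        pos = var.end()
    parts.append(re.escape(log_format[pos:]))
    return re.compile("^" + "".join(parts) + "$")


regex = format_to_regex(LOG_FORMAT)
line = '10.0.0.1 [01/Apr/2020:12:00:00 +0000] "GET /api/users HTTP/1.1" 200 0.042'
fields = regex.match(line).groupdict()
```

All values come out as strings here; a real parser would also need to convert types and cope with lines that do not match the format.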

So when we invoke the Honeytail binary, we give it a couple of parameters. So you can see here, I have the debug flag turned on so that we can actually see all the things that are going on. We’ve set the parser to NGINX, the write key to our Honeycomb write key from our team settings. The dataset can be called anything. I’ve just called it examples.honeytail-NGINX here. And then the really key parameters: we want to specify the NGINX config file that we should read the log format from, and we also want to say that the name of that format is Honeytail. So coming back to our config that already existed in NGINX, you can see that this log format is called Honeytail here, and if you want to ingest logs from some other access system, like Apache or HAProxy, you can just define a little config file that’s minimal, that’s not actually used for NGINX config, but just has this log format directive in there.

So we’re going to tell Honeytail which NGINX format directive to use and we’re also going to say which file we want to tail to ingest the logs from. And you can see an example of that running in motion here, where we’re accessing an API endpoint that goes through an intermediate NGINX hop first. So we proxy pass this off to a backend API, and as requests come into our system, you can see the output of the Honeytail logs here, which I have running in a little Docker Compose file. Honeytail will parse out those lines, so as they’re written to the NGINX access log, we will parse structure out and send them along to Honeycomb. And then when we go to the Honeycomb UI to investigate that data, we notice that they have all these lovely structured properties to query on.

 05:01

So, for instance, we can look at a heat map of the request time, we can BubbleUp on that heat map to try and explain why some things are slow, and here we can see that we identify the path behind those slow requests, for instance. So that’s a basic example of using Honeytail with an access log. So another really, really cool use case is that Honeytail can ingest MySQL slow query logs. So if you set MySQL to log slow queries or even just all queries, you can ingest that slow query log into Honeycomb using Honeytail and you can use that for analysis of your database. So the invocation to ingest MySQL is pretty similar to what we were looking at with NGINX. A lot of those first properties are really similar except for the parser, but there are a couple of other things to take a look at here.

So one is that I have this --add_field flag to set a static field on the events that we’re sending. That might be useful if we have Honeytail, say, running on a couple of different hosts and we want to specify which host we’re sending in logs from. We, of course, have the file that Honeytail will tail to get new data, and we can also add on a --drop_field flag if there’s any data we want to just drop or scrub completely. So Honeytail will get normalized MySQL queries as a field, but the raw MySQL queries are also in that data and we might want to drop that field called query because that might have sensitive information in it and other properties that we don’t want to send to Honeycomb. So --drop_field and --scrub_field are good to know about. And here’s what that looks like in motion.
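The effect of adding, dropping, and scrubbing fields can be sketched on a plain Python dict. This is a hedged illustration, not Honeytail's code: the function name `transform_event`, the sample event, and the field names are hypothetical, and scrubbing is assumed here to mean replacing a value with a one-way hash (SHA-256 is chosen for the example).

```python
import hashlib


def transform_event(event, add_fields=None, drop_fields=(), scrub_fields=()):
    """Return a copy of event with static fields added, and others dropped or hashed."""
    out = dict(event)
    # Static fields, for example the host this tailer is running on.
    out.update(add_fields or {})
    # Dropped fields are removed entirely and never sent.
    for name in drop_fields:
        out.pop(name, None)
    # Scrubbed fields keep their name, but the value becomes a one-way hash.
    for name in scrub_fields:
        if name in out:
            raw = str(out[name]).encode("utf-8")
            out[name] = hashlib.sha256(raw).hexdigest()
    return out


# Hypothetical event as a MySQL slow query parser might produce it.
event = {
    "query": "SELECT * FROM users WHERE email = 'a@example.com'",
    "normalized_query": "select * from users where email = ?",
    "user": "app",
    "query_time": 0.25,
}
result = transform_event(
    event,
    add_fields={"hostname": "db-1"},
    drop_fields=["query"],
    scrub_fields=["user"],
)
```

Hashing rather than dropping keeps a field usable for grouping, since equal inputs produce equal hashes, without sending the original value.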

So you can see that as we’re accessing MySQL, Honeytail is chugging along and following along with all of those logs and sending the events to Honeycomb successfully. So we’re parsing structured data out of these MySQL logs. Then we can come over to the Honeycomb UI and do things like group by normalized query, looking at a count or other aggregates like a 95th percentile of latency. And here we can see these groupings for the normalized queries. So this can be really useful and powerful. One pattern we might want to look at, for instance, is a sum of the duration or the lock time spent in each query. So being able to group by the whole query is really, really handy, and there are a lot of interesting nuances to the MySQL parser. It will also understand certain types of comments for additional metadata and so on and so forth.
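As a rough picture of what grouping by normalized query computes, here is a small Python sketch over a handful of made-up events. The field names `normalized_query` and `query_time`, the sample values, and the nearest-rank percentile method are assumptions for illustration; Honeycomb's own aggregation may differ.

```python
import math
from collections import defaultdict

# Made-up events; field names are assumptions for this sketch.
EVENTS = [
    {"normalized_query": "select * from users where id = ?", "query_time": 0.010},
    {"normalized_query": "select * from users where id = ?", "query_time": 0.030},
    {"normalized_query": "update orders set status = ? where id = ?", "query_time": 0.200},
    {"normalized_query": "select * from users where id = ?", "query_time": 0.020},
]


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(events):
    """Group query times by normalized query and compute count, sum, and p95."""
    groups = defaultdict(list)
    for e in events:
        groups[e["normalized_query"]].append(e["query_time"])
    return {
        query: {"count": len(times), "sum": sum(times), "p95": p95(times)}
        for query, times in groups.items()
    }


summary = summarize(EVENTS)
```

Because literal values are replaced with placeholders in the normalized form, queries that differ only in their parameters fall into the same group.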

So sometimes you just don’t really have any choice. You can’t instrument the code to get into certain systems so you do have to fall back on parsing data out of logs to get full observability, and Honeytail is a great tool for using those kinds of black box logs for analysis. One thing I do want to call out: the agentless integrations for AWS are a similar idea to what Honeytail does. They’re AWS specific, but for some things like, say, ELB access logs, you might want to use the agentless integrations because we give you a little Lambda to install and it will do a similar operation to what Honeytail does but without requiring you to run any agents or things on your own infrastructure. Honeytail, of course, has to manage state to know how far it’s read in the log file, and sometimes it’s just nice to hand those things off to some other serverless system if we can.
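The state management mentioned here can be pictured as remembering a byte offset between reads. The following Python sketch is one simple way to do that, not a description of Honeytail's internals: the function name `read_new_lines` and the JSON state file layout are hypothetical, and log rotation is deliberately not handled.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended since the last call, persisting the offset."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = json.load(f)["offset"]
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    # Remember how far we got so a restart does not re-send old lines.
    with open(state_path, "w") as f:
        json.dump({"offset": offset + end}, f)
    return lines
```

Calling the function repeatedly yields only newly appended lines, which is the bookkeeping an agent has to carry and a serverless integration lets you hand off.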

So that’s a little bit of training on getting data in with Honeytail. Using Honeytail, you can take in all kinds of text files and turn them into structured data. And so I hope you go forth and have fun with Honeytail. Go forth and instrument your code. Happy observability.

If you see any typos in this text or have any questions, reach out to marketing@honeycomb.io.