Conference Talk

Full Observability: From Push to Production

June 25, 2020

 

Transcript

Taryn Jones [Marketing Events & Partnerships Manager|Codefresh]:

All right, let’s go ahead and get started. Hello everyone, and thank you so much for joining us for today’s webinar, Full Observability: From Push to Production. Our presenter today is Pierre Tessier, a Sales Engineer for Honeycomb, and he’ll be joined later on by Kostis Kapelonis, DevOps Evangelist for Codefresh. Today Pierre will help you make informed decisions, spending less time fixing and more time improving, with the help of observability.

But first I want to go over a couple of housekeeping notes before we get started. We encourage your questions throughout the session, just remember to please submit them using the Q&A button on your Zoom toolbar, rather than in the chat, so we can keep better track of them. And then we will address all of your questions at the end of the presentation. This session is being recorded, and a link will be sent to you by tomorrow, with the recording and slide deck. So don’t worry if you have to miss some of the presentation. Lastly, please remember to reference codefresh.io/events for all of our upcoming webinars, as we have fresh and informative webinars for you several times a month. With that, I will hand it over to Pierre to kick off the presentation. Hi Pierre.

Pierre Tessier [Sales Engineer|Honeycomb]:

All right, thank you very much. Hello everybody. Today we’re going to be talking about full observability, going from push into production. I’m Pierre Tessier, a Sales Engineer over at Honeycomb, but please don’t let the Sales Engineering title fool you. I really am an engineer. I love to dabble in code, and I do a lot of that at home as well. I’m all about understanding observability and helping others get it as well. I am joined by Kostis. I’ll let him go ahead and introduce himself.

Kostis Kapelonis [Developer Advocate|Codefresh]:

Hey everybody. I’m Kostis, I’m a Developer Advocate for Codefresh. But before I joined Codefresh I was also a Java developer for 10 years. So I have seen some of the pains of continuous delivery and continuous deployment first-hand. I think Codefresh is one of a few solutions that can help you with this.

Pierre Tessier:

Awesome. Thank you. So with that, let’s get started here. I promise I’m just going to show a couple of slides. We’re going to spend a lot of time going over products, showing some examples and how we can help make everybody’s lives better. But let’s talk about why we’re here and the reason why we’re even in this session together. It’s really the journey of what has happened with everything we’ve gotten, and as we continue to build better systems, as we continue to adopt these methodologies which allow us to build faster and more rapidly, we’re also introducing a lot of complexity all the way throughout.

So modern systems are great, they allow us to be very agile, we can write quick little fixes and deploy them, but all these things increase complexity and it’s harder and harder to understand. It’s harder to troubleshoot. It’s harder to just know what is going on inside. So as we went from easy to get started, now we move to SaaS, and now you’ve got dependencies on a SaaS vendor. We went from virtual machines into containers, now you’ve got to orchestrate this all, and that means Kubernetes, and I’m sure a lot of people either love Kubernetes, hate it, or both. And even the applications themselves, we went from monoliths into distributed systems, and we’ve got all these microservices everywhere. Connecting them all and understanding the dependencies between services is equally important. So when something happens, it’s hard to understand where it’s happening.

Now, when we talk about going from that complex journey into elite software development teams, what makes a team elite? What makes the engineering organization of one software company better than another? There have been a lot of studies on this. I’m going to cite the DORA State of DevOps report, and really it comes down to those who can adapt, who build and do it a lot. Those who deploy multiple times per day, on demand. Something happened, and you’re not scared to go out and fix the code, hit that button, and have it show up in production a few minutes later.

Their lead time for changes is typically less than an hour. They don’t need to go through major change management to make this happen. And because they have the right tooling, because they have the right observability and the right understanding of what is going on inside of those applications, their time to restore is typically less than an hour as well. So when they find an issue, and they understand what it is, their time to resolution is much better than others. And importantly as well, their failure rate from changes is also very, very low. Because it’s automated, because it’s understood, and because it’s observed. Because we understand and we can see what’s going on inside of it.

5:19

Why Honeycomb? We may call this the Honeycomb difference, and I try to break it down into four categories, four key differentiators that really make us who we are. The first one is our real-time exploration: the way that you can iterate in what we call the core analysis loop, where you ask a question, take the answers from that question and use them to ask another question, and do this in real time, on the raw data.

Honeycomb has built-in SLOs and built-in error budgets. They’re part of the platform. As these systems get very complicated, we might get, for lack of a better term, really kind of trigger happy. We’re going to go off and we’re going to create a lot of these alerts. We’re going to create a lot of known things, and next thing you know we’re inundated with noise. Systems going bad, systems about to enter improper states. Managing an SLO, understanding whether or not you are burning through that agreement you’ve made with your customers, customers being internal or external, and managing towards protecting your business outcomes, allows you to not worry about the noise and only focus on what matters. Honeycomb does this built into the product, and we’re happy to dive into that in greater detail with you as well, to help your organization learn how error budgeting and SLOs can help you be better.
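To make the error-budget idea concrete, here is a minimal sketch of the arithmetic behind it. It is an illustration only, not Honeycomb’s implementation; the function name `error_budget_remaining` and the sample numbers are assumptions chosen for the example.

```python
def error_budget_remaining(slo_target: float, total_events: int, failed_events: int) -> float:
    """Return the unspent fraction of the error budget (negative once the SLO is blown).

    slo_target is the promised success ratio, e.g. 0.999 for "99.9% of requests succeed".
    """
    # The budget is the number of failures the SLO tolerates over the window.
    allowed_failures = (1.0 - slo_target) * total_events
    if allowed_failures <= 0:
        raise ValueError("an SLO of 100% (or an empty window) leaves no error budget")
    return 1.0 - failed_events / allowed_failures


# Hypothetical window: one million requests, 250 failures, 99.9% target.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of the error budget is left")
```

Alerting on how fast this number shrinks, rather than on every individual failure, is what keeps the noise down.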

Our way of curating and collaboration is an equally important side of Honeycomb. We’re not just a tool, we’re a tool for your team. We’re a tool for you to understand how your team is using the tool, as well as to work as a team to be better together. Ultimately we have a lot of functionality around allowing you to see how your colleagues are using the tool, and to piggyback or maybe look over their shoulder and learn what they’re doing as well.

And finally, probably one of the biggest ones of them all, why is Honeycomb so great? It’s really our back end. It’s the data store. It’s the ability to have unlimited cardinality on thousands of dimensions, with queries coming back at blazing fast speeds. It’s the ability to not care about the type of data you’re throwing at the system, but being able to see every single one of those individual data points never aggregated for you. It’s a super-efficient data store. Now with that, I’m done talking about Honeycomb. I want to talk a little bit about Codefresh. I’m going to go ahead and let Kostis cover this aspect here, and then we’re going to jump in and show you guys some great stuff.

Kostis Kapelonis:

Yeah, so today in the demo you will see Codefresh and Honeycomb together. Codefresh, if you’re not familiar with it, is a CI/CD solution, but unlike other CI solutions, it’s specifically focused on microservices, Docker containers, and Kubernetes. Most other CI solutions were generic and worked with virtual machines, and when Docker and Kubernetes appeared they had to adapt. For us things were different. We believed right from the beginning that Docker should be central to the solution. This is how we have designed Codefresh.

We also have great integrations for Kubernetes, and at the moment we are one of the few solutions, if not the only one, where you open a Codefresh account and by default, you also get a Helm repository for free, right away. And you get special graphical dashboards where you can look at your Helm releases, your cluster, but for Helm applications. We give you also a Helm pipeline step out of the box right away, that you can use to deploy or to store your charts. So if you’re migrating at the moment to Kubernetes and Helm, Codefresh is one of the best solutions out there.

Another good aspect that I want to discuss, and Pierre you can go to the next slide, is how you can extend Codefresh. So again, if you’re familiar with other CI solutions, if you want to extend the pipeline, you need to write a plugin. And most of the time, you’re forced to learn something new. So maybe you need to be a Java or Groovy developer and learn to create a plugin, or maybe you need to write TypeScript and build something completely specific to the CI solution that you’re working with. It’s not, let’s say, optimal. Maybe you work in a team where everybody’s a Python developer, and you don’t want to learn Groovy.

With Codefresh we have done something different because we believe that Docker knowledge is something that everybody should have in 2020. Extending a Codefresh pipeline is as simple as creating Docker images. That’s it. If you know how to create a Docker image, you know how to create a Codefresh plugin, because a Codefresh plugin is a Docker image. What you see in this screenshot is the marketplace of Codefresh. These are some of the plugins that we offer; they are approved and recommended. But every public or private Docker image can also be a plugin in a Codefresh pipeline. This means that all the Docker Hub images that exist right now, and there are a lot, can be used in a Codefresh pipeline, and you will actually see this in the demo today.

Pierre has created a plugin for Honeycomb, and the plugin is just a standard Docker image; there’s nothing specific to Codefresh about it. He didn’t have to learn any strange Codefresh API, or any strange way to test the plugin. So for us, this is important because it gives developers the freedom to extend their pipelines in the easiest way possible. So learn once how Docker images are created, learn about Dockerfiles, and you are instantly an expert on Codefresh plugins as well. That’s it.

11:14

Pierre Tessier:

Yeah. I want to come back and mention how easy it was to actually build that one step in a pipeline, and we’re going to see that in action as well. And I want to second what was said about the Helm and Kubernetes support within Codefresh. It is really easy to set up and use, and putting together what we’re about to show you in this demo wasn’t very hard to do using Codefresh.

Starting off with that, it is demo time so let’s go off and put this presentation away for now. And to start off we’re going to show an application called HotROD. You could find it on my GitHub, puckpuck/hotrodrepo. And this repo is… sorry I’m going to go to it right here. So this repo came originally from the examples that are part of something known as Jaeger. For those not familiar with what Jaeger is, it is an open source distributed tracing framework. It comes with SDKs and a bunch of other pieces for you, but it’s an open source way for you to instrument your code. It’s one of a lot of different tools out there.

At Honeycomb we definitely support this way of getting data into the platform through something known as the OpenTelemetry Collector. This is a Go application. It’s got four services inside of it, all built into a single Go binary. You’re free to download it. There is also a load generator inside of here, and even a Helm chart that we used to deploy this, which Codefresh actually leverages and takes advantage of. So you’re free to actually play with this.

And what you get when you put this together and run it, is you’ll get something that looks like this. Simple app, it’s called Hot Rides on Demand, or HotRoD. It’s a way where you would click on one of these buttons here, let’s say Trom’s Chocolatier, and it’ll go off and go find that data for you, it’ll go find a ride for you. It’s all very synthetic on what it’s doing behind the scenes. I’m going to go ahead and click this a few more times, just to make sure we actually generate some data for it.

Now at the same time, we are running a load generator on this application as well, so we can generate some data for this demo. And we did something on purpose, we broke the app. So let’s go look at what this app looks like inside of Honeycomb. Here it is right here, I’m going to go ahead and refresh this page to bring you to the front end. It’s getting to the right screen there, and what we have here is the last eight hours of data going against this HotROD application, if you will. We can see a number of requests coming in, it looks like a decent heartbeat kind of style right there. We see a latency graph over here as well. We aren’t generating any errors, that’s good. Nothing coming back to the front end at least, or errors going back on the end user. Might be some stuff internally, but nothing from an overall perspective.

We’re going to go ahead and let’s dig in more on this latency chart because that’s usually where we’re most interested in. We’re going to enter in latency here, and we’re going to look around and just see what is going on. Now I am looking at the last eight hours of data, I’m going to narrow my view down to maybe just the last 30 minutes so we can focus on now. And this is Honeycomb. This allows you to view the data.

And the reason why I’m coming in to focus more on the 30-minute view is because I wanted to bring it down to five-second granularity. Every single one of these points is only five seconds wide. In Honeycomb, we’re always dealing with the raw data. Everything we do is against the raw data. If we go down to 10 minutes, we’ll see in one second why. So each one of those points is one or more pieces of data, and we continue to drill down and query against that raw data in hyper speed.

We were just clicking on it, I think it was Trom, so I’m going to go ahead and add a filter for Trom, just so we can see that data. I think it was a customer, and we’ll stay with Trom’s Chocolatiers, and I’m not even sure how you spell it, so we’re just going to do that right there. And I do that, and there it is. Those are my queries that I was doing against them, taking about between 800 and 900 milliseconds each. That is certainly what we saw coming out on this page right here. I’ve got one there about 1,000, but you get the point.

I could go ahead and click on each and every one of those individual ones, and we’re going to get the distributed tracing view of what just happened. This is really powerful in Honeycomb. It’s not where you’re going in and you’re writing a query, and you’re getting a list of potential traces, with a total round trip time, you’re trying to figure out which one to go in. We allow you to get in there through a graphical view. When you’re looking at a chart, you click on any point in that chart and if there’s a trace behind it, we’ll render that trace for you.

16:01

Inside of this, you get all kinds of good information. This is the classic waterfall chart for distributed tracing. We click on the individual spans here, let’s say we want to look at this SQL statement right here. And over on the right-hand side it will tell you all kinds of different information about what happened there. First I could see how this span fits into the context of the rest, but more so we can continue to look, we can even get down to the actual query that was run and sent to the database. So if this was abnormally long, we could maybe diagnose it from this aspect as well. And there’s more information in here. We can continue to look through this, and learn more from what we want to do.

I’m going to come back to this view. We’re going to actually take off this filter right here, and we’re going to look at all that data again. Because there was an interesting pattern with this data. It kind of looks like I’ve got two bands going on, and we’re hearing reports that we have an issue. People are not having a great experience, and although the majority of our traffic, denoted by these darker blue boxes, is down here in a respectable time frame, we do have quite a few pieces of data that are spending over two seconds, and even over three seconds in some cases, before they get a response. That’s concerning to us.

Honeycomb is an analytical engine, able to chew through millions and even billions of records at unlimited cardinality to find information. That’s great, but it’s even better when you don’t have to know what you’re looking for. So we’ve got something we call BubbleUp. What BubbleUp allows me to do is identify visually the data that I care about. And it’ll go through and tell me what’s different about that data. So I could go ahead and say, “Hey, grab everything here that’s over that two-second window there, and just select it all.” It’ll go through all the data points inside my selection, and compare them to everything it sees, every single column, every single field, every single result, and will try to render what it came back with.

We effectively give you below that, your results in a baseline selection comparison. So my selection is represented by these gold bars, and the baseline is blue bars. Really quickly I see customer dimension, and I’ve got value walking, and my bicycle’s green, each of them representing about 50% of that load that are problematic. And my baseline, they’re really the low percentages and I don’t even see anybody else in there. The same thing for customer ID, which makes sense, and our URLs contain the customer inside of them, so that makes sense as well. So we’re getting somewhere. It’s not the client ID, it’s not the client itself, or the UUID for the individual host. It’s literally the customer.
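The selection-versus-baseline comparison described here can be sketched in a few lines. This is a simplified illustration of the idea, not Honeycomb’s BubbleUp algorithm; the function `compare_dimensions`, the field names, and the sample events are all hypothetical.

```python
from collections import Counter


def compare_dimensions(selection, baseline):
    """Rank (field, value) pairs by how differently they are represented
    in the selected events versus the baseline events."""
    fields = {key for event in selection + baseline for key in event}
    results = []
    for field in sorted(fields):
        sel_counts = Counter(e[field] for e in selection if field in e)
        base_counts = Counter(e[field] for e in baseline if field in e)
        for value in set(sel_counts) | set(base_counts):
            # Share of events carrying this value in each group.
            sel_share = sel_counts[value] / len(selection) if selection else 0.0
            base_share = base_counts[value] / len(baseline) if baseline else 0.0
            results.append((field, value, sel_share, base_share))
    # Largest gap between selection and baseline first.
    results.sort(key=lambda r: abs(r[2] - r[3]), reverse=True)
    return results


# Hypothetical events: the slow requests all belong to two customers.
slow = [{"customer": "customer-a", "client": "web"}, {"customer": "customer-b", "client": "web"}]
fast = [{"customer": "customer-c", "client": "web"}, {"customer": "customer-d", "client": "web"}]
for field, value, sel_share, base_share in compare_dimensions(slow, fast):
    print(field, value, sel_share, base_share)
```

In this toy data the `customer` field floats to the top because its values differ completely between the two groups, while `client` shows no difference at all.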

I didn’t have to go out there and say, “Draw this or group this by customer,” although I’m going to in a couple of seconds. But it is a way for me to understand my data better. Knowing that, knowing I found something inside of Honeycomb, you just click on it, and we’re going to present you with a bunch of options. We do this in context, wherever you are in the platform. When you’re looking at your query results and you see a name, you see a field, you see a dimension, you click on that, even when you’re in a tracing view, and we’re going to provide you a set of options that you can use to further refine that query you did.

This is something we call the Core Analysis Loop. It allows you to look through your data to continue to answer questions. And you can always back up as well if you get too far in. So I’m going to go ahead and group by customer right here. Now when I do that and go to my results tab, down here we see a grouping of every single customer. When I hover over them, we’re going to also highlight the points in the chart above that represent that group, no matter what kind of visualization you’re using. So clearly, value walking and bicycle greens are up there, and when I get to my other customers, they’re all in the lower band.

I’ve found something. It’s pretty obvious here, it’s something with these two customers that’s going on. We continue to look, what is going on with them? Why are they much higher? And go ahead, just click on any of them, and go back into that trace view. Now when I come to this trace view, I can see that this SQL statement here is taking 2.381 seconds, a pretty long time. So maybe, this to me is interesting. Maybe I want to understand more about the SQL statements. And continuing that journey, we could say, “Hey, why don’t you group on every single SQL statement that you send into the platform, and see which ones take up the most time.”

We could say group by that, and let’s change some visuals here a little bit. Let’s go ahead and add a median on the duration, and we’re going to do it just for the ones that actually have a query inside. We’re going to get rid of that heat map now, we’re just going to go with some lines. But we’re going to do this by query. And really fast, we get a pattern that develops. A lot of things down low, and again those same two customer queries kind of floating up top, causing me a lot of pain. With this, we’ve formed a hypothesis that we have a problem with those two big customers. Sure enough, I can assure you there is an if statement in the code that is looking for those two customers and doing just that. But I didn’t have to know that beforehand. I just went through and asked questions of my data, and it came back and it answered it for me. No matter where I was in the platform, when I saw results I clicked on it and I continued that journey to further refine what I just did.
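The query built here, a median of duration grouped by SQL statement and restricted to spans that carry a query, can be approximated over raw span records as follows. This is a sketch with assumed field names (`db.query`, `duration_ms`) and made-up durations, not the query Honeycomb runs internally.

```python
from collections import defaultdict
from statistics import median


def median_duration_by(spans, key):
    """Group spans by `key` (skipping spans without it) and return the median duration per group."""
    groups = defaultdict(list)
    for span in spans:
        if key in span:  # mirrors "just for the ones that actually have a query inside"
            groups[span[key]].append(span["duration_ms"])
    return {value: median(durations) for value, durations in groups.items()}


# Hypothetical raw spans; one statement is far slower than the other.
spans = [
    {"db.query": "SELECT * FROM customer WHERE id = 1", "duration_ms": 300},
    {"db.query": "SELECT * FROM customer WHERE id = 1", "duration_ms": 320},
    {"db.query": "SELECT * FROM customer WHERE id = 2", "duration_ms": 2300},
    {"db.query": "SELECT * FROM customer WHERE id = 2", "duration_ms": 2400},
    {"name": "HTTP GET /dispatch", "duration_ms": 900},  # no query, so it is ignored
]
slowest_first = sorted(median_duration_by(spans, "db.query").items(), key=lambda kv: kv[1], reverse=True)
for query, duration in slowest_first:
    print(duration, query)
```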

21:42

So I can fix this, and we’re going to fix this, but before we do that let’s dive into what Codefresh is going to do when I commit that fix. So this is the Codefresh platform. It’s really handy, I love this platform for helping you build your build pipelines. I’m a Kubernetes junkie myself, I actually run Kubernetes in my home. I love playing with it, so when I came to play with Codefresh and I saw how tightly it was integrated with Kubernetes, I pointed it to an EKS cluster, and right away it provided me everything I needed to know about what was going on in that cluster. It was very easy to use.

But its Helm support is fantastic as well. And we can get into this one, but they offer all kinds of different ways for you to integrate with Helm inside of your platform. I’ve been doing a lot of builds working with Codefresh, and it keeps track of every single image I’ve built. And not only every image, I could get information about every single one of these. You could understand the actual layers that go into that image. No more guessing, did the build work? Did we set this up right, did we build out our pipeline properly? You could really investigate every aspect of what happened with those images and artifacts you’re actually creating for your app. So it’s really handy in that regard.

But ultimately, what we’re trying to do with all this is we’re trying to build pipelines. We’re trying to get our app out there and have it working. This right here is the HotROD pipeline that we built. It’s got a handy workflow. And when I built this, when I sat down and said okay, we’re going to go build the pipeline for this app, I just came through the marketplace and I just started dragging and dropping things on here. So I had it build a Docker image, and this will build and push the Docker image for me. It provides a couple of different examples for you to use, whether you’re doing this through Google Build or doing it yourself. Let’s say I’m doing it this way.

Maybe YAML scares you a little bit? Okay, that’s fine, I get that. Go into the form view and just answer all the questions like this. Same thing. It even allows you to set additional options. The image name, build arguments, tags, all that stuff is built in and easy to use. And when you’re ready, you hit copy step, and then you’re going to paste it inside of here. It’s just that simple. Then we’re off to the races.

And once this is all built up, and once we have a pipeline defined, we just give it a trigger. In my case, the trigger is a push commit to my repo in GitHub. As soon as I do that, it’ll go ahead and kick off this pipeline, and it’ll do what it needs to do. You have all kinds of different triggers you can use, and you can have multiple of them as well.

So now that we know all this, we understand that we can get data into a pipeline, and we can see how to use it, let’s actually go play with that a little bit. Over here, I’ve got some code. It’s our actual application itself. I’ve already written the fix, it was a simple configuration change, but I did want to spend a little time going over what this app is and how we get all the data in there. So it is using an SDK, and that SDK emits tracing data, and really all we need to do is get a handle onto the current context and we can append to it.

Distributed tracing SDKs, whether you’re using the ones from OpenTelemetry or Jaeger, typically come with a lot of auto-instrumentation help for you. So they’re going to automatically wrap all your HTTP calls, your database calls, and those kinds of things. You get that context for free. It’s all brought in, it lives inside your code, and it’ll emit the data as you do your operations in an asynchronous fashion so as to not block what you’re doing. But you can go ahead and augment all of that yourself. And this is exactly what we’re doing right here. So these are called spans. We’re going to go ahead and get a handle on the span. Go get that span right away. Once we have that span from the current context, we go ahead and give it a tag. In this case, we gave it a tag called client, and we give it another tag down here called customer ID.

Later on, in code, there are other spots where we add the customer name. The URL gets automatically appended for us because we’re wrapping our HTTP calls. SQL statements get automatically added because we’re wrapping our database calls. So this is all nice and handy, but with just a couple more lines you could further enrich your data. That’s something we love to talk about at Honeycomb. Enrich that data, send us everything you have. Send us user IDs, customer IDs, and don’t worry about that cardinality. Because you don’t know what you need to know until you need to know it. It’s really important to do that kind of stuff.
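The pattern described here, fetching the span from the current context and attaching a few tags to it, looks roughly like the sketch below. The demo application itself is written in Go and uses a real tracing SDK; this stand-in uses only the Python standard library, and the `Span` class, the `current_span` variable, and the tag key `customer_id` are hypothetical names chosen for illustration.

```python
import contextvars
from dataclasses import dataclass, field


@dataclass
class Span:
    """Stand-in for an SDK span: a name plus free-form key/value tags."""
    name: str
    tags: dict = field(default_factory=dict)

    def set_tag(self, key, value):
        self.tags[key] = value


# Stand-in for the "current context" that a tracing SDK would maintain.
current_span = contextvars.ContextVar("current_span")


def tag_current_span(**tags):
    """Attach tags to the active span, if any, and return it."""
    span = current_span.get(None)
    if span is None:
        return None  # no active trace; nothing to enrich
    for key, value in tags.items():
        span.set_tag(key, value)
    return span


# Auto-instrumentation would normally create and activate this span for us.
span = Span("HTTP GET /dispatch")
current_span.set(span)

# The couple of extra lines that enrich the data with high-cardinality fields.
tag_current_span(client="web", customer_id="392")
print(span.tags)
```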

26:27

With that, I did write my fix. It’s right here ready to go, so I’m going to go ahead and say, “Fix the silly bug for customers,” and we’re going to commit that. I’m going to go ahead and push this down here to Master. And as soon as my push happens, now that we’re done, we’re going to go back inside of Codefresh, and let’s go look at our pipeline. Literally that fast, I’ve got a new pipeline pending to run. It’s about to kick-off, it’s in progress. So I’m going to go ahead and click on this. What’s happening here is that it’s building out, it’s initializing what we need for our pipeline. I may have… sorry I feel like I’m losing my internet here because this rendering is taking an awfully slow time on my screen. Just going to refresh this real fast.

Kostis Kapelonis:

Maybe you can go to one of the previous builds, so the final-

Pierre Tessier:

Sure.

Kostis Kapelonis:

… is up and then you can wait for it.

Pierre Tessier:

Sure. What happens when we do a build is it’ll go ahead and initialize our container platform within Codefresh itself. This is a container platform because every single step in a build, like Kostis mentioned earlier, is a container. This is a great thing about it. So it’s going to go ahead and do a clone: it just runs a container that does a clone and throws it into a volume. Then we do a compile step, which goes ahead and builds out all the steps we need to compile our code, and it goes further down the list, and next, we build and we push the image.

And then we get into the deploy stage of our pipeline. What happens here is, the first thing is we run a simple container that contains a small little CLI, and we’re going to get down to what this actual container is in a little bit, that sends something to Honeycomb. It says, “Put a marker on it.” Why do we do that here? Because this is the point right before we’re going to start affecting production. It’s going to draw a marker inside. Next is the Helm deploy. This is where production gets affected. There’s going to be some churn: some containers are going to come online, old containers are going to go offline. It will do a standard Helm upgrade here, so we’re not going to do anything destructive. You’re not going to be without capacity for a while. And then finally, when the deployment in Helm is done and completed, we’re going to come back and we’re going to close that marker in Honeycomb.

This is important, because what happens when you do this is the marker that shows up first is a line, and then when we close it it’ll show up as a window. This allows you to understand if there was any churn that happened. That’s your deployment window, and after you do your deployment you should see the fix take place.

Let’s go back and check up on that pipeline again. Come over here, there it is. It actually did complete, did all the steps for us there while we talked about it. And I’m going to go back to Honeycomb, and we’re going to set this to a… we’re going to go back actually to our home screen over here, and we’re going to set this to a 10-minute view, just so we can really see that window. You can see it happened right there. What’s more, we included a link inside of our deployment marker. So I can go ahead and click on that name, and it’s going to come back to the exact same pipeline in that same view we were just looking at. So you can go back and really understand, was this the right build? You can go look at the actual commit itself, which will link you to the GitHub code. I took care of this, but it ties it all back together for you.

When we come back to Honeycomb, and I go back to my view over here in that chart, you can see here’s my deployment window, we had a little bit of churn that happened, and after we finished the deploy, smooth sailing, no more problems. And if we continue to re-run this to get more live data, we can see that we really did fix the problem inside of our application.

30:42

Let’s talk a little bit about what it is that we had to do to make this happen, to make this deploy marker happen inside of Codefresh. Inside of our pipeline itself, we’re going to go look at the workflow for it. It was pretty simple. We had an image, right here, that we created. That image was built from a simple Dockerfile, we’ll show it to you in a few minutes, and it just included an executable in there called honeymarker, as well as jq to parse the output from honeymarker. We went ahead and ran our command, which is contained right inside of here, which will go and create the marker itself, and it returns an ID. And then we export that ID. So you can see here, this is kind of like you’re running a bash script inside of a container, to really help you get what you need.

Later on, after our deployment step, we go ahead and run honeymarker one more time, but this time we are going to actually update our marker with an end time. All these parameters are available for you. The API key and the Honeycomb dataset were defined over here inside of my pipeline variables. All the other ones, like the CF Build, are passed down for you. And I create extra variables through the export command.
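The two pipeline steps amount to one call that opens a marker before the deploy and one that closes it afterwards. The sketch below only builds the two HTTP requests, without sending them, so the shape is visible. It assumes Honeycomb’s Markers API as publicly documented (a `POST` to `/1/markers/<dataset>` and a `PUT` to `/1/markers/<dataset>/<id>`, authenticated with an `X-Honeycomb-Team` header); treat the endpoint details as an assumption to check against current documentation. The helper names, dataset name, and build URL are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_ROOT = "https://api.honeycomb.io/1/markers"  # assumed endpoint, see lead-in


def _request(method, url, api_key, body):
    headers = {"X-Honeycomb-Team": api_key, "Content-Type": "application/json"}
    return Request(url, data=json.dumps(body).encode("utf-8"), headers=headers, method=method)


def create_marker_request(api_key, dataset, message, build_url, start_time):
    """Request that opens a deploy marker; the response would carry the marker's id."""
    body = {"message": message, "type": "deploy", "url": build_url, "start_time": start_time}
    return _request("POST", f"{API_ROOT}/{quote(dataset, safe='')}", api_key, body)


def close_marker_request(api_key, dataset, marker_id, end_time):
    """Request that turns the marker from a line into a window by setting its end time."""
    url = f"{API_ROOT}/{quote(dataset, safe='')}/{quote(marker_id, safe='')}"
    return _request("PUT", url, api_key, {"end_time": end_time})


# Hypothetical values; in the pipeline these come from variables and the build context.
open_req = create_marker_request("API_KEY", "hotrod", "deploy build 42", "https://example.com/build/42", 1593100000)
close_req = close_marker_request("API_KEY", "hotrod", "abc123", 1593100120)
print(open_req.get_method(), open_req.full_url)
print(close_req.get_method(), close_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it; the marker ID returned by the first call is what the second call needs, which is why the pipeline exports it between steps.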

To tie all this together, we have a Docker image again right here, it’s called honeymarker. The Docker image itself just looks like this. It’s just a Dockerfile. It starts from a jq image, we add the honeymarker binary to it and we’re done. And then we go ahead, we Docker push this off, and it’s available as a step inside of Codefresh. And that’s how you build steps in Codefresh.

But the marketplace itself is quite rich for you as well. You could further refine these steps to do multiple things for you. So we can make it all a big huge plugin if we want to as well. I don’t know if you wanted to add anything here as well Kostis about building out steps and making plugins inside of Codefresh to enable this kind of functionality?

Kostis Kapelonis:

Yeah, I want to add, this is a good summary. But this idea, the idea that Docker images are just your steps also means that they are isolated. So in other platforms when you want to install a plugin, you have to install it centrally, and then after you install it, you also need to check if it’s compliant with the other plugins that you already have. And then when another team, let’s say, wants to upgrade the plugin, you need to coordinate between teams and ask everybody, do you want to use the new version, the old version, or how should we do this?

With Codefresh, if all you have is Docker images, this means that every time your pipeline is running, it dynamically runs the Docker images of your plugins, and then as soon as the pipeline has finished, all of the Docker images are discarded. So this means that every pipeline is completely isolated. We created this plugin, and it only exists in our own pipeline. Another team might want to take our plugin, modify it, maybe make some changes that are not compliant with us, and their pipeline will just work. Because when both pipelines run at the same time, each one will deploy its own plugin, there will be no conflicts, no problems with versions and stuff like that.

It’s a very powerful mechanism, it makes plugin creation very easy, and I would say very error-free. You can easily create plugins, and you don’t have the fear anymore of breaking stuff for other teams. You create your own plugin, you use it in your pipeline. If you don’t like it, you just simply discard it. The plugins in the marketplace are also versioned, so you can keep old versions as well. You can have one team use version three of your plugin, and another team version four. But that is not a requirement, you can do whatever you want.

Codefresh essentially solves the plugin hell problem that you might have seen in other platforms. It solves it for developers who are afraid to create plugins for fear of breaking something, where usually what happens is they ask their operations team, “Hey, please install this plugin for me, and check that it’s compliant with all the other plugins.” With Codefresh you can create a plugin in minutes, and this is a live example of how you can do it.

35:17

Pierre Tessier:

Awesome. And like Kostis mentioned, it really was simple for us to create steps and to add additional functionality. What that allows you to do is, now part of your CI/CD platform is part of your observability platform. And you could further understand what’s going on when you do those code pushes. You could see them show up inside of your platform, you could see those deployment windows, and it’s not really hard to do. You’re not messing around, and by encapsulating it inside of a plugin, you could repeat this step over and over again, any time you need to.

With that, let me go back now and talk a little bit more about what it is that we’ve learned. One key point that I want everyone to take away from this is observability means being able to answer any question you could have about your application and what is going on inside there. Especially when you don’t know the answer. It is about understanding the unknown unknowns. Everybody knows what CPU load is, everybody understands what network performance might look like. But how is that different for your world, when the requests are coming in and hopping across different services in ways that are not really understood upfront? It’s important to capture all the data you can, not be worried about that cardinality of that data, so you can find out what you don’t even know what it is you’re looking for.

With that, I think we could go ahead and take a couple of questions. Everybody is encouraged to sign up for a free trial. You go to honeycomb.io/signup. The same thing with Codefresh, you can sign up for a free account with unlimited builds, or schedule a one-on-one with any of the experts at codefresh.io. I do see a couple of questions here in the Q&A.

Kostis Kapelonis:

Yeah, there are some questions, and actually, I think some of them are for both our products. So the first question is, “How do you take care of code security?” It’s not clear if he’s asking about Codefresh or Honeycomb, so maybe we can answer for both. Answering for Codefresh, you can install Codefresh in multiple ways. You can use a cloud version where everything is hosted by us. There’s also an on-premise version, where everything is hosted by the customer. There is also the hybrid version, where the web UI is hosted by us, but the customer is getting the Codefresh runner, which is a small Kubernetes application. You install it on your cluster and then you run builds locally on your own cluster, not on Codefresh premises.

This is a very popular solution for customers who are anxious about security, when they have very strict legal requirements regarding code security. If you use the runner, everything stays behind the firewall. The source code stays in your own cluster, and the only thing that is transferred to Codefresh are the builds and all the integrations. But the source code stays there. So if you’re asking about this kind of security, you can do it with the Codefresh runner. If you’re asking about code security like security vulnerabilities, Codefresh already has integrations with several popular solutions like Aqua Security, Twistlock, and Clair. And again, you saw how easy it is to create a plugin, so you can extend your pipeline and have a code security step inside the pipeline where you scan your code, or maybe you’re going to scan your Docker image or both, and then get the results, the security issues. And of course, you can stop the pipeline and say something like, “If I have critical security issues, do not deploy this image into production.” So if you’re asking about this kind of code security, we have it covered as well. Pierre, regarding code security in Honeycomb?

Pierre Tessier:

Sure, great. That is a really good question. You don’t actually send your code up to us, but you do send data from your code running inside the platform into Honeycomb. Your platform is secure, only your team has access to it. Honeycomb does adhere to SOC 2 Type 2 compliance to make sure our platform is safe in those regards, and we put the proper safeguards in place. However, there are certain times where people say, “My data cannot leave our premises, or we cannot risk anything.” For that, Honeycomb actually has a great solution for you known as Secure Tenancy. In this case here, you run a proxy internally, which will encrypt every single piece of data. You’ll have that encryption key, Honeycomb does not get it.

The data gets encrypted, it gets sent up to Honeycomb, and it’s stored in an encrypted form. We can’t read any of the strings, we don’t understand the strings, and the data is all encrypted for us. When you query that data and you pull it back down into your platform, it’ll run right back through that same Secure Tenancy proxy, where it gets decrypted before being rendered in your browser. So it’s a great technology, Honeycomb’s got a patent on it, and we’re really happy to share this with our customers. It helps us also achieve HIPAA compliance for those who have very stringent needs and security requirements for their applications.

40:44

Kostis Kapelonis:

Another question that I think is for Honeycomb is, “Can you talk more about the SDK that you use to instrument your code?”

Pierre Tessier:

Yeah. In my case here, I use the Jaeger SDK wrapper for OpenTracing. Wow, say that mouthful. I just wanted to use a lot of open source solutions to instrument my code. But you’re not limited to that. Honeycomb actually offers a great line of SDKs we call Beelines, which automatically instrument everything for you, and are really, really easy to use. Honeycomb is also part of the OpenTelemetry Consortium, which is part of CNCF, a very popular project out there. And this is a way to standardize on instrumentation across the board for all platforms, for all vendors, for everybody. Honeycomb, we are part of that, we do support it, our SDKs work with it. And in fact, the data that you see, the entire demo, in order for you to stand it up you actually need to stand up an OpenTelemetry collector. Again, just trying to use more open-source parts, and really showing you can connect all these things together and we’re compatible with all of them.

There are a few different options there, and to list them again, you have the OpenTelemetry SDKs, which are a great way to go, they are vendor neutral. Honeycomb does offer its own SDKs, known as Beelines, a very powerful, auto-instrumenting, world-class set of SDKs that are easy to use. And you could also use other open source SDKs. In my case, I used Jaeger with OpenTracing, and I piped that through an OpenTelemetry pipeline.

Kostis Kapelonis:

So two questions that are for Codefresh are, “How do you integrate with AWS EKS?” And I think here you are using an EKS cluster, or not?

Pierre Tessier:

I am.

Kostis Kapelonis:

Maybe you can show the screen, so we can answer the question live. In general-

Pierre Tessier:

Sure.

Kostis Kapelonis:

… Codefresh works with any compliant Kubernetes cluster. So if your cluster supports the Kubernetes API, it should work. You can actually run Codefresh on your own laptop. But as far as integration is concerned, if you go to account settings where people can see the settings, and then if you go to Kubernetes. Here is how you integrate clusters in Codefresh. You add them once, centrally, in your own account, so the credentials are stored here. And then each cluster gets a unique name. Then when you create the pipeline, you don’t have to care about credentials anymore, you just recognize clusters by name. So this screen is normally only accessible to the administrator in your team, and then all developers can quickly create pipelines if they just know the name of the cluster. This is the same way for all integrations. If you have a Docker registry, you define the Docker registry once here centrally, and then inside the pipeline, you mention the Docker registry by name.

A related question to this is, “How do you support multi-region deployment at the same time?” This has many answers. You can simply create one pipeline that is using all your clusters in different regions. You can have different pipelines running in isolation, or you can have a single pipeline, where we support parallel steps. So you can run some steps in parallel in your own pipeline. You can say I want to deploy to Europe and the U.S. and Asia at the same time, and every pipeline in Codefresh, by default, gets access to all Kubernetes clusters by name. So as long as you have connected your clusters for the different regions inside Codefresh, you can use them right away.
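The idea of one pipeline fanning out to several clusters that are referenced only by name can be sketched in Python. This is a toy model rather than Codefresh pipeline syntax: the cluster names are invented, and `deploy` is a placeholder for whatever the real deploy step would do.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy(cluster_name, image):
    """Placeholder deploy step: a real one would talk to the named cluster."""
    return f"deployed {image} to {cluster_name}"


def deploy_everywhere(image, cluster_names):
    """Run the deploy step for every named cluster in parallel and collect the results."""
    names = list(cluster_names)
    # One worker per cluster, so no region waits for another to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        results = list(pool.map(lambda name: deploy(name, image), names))
    return dict(zip(names, results))
```

The caller never handles credentials here, only names such as `"europe"` or `"us"`, which mirrors the point that clusters are connected once centrally and then referred to by name.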

Pierre Tessier:

I’m seeing another question here, piggy-backing on that, about access to data for specific users. In Honeycomb, you can segregate your data by teams, we do allow you to do that. A user belongs to one or more teams, and that’s a great way to segregate your data. Without understanding your needs a little bit deeper it’s hard to say more, but we do offer different ways to segregate data. There are datasets inside of teams as well, that’s another level of abstraction.

45:06

Kostis Kapelonis:

Then another question for Codefresh is, “What are the costs of creating all these containers?” Pipelines run in a dynamic manner, so if you don’t run anything, no resources are consumed. Once you run the pipeline, all the Docker containers that are needed for this pipeline are launched. You can also create additional services inside the pipelines. Maybe you want to have some integration tests, and you want a MySQL database installed at the same time. You can do this as well.

If you use the cloud version of Codefresh, these resources come out of your available machines. So when you create a Codefresh account, you can choose what capacity you want to use. In the free version you get, I think, one small machine, and then as you pay you can add more machines. So in the cloud version you are limited only by what you pay for. If you use the Codefresh runner, where you’re running pipelines on your own premises, the capacity is also controlled by you, the customer. If you install the runner in a powerful cluster, you can use all the resources of that cluster, and you can split your resources into different clusters. You can even connect multiple clusters at the same time. The runner is just a standard Kubernetes application, so all the tools and the knowledge you have to monitor existing Kubernetes resources and existing applications work exactly the same way with Codefresh resources. If you have a good way to monitor the resources that pipelines are consuming in your cluster, this is how you do it.

Then there is another question, “How easy is it to convert existing pipelines from Jenkins to Codefresh? Jenkins provides plugins, does Codefresh provide plugin support?” We know that a lot of customers are using Jenkins. We actually have a migration guide. If you do a Google search, Pierre, for migrating from Jenkins with the Codefresh Jenkins integration, you will see the guide. It’s a bit difficult to talk about all the specifics because there are customers that are using Jenkins 1… yeah, that’s the second link… and there are customers that are using Jenkins 2, so it depends on what the customer is doing. And then also the plugins in Jenkins are different between versions. Jenkins 1 is using Java-based plugins, Jenkins 2 is using shared libraries.

So this migration guide gives you some hints and some guidance on how you can do it, and it covers the most usual cases where you have a pipeline that is checking out the code, creating a Docker image, and deploying to Kubernetes. So if you have this simple scenario it’s very easy to convert. For more complex cases, yeah, you can ask us how to do it. But the idea here is that in Jenkins, usually, you have a very complicated pipeline, because, as I said, the pipeline itself is handling all the credentials. Using the pipeline in Jenkins, you manually run commands like docker login, docker push, docker tag, while in Codefresh, as you saw, this is abstracted. The build step is very simple. You just connect your Docker registry once centrally, and then the build step is trivial, you say I want this Dockerfile to be built, and I want to push it to this registry. Usually what happens is that when you migrate your pipelines, they become much simpler and much easier to understand and read.

Regarding plugins, again Jenkins has a very large library of plugins, so for some things where you might use a Kubernetes deployment plugin, you don’t need it in Codefresh because Codefresh can deploy to Kubernetes natively. If you have another plugin, your best bet is to search Docker Hub and see if there is a Docker image that has the same or similar functionality, and you can use it in a Codefresh pipeline as you saw. If you have a completely custom plugin, Java code, then you need to either re-package it in a Docker image or make some modifications so that it can easily be packaged in a Docker image and used in a Codefresh pipeline. Codefresh works only with Docker images, that’s the main idea. If you have a Docker image you can use it in Codefresh, and if you package it in a Docker image you can use it in Codefresh.

There is another question for Codefresh, “How do you restrict access to clusters in deploys? For example, I want an application deployed to a used cluster, but I want to extend the deploy to a new cluster.” In the enterprise version of Codefresh, you also get a feature that we call Teams and Permissions, which works with tags. Inside Codefresh you’re tagging your pipelines, your projects, your cluster with tags. So you can tag, this is my production cluster, this is my staging cluster, this is my UA cluster, and the same thing for pipelines and for projects. And then there is a powerful UI where an administrator can give specific rules and say, “I want members of the developer team to not have access to production clusters, but to have access to staging clusters. And I want members of my DevOps team to have access to production clusters,” and stuff like this.

This not only works with clusters but also it works with pipelines, so you can restrict pipeline usage as well. You can even decide who can debug a pipeline, who can approve a pipeline, and it’s very flexible on what you do. But I want to be clear that this feature is only available to Enterprise at the moment, so if you’re using the free account, you will not see this UI, you can’t do it.
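A tag-based permission check of the kind described here can be sketched in Python. This is a simplified model under stated assumptions, not Codefresh's actual rule engine: the `Rule` type, the action names, and the default-deny behavior are all choices made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    team: str          # team the rule applies to, e.g. "developers"
    action: str        # e.g. "deploy", "debug", "approve"
    tags: frozenset    # the resource must carry all of these tags


def is_allowed(rules, user_teams, action, resource_tags):
    """Allow only when some rule matches the user's team, the action, and the resource tags."""
    tags = set(resource_tags)
    return any(
        rule.team in user_teams and rule.action == action and rule.tags <= tags
        for rule in rules
    )


# Example rules mirroring the transcript: developers reach staging, DevOps reaches production.
RULES = [
    Rule("developers", "deploy", frozenset({"staging"})),
    Rule("devops", "deploy", frozenset({"production"})),
]
```

With these rules, a member of only the developers team can deploy to a cluster tagged `staging`, while a deploy to a cluster tagged `production` is refused because no rule grants it.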

50:58

Kostis Kapelonis:

Also, I think it’s important to mention that on the integration screen that Pierre was showing, apart from the native integrations, you can also have a custom cluster. So if there is a cluster that you host and it complies with the Kubernetes API, you can provide us with a service account. If you click on the provider, it’s the custom provider, the fifth one. Yep, that one. So here you give us a service account, and Codefresh has exactly the same capabilities as the service account. If you give a service account that is available only to specific namespaces, then Codefresh can only work with those specific namespaces. And we also give you a guide on the ideal permissions that we need for the cluster, so if you want to deploy with Codefresh, you need some additional permissions.

But of course, let’s say somebody wants to have a read-only view in Codefresh for some reason, you can do this as well. So you can give Codefresh only capabilities for read-only access to your namespaces, or to specific namespaces. Here we essentially use the native Kubernetes capabilities regarding security. So again, all your knowledge regarding Kubernetes security directly maps to Codefresh security. I think that’s it for questions. No, we have a lot more. Wow.

Pierre Tessier:

Yeah.

Kostis Kapelonis:

Okay, we can keep answering for five minutes or not?

Taryn Jones:

I’m going to go ahead and launch our poll real quick, so thank you, everyone, for answering our poll questions, but we’ll keep rolling with the questions if you guys are open to it, Pierre and Kostis. We have about six more minutes on our schedule.

Pierre Tessier:

I think there was one here about… oh boy, I’ll let you capture them. I saw one here about caching images, yes there is an option to do that in Codefresh.

Kostis Kapelonis:

Yeah, for caching images, we are one of the few solutions that offer a distributed Docker layer cache. Usually what happens with other CI solutions is, if you have let’s say five build nodes and you build a Docker image on the first node, and then the next time you run the pipeline for some reason the third node is picking up the build, this node doesn’t have any of the Docker layers, so your build will take the same time as the first build. In Codefresh, we have a distributed layer cache where all the nodes are equal, all of them have the same access to the Docker cache, so it doesn’t really matter which node we pick. For your build, you don’t need to know or care about this. So the second time that you run a build, all the Docker layers that are the same, and we follow the same rules as the standard Docker build is following, will be cached.

So if you have written a proper Dockerfile where at the top you have all the directives that don’t change, stuff like apt update or downloads, and then at the bottom you have the source code copy, every time you change the source code or make a commit, only that particular Docker layer will change. We also have an even smarter Docker caching mechanism where Codefresh will check the hash ID of a Docker image, and if it finds that this exact image has been created before, and it’s already in the registry because, remember, we have native Docker registry integration, then this build will be skipped completely. You will not even build the image, because we know it has been built before.
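Why the ordering of Dockerfile directives matters for caching can be shown with a small Python model. This is a deliberately simplified sketch of how a layer cache behaves, not Docker's or Codefresh's real implementation: each layer's key is assumed to depend on its parent's key, the instruction text, and (for `COPY`) a digest of the copied files, and the function names are invented for the example.

```python
import hashlib


def layer_keys(instructions, file_digests=None):
    """Return one cache key per instruction; each key chains in its parent's key."""
    file_digests = file_digests or {}
    keys = []
    parent = ""
    for line in instructions:
        material = parent + "\n" + line
        if line.startswith("COPY"):
            # Copied files are part of the key, so a source change alters this layer.
            material += "\n" + file_digests.get(line, "")
        parent = hashlib.sha256(material.encode("utf-8")).hexdigest()
        keys.append(parent)
    return keys


def reusable_layers(old_keys, new_keys):
    """Count leading layers whose keys are unchanged, i.e. servable from cache."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count


# Stable directives first, source copy last.
GOOD = ["FROM debian:stable", "RUN apt-get update", "COPY . /app"]
# Source copy early: every later layer is invalidated by a commit.
BAD = ["FROM debian:stable", "COPY . /app", "RUN apt-get update"]
```

Comparing two commits with `GOOD`, only the final `COPY` layer gets a new key, so the first two layers are reused. With `BAD`, the changed `COPY` key flows into the `RUN` layer's key as well, so only the base layer is reused.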

Which is a very good practice anyway. One of the good Docker practices is that you create your image once, and then you deploy this image to different environments, and you promote it from one environment to the next, instead of rebuilding it. Because we believe that this scenario is the best practice, we have taken a lot of measures to try and help people with Docker caching. And if you compare other solutions, there are solutions where Docker caching is either something extra that you have to pay for, something that is not supported, or something where you have to write a very strange configuration in order to get it. With Codefresh, you get it out of the box.

There is another question, “If you have OpenShift integration…” Yes, you can connect your OpenShift cluster right away. OpenShift is a compliant Kubernetes cluster so you can use the custom provider. Maybe in the future, we’ll have something more user friendly, but you can do it, it will work right away. And we have customers that are running Codefresh with OpenShift.

55:40

Pierre Tessier:

I did want to put up, if anybody wanted to download the slides for this session, you could also find them at this link. It’s my speaker page on Honeycomb, you can hold your phone up there and scan that, it should work, it will get you right there. We are coming up to the top of the hour. Thank you everybody for joining.

Taryn Jones:

Yes, and thank you Pierre, and thank you so much Kostis. Thank you all, everyone who attended today, for all of your great questions. As a reminder, I’ll also send you the slides via email, and I’ve also added the link for Pierre’s speaker page in our chat. And I will send you a copy of the recording tomorrow, so look out for that. From me, Taryn at Codefresh.io, again thank you so much for putting on this presentation, Pierre and Kostis. You all have a great day, and thank you for joining us on Codefresh Live.

Pierre Tessier:

Thanks, everybody.

Taryn Jones:

Bye.

If you see any typos in this text or have any questions, reach out to marketing@honeycomb.io.
