FENDER MUSICAL INSTRUMENTS CORPORATION:
Since 1946, Fender has revolutionized music and culture as one of the world’s leading musical instrument manufacturers, marketers and distributors. Fender Musical Instruments Corporation (FMIC), whose portfolio of brands includes Fender®, Squier®, Gretsch® guitars, Jackson®, EVH® and Charvel®, follows a player-centric approach to crafting the highest quality instruments and musical solutions across genres. FMIC is dedicated to unlocking the power of music through electric and acoustic guitars, amplifiers, pro audio, accessories and digital products that inspire and enable musical expression at every stage, from beginners to history-making legends.
- AWS Lambda microservices written in Go
- AWS Lambda sending CloudWatch logs to Honeycomb
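The case study does not include Fender's code. Here is a minimal sketch of how a Go function could log. It assumes each Lambda function writes one JSON object per request to stdout, which Lambda captures in CloudWatch Logs, and that a forwarder then ships those log lines to Honeycomb. The `logEvent` type, its field names, and the service name are hypothetical. Fields such as `request_id` and `user_id` are what make a later search by request or user possible.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// logEvent is a hypothetical shape for one structured log line. Emitting one
// JSON object per request (instead of free-text lines) keeps fields such as
// request_id and user_id queryable once the logs reach Honeycomb.
type logEvent struct {
	Timestamp  string  `json:"timestamp"`
	Service    string  `json:"service"`
	RequestID  string  `json:"request_id"`
	UserID     string  `json:"user_id"`
	StatusCode int     `json:"status_code"`
	DurationMs float64 `json:"duration_ms"`
}

// formatEvent serializes an event as a single JSON line.
func formatEvent(e logEvent) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	start := time.Now()
	// ... handle the request here ...
	line, err := formatEvent(logEvent{
		Timestamp:  start.UTC().Format(time.RFC3339Nano),
		Service:    "subscription-service", // hypothetical service name
		RequestID:  "req-123",
		UserID:     "user-42",
		StatusCode: 200,
		DurationMs: float64(time.Since(start).Microseconds()) / 1000.0,
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// In Lambda, anything written to stdout is captured by CloudWatch Logs.
	fmt.Println(line)
}
```

Keeping every field on a single line means each log entry arrives as one event, so no cross-line correlation is needed later.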
Why would you enter into an online space as a guitar company? It’s to know more about players: understand who they are, talk to them, and guide them along the way.
Fender’s customer focus makes observability a key requirement: to understand and support your users, you have to have visibility into their experience of your service.
The goal of the platform team is to create and support services that support our web and mobile applications and provide data and analytics to the business. In that way we are able to understand and support players.
Fender has seen almost a 10X increase in traffic since stay-at-home policies were put in place during the COVID-19 pandemic. Fender’s serverless infrastructure scaled up well, but some API latency appeared on their SLO dashboard and was burning down the error budget.
After further investigation, we determined that we needed to beef up our Elasticsearch cluster, and once that was done the latency SLO started recovering. The best part is that we only had slightly degraded API performance: with the SLO, we were alerted to what was happening before it resulted in a poor user experience.
What They Needed
- A powerful and intuitive interface for debugging and troubleshooting problems
- Fast search results across high-cardinality fields such as unique customer IDs
- SLOs to alert on the most important indicators to prioritize workload and improve user experience
Honeycomb @ Fender
The Fender Platform team maintains its own ELK environment to aggregate logs and investigate problems, but has found the platform slow and difficult for team members to learn.
We have an ELK stack, but the query language is cumbersome and the UI is not as easy to use as Honeycomb.
They adopted Honeycomb specifically to search across all of their Lambda CloudWatch logs at once, and have been especially pleased with how quickly they can zero in on an issue.
Honeycomb and Lambda work very well together. Without Honeycomb, it was very difficult to get all the CloudWatch logs correlated, but now we can just do a quick search by request or userID with no difficulties.
Recently, the Fender Platform team deployed an update to the subscription management service for Fender Play users. Soon afterward, they noticed errors related to the billing systems, which started to spike over the course of the morning.
Using Honeycomb, they determined that the issue was related to differences between their test and production environments and could confirm, thanks to the Honeycomb Marker they’d set, that this problem was definitely caused by their recent deployment. They were able to roll back to a stable environment within five minutes.
This graph shows the Lambda function status codes during the timeframe of the deploy. The spike in 500s occurred after the first Marker, then dropped after the second, which was the rollback. Subsequent Markers show that a third deploy still had the issue, which was fixed in the fourth.
Even more critically, they were able to drill into their data and identify the exact users impacted by the issue, allowing their Customer Support team to contact those users before they contacted Customer Support to report an issue.
And all of this would have taken significantly longer using their existing ELK setup.
If we did not have Honeycomb, it may have taken an additional 30 minutes or so to determine what the issue was by poking around in ELK or CloudWatch logs. Honeycomb’s visualization and honeymarkers made it obvious that the issue was related to a recent release. On top of that, Honeycomb allowed us to determine the affected users and pass that information on to our support team.
The team finds it easier and faster to get results with Honeycomb.
Now that we have Honeycomb, we just use the ELK stack as the archive.