Honeycomb Blog

RDS Performance Degradation – Postmortem

Summary

Note: all times are UTC unless otherwise noted.

On Thursday, May 3 starting at 00:39:08 UTC (Wednesday 17:39 PDT) we experienced a nearly complete outage of the Honeycomb service lasting for approximately 24 minutes. Most services came back online at 2018-05-03 01:02:49 and all customer-facing services were fully restored by 01:07:00.

Impact

During the outage, customers were not able to log in or view graphs on the Honeycomb service. Events sent to the Honeycomb API during the outage were refused at elevated rates; approximately 86% of API submissions during the outage were not accepted, and approximately 81% of events…


Security Through Observability

Observability is great for understanding the ramifications of your system. In brief, massively distributed application stacks demand more sophisticated tools than traditional metrics/monitoring, because engineers must be able to ask new questions and get new answers when the system surprises them.

User happiness is the ultimate consequence of the systems we build, and as Charity is fond of saying (and as people are fond of quoting her): "Nines don't matter if users aren't happy."

"Put this on your whiteboard. https://t.co/eMiyKxPEoZ" (John Arundel, @bitfield, October 19, 2017)

Now, what happens when those ramifications potentially include the actions of malicious outsiders?…
