Thursday, June 02, 2011

Post-Mortems at HubSpot

“What I Learned From 250 Whys” by Dan Milstein at HubSpot. This is quite good and worth reading!

I've learned about how HubSpot's systems work, why they sometimes break, and what we can do to make them more resilient. Beyond that, I've learned a lot about complex systems and failure in general. Which, in case you're wondering, is a fascinating topic. I highly, highly recommend Richard Cook's essay “How Complex Systems Fail” in O'Reilly's Web Operations. Or Atul Gawande's Complications and Better. Or basically anything John Allspaw writes.

If you'd like to build resilient systems, here's some of what I've learned from the fifty-plus 5 Whys I’ve been a part of. (And by “systems,” I mean systems of people + machines, and by “resilient,” I mean I'm stealing from Allspaw.)