The Failover That Failed Successfully - Lessons from a Successfully Failed Disaster Recovery and Failover Test
About this listen
Conducted during a busy release weekend, the failover test exposed gaps not in the technology itself, but in coordination and communication. While production ultimately stayed unaffected, the situation quickly escalated as subcontractors weren't aligned, assumptions didn't match reality, and information didn't flow when it mattered most.
We unpack how a well-intentioned test turned into a coordination challenge, where timing, dependencies, and unclear responsibilities created confusion across teams. It's a story about how resilience isn't just about systems and infrastructure, but also about people, processes, and making sure everyone is on the same page, especially when things are supposed to "just be a test."
00:00 Welcome & Setup
01:34 Corporate Environments
03:30 Failover Planning
07:19 Double Disaster
09:08 Critical Failure
13:20 Realization Moment
15:28 Split Brain
17:34 The Recovery
21:13 Lessons Learned
31:32 Conclusion