Overview
This conference talk from SREcon25 Americas explores how a financial services provider developed a comprehensive, automated chaos engineering program with datacenter-level stress testing. Join Clayton Krueger from USAA as he shares the company's journey of elevating chaos testing beyond individual applications to entire data centers. Discover the key stages of its progress, how the team overcame challenges related to fear, uncertainty, and doubt, and the tools and strategies that proved effective in its complex environment. Gain insights into implementing large-scale chaos engineering and get a glimpse of USAA's future plans for enhancing infrastructure robustness. This talk is ideal for SRE professionals looking to apply chaos engineering principles at scale in enterprise environments.
Syllabus
SREcon25 Americas - Chaos Experiments - Datacenter Stress Testing
Taught by
USENIX