System resilience is vital for businesses of all sizes. "Chaos Engineering" is a hands-on course designed to equip you with the knowledge and skills needed to ensure your systems withstand and recover from failures. It moves from foundational concepts to advanced applications on AWS services, including EC2, Aurora, Fargate, and EKS, and covers strategies for maintaining availability across multiple Availability Zones.
What You’ll Learn:
Chaos Engineering Fundamentals:
- Understand core principles and the philosophy behind Chaos Engineering.
- Learn why identifying and addressing system weaknesses through controlled chaos experiments is vital.
- Explore essential tools and methodologies for implementing Chaos Engineering.
Building a Basic Fault Injection Simulator (FIS) Experiment:
- Gain a step-by-step understanding of constructing and executing your first AWS Fault Injection Simulator (FIS) experiment.
- Understand how to design experiments targeting different failure modes in a controlled setting.
- Learn to interpret experiment results and refine your experiments for better accuracy.
Introduction to Real-Life Application:
- Discover how to apply Chaos Engineering experiments to real-world applications.
- Learn best practices for monitoring, capturing metrics, and analyzing results to continually improve system resilience.
Chaos Engineering on Compute - EC2:
- Conduct chaos experiments on EC2 instances to evaluate and improve system robustness.
- Simulate failures, such as instance termination or network latency, and observe their impact.
Chaos Engineering on Database - Aurora:
- Learn to apply Chaos Engineering principles to Amazon Aurora databases.
- Simulate failures like cluster instability or node outages and develop strategies for seamless recovery.
Chaos Engineering on Serverless - Fargate:
- Conduct chaos experiments on AWS Fargate to test the resilience of your serverless applications.
- Simulate events like task failures or service downtime to ensure robust serverless architectures.
Chaos Engineering on Kubernetes - EKS:
- Implement Chaos Engineering on Amazon EKS to stress-test Kubernetes clusters.
- Simulate pod failures, node crashes, and other disruptions to validate recovery mechanisms.
Chaos Engineering on Availability Zone:
- Conduct chaos experiments across different AWS Availability Zones.
- Test the impact of zone failures and ensure your systems are prepared for outages that affect one or more Availability Zones.
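As a rough illustration of what "interpreting experiment results" against captured metrics can look like, the sketch below compares a metric sampled during an experiment with its pre-experiment baseline. The function name, the sample values, and the 10% tolerance are all hypothetical choices made for this example; they are not taken from the course or from any AWS tool.

```python
from statistics import mean


def steady_state_holds(baseline, during, tolerance=0.10):
    """Return True if the mean observed during the experiment stays
    within `tolerance` (relative) of the baseline mean.

    `baseline` and `during` are lists of metric samples, for example
    request latencies in milliseconds. The 10% default is an assumption.
    """
    base = mean(baseline)
    observed = mean(during)
    if base == 0:
        # Avoid dividing by zero; only an unchanged metric passes.
        return observed == 0
    return abs(observed - base) / base <= tolerance


# Hypothetical latency samples (ms) before and during a fault injection.
baseline_latency_ms = [120, 118, 125, 122]
experiment_latency_ms = [124, 130, 127, 126]

if steady_state_holds(baseline_latency_ms, experiment_latency_ms):
    print("Hypothesis held: latency stayed within tolerance.")
else:
    print("Hypothesis violated: investigate and refine the system.")
```

A real experiment would typically read these samples from a monitoring system rather than hard-coded lists, and the tolerance would be chosen per metric.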
Target Audience:
- Developers interested in enhancing their systems’ resilience.
- Site Reliability Engineers (SREs) focused on improving system reliability.
- Cloud Engineers managing AWS environments.
- Technical Support Engineers specializing in fault-tolerant systems.
- Technical Leads overseeing cloud-native application projects.
Through a combination of theory, demonstrations, and real-world scenarios, this course will enable you to build resilient systems that withstand and recover from unexpected failures. Join us to master Chaos Engineering and innovate with confidence.
Syllabus
- Chaos Engineering Fundamentals
- The Chaos Engineering Fundamentals module introduces learners to the concept and importance of chaos engineering for building resilient systems. This module covers the basics of chaos engineering, an overview of AWS Fault Injection Simulator (FIS), and examples of experiments. It concludes with a quiz to reinforce the key concepts.
- Building a Basic FIS Experiment
- The Building a Basic FIS Experiment module guides learners through the step-by-step process of setting up and executing a basic Fault Injection Simulator (FIS) experiment. This module covers creating permissions, building an Auto Scaling Group (ASG) architecture, running experiments, and using monitoring tools like CloudWatch. Learners will also see practical demonstrations for better understanding and complete a quiz to solidify their knowledge.
- Introduction to Real-Life Application
- The Introduction to Real-Life Application module provides an overview of deploying and setting up a real-world application to use in chaos engineering experiments. Learners will review the prerequisites, set up the architecture, deploy the application, and establish steady-state metrics using CloudWatch RUM and X-Ray. Planning effective experiments and deploying with CloudFormation will also be demonstrated, followed by a quiz to reinforce key concepts.
- Chaos Engineering on Compute - EC2
- The Chaos Engineering on Compute - EC2 module focuses on executing chaos engineering experiments on EC2 instances, specifically simulating disk fill scenarios. Learners will gain practical experience by observing system behavior and metrics before and after running FIS experiments, using tools like X-Ray for monitoring. The module concludes with a quiz to test understanding of key concepts.
- Chaos Engineering on Database - Aurora
- The Chaos Engineering on Database - Aurora module explores conducting chaos engineering experiments on Amazon Aurora databases, focusing on a reader node reboot scenario. Learners will be guided through setting up prerequisites, creating necessary IAM roles, and executing FIS experiments. The module includes demonstrations on monitoring database state and metrics post-experiment and ends with a quiz to reinforce learning.
- Chaos Engineering on Serverless - Fargate
- The Chaos Engineering on Serverless - Fargate module applies chaos engineering techniques to serverless architectures, with a focus on Amazon ECS on AWS Fargate. Learners will explore experiment design and hypothesis formation, set up steady-state conditions, and execute an I/O stress test. The module features demos of IAM role creation and post-experiment analysis and concludes with a quiz to reinforce understanding.
- Chaos Engineering on Kubernetes - EKS
- The Chaos Engineering on Kubernetes - EKS module covers the application of chaos engineering principles to Amazon EKS (Elastic Kubernetes Service). Learners will gain insights into running memory stress and pod deletion experiments to test and improve the resilience of Kubernetes clusters. The module includes detailed demonstrations of experiment execution and post-experiment analysis, followed by a quiz to assess knowledge retention.
- Chaos Engineering on Availability Zone
- The Chaos Engineering on Availability Zone module focuses on conducting chaos experiments to test the resilience of applications across different Availability Zones (AZs). Learners will understand the significance of AZs and follow guided demonstrations on setting up, preparing, and executing experiments to evaluate system behavior under stress. The module concludes with a quiz to consolidate learning.
- Conclusion
- The Conclusion module provides a final review of the chaos engineering practices covered throughout the course and emphasizes the importance of a thorough cleanup process.
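To give a feel for what the Building a Basic FIS Experiment module describes, here is a minimal sketch of an experiment template expressed as a Python dictionary. It assumes a simple scenario (stopping one tagged EC2 instance and restarting it after five minutes) that is chosen purely for illustration and is not necessarily the scenario used in the course. The role ARN, alarm ARN, tag, and the names `oneInstance` and `stopInstance` are hypothetical placeholders.

```python
import json


def build_stop_instance_template(role_arn, alarm_arn, tag_key, tag_value):
    """Build an FIS experiment template body as a plain dict.

    All arguments are placeholders supplied by the caller; nothing here
    contacts AWS.
    """
    return {
        "description": "Stop one tagged EC2 instance, restart after 5 minutes",
        # IAM role that FIS assumes to run the actions (hypothetical ARN).
        "roleArn": role_arn,
        "targets": {
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                # Pick a single matching instance at random.
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                # ISO 8601 duration: restart the instance after 5 minutes.
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "oneInstance"},
            }
        },
        # Halt the experiment if this CloudWatch alarm fires.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
    }


template = build_stop_instance_template(
    role_arn="arn:aws:iam::111122223333:role/ExampleFisRole",  # hypothetical
    alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:ExampleAlarm",  # hypothetical
    tag_key="ChaosReady",  # hypothetical tag
    tag_value="true",
)
print(json.dumps(template, indent=2))
```

With valid credentials and real ARNs, a dictionary of this shape could be passed to the FIS `create_experiment_template` API (for example through boto3, together with a `clientToken`). The stop condition is the safety net: it ties the experiment to an alarm so the fault injection ends if the system degrades further than planned.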
Taught by
Nasia Ullas