In today's fast-paced digital landscape, system resilience is vital for businesses of all sizes. "Chaos Engineering" is a comprehensive, hands-on course designed to equip you with the knowledge and skills needed to ensure your systems withstand and recover from failures. It progresses from foundational concepts to advanced applications on AWS services including EC2, Aurora, Fargate, and EKS, and covers strategies for maintaining availability across multiple Availability Zones.

What You’ll Learn:

**Chaos Engineering Fundamentals:**

*   Understand core principles and the philosophy behind Chaos Engineering.
*   Learn why identifying and addressing system weaknesses through controlled chaos experiments is vital.
*   Explore essential tools and methodologies for implementing Chaos Engineering.

**Building a Basic AWS Fault Injection Service (FIS) Experiment:**

*   Gain a step-by-step understanding of constructing and executing your first AWS Fault Injection Service (FIS) experiment.
*   Understand how to design experiments targeting different failure modes in a controlled setting.
*   Learn to interpret experiment results and refine your simulations for better accuracy.
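
To give a feel for what such an experiment looks like, here is a minimal sketch of an AWS FIS experiment template expressed as a Python dictionary. It is not taken from the course material. The role ARN, alarm ARN, tag key, and the helper name `build_stop_instances_template` are hypothetical placeholders, and the field names follow the template shape accepted by the FIS `create_experiment_template` API as we understand it, so verify them against the current AWS documentation before use.

```python
import json


def build_stop_instances_template(role_arn, alarm_arn, tag_key, tag_value):
    """Return a dict shaped like an AWS FIS experiment template.

    All arguments are placeholders supplied by the caller; nothing here
    talks to AWS.
    """
    return {
        "description": "Stop one tagged EC2 instance, restart it after 5 minutes",
        "roleArn": role_arn,
        # Targets: which resources the experiment is allowed to touch.
        "targets": {
            "taggedInstances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",  # limit the blast radius to one instance
            }
        },
        # Actions: the fault to inject into the selected targets.
        "actions": {
            "stopInstances": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "taggedInstances"},
            }
        },
        # Stop conditions: halt the experiment if this alarm fires.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
    }


template = build_stop_instances_template(
    role_arn="arn:aws:iam::111122223333:role/example-fis-role",  # hypothetical
    alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:example-error-rate",  # hypothetical
    tag_key="ChaosReady",  # hypothetical tag
    tag_value="true",
)
print(json.dumps(template, indent=2))
```

Building the template as plain data first makes it easy to review the targets, the action, and the stop condition before anything is submitted to AWS.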

**Introduction to Real-Life Application:**

*   Discover how to apply Chaos Engineering experiments to real-world applications.
*   Learn best practices for monitoring, capturing metrics, and analyzing results to continually improve system resilience.
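
As an illustration of analyzing results, the sketch below checks a simple steady-state hypothesis: the metric observed during an experiment should stay within a relative tolerance of its baseline. The function name `steady_state_held`, the 10% tolerance, and the sample latency values are assumptions made up for this example, not figures from the course.

```python
from statistics import mean


def steady_state_held(baseline, during, tolerance=0.10):
    """Return True if the mean of `during` is within `tolerance`
    (as a relative fraction) of the mean of `baseline`."""
    base = mean(baseline)
    observed = mean(during)
    if base == 0:
        # Avoid dividing by zero: only an unchanged zero metric passes.
        return observed == 0
    return abs(observed - base) / base <= tolerance


# Hypothetical p50 latency samples in milliseconds.
baseline_latency_ms = [120, 118, 125, 121]
experiment_latency_ms = [124, 130, 127, 129]

print(steady_state_held(baseline_latency_ms, experiment_latency_ms))
```

In practice the samples would come from your monitoring system, and a real analysis would usually look at percentiles and error rates rather than a single mean.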

**Chaos Engineering on Compute - EC2:**

*   Conduct chaos experiments on EC2 instances to evaluate and improve system robustness.
*   Simulate failures, such as instance termination or network latency, and observe impacts.

**Chaos Engineering on Database - Aurora:**

*   Learn to apply Chaos Engineering principles to Amazon Aurora databases.
*   Simulate failures like cluster instability or node outages and develop strategies for seamless recovery.

**Chaos Engineering on Serverless - Fargate:**

*   Conduct chaos experiments on AWS Fargate to test the resilience of your serverless applications.
*   Simulate events like task failures or service downtime to ensure robust serverless architectures.

**Chaos Engineering on Kubernetes - EKS:**

*   Implement Chaos Engineering on Amazon EKS to stress-test Kubernetes clusters.
*   Simulate pod failures, node crashes, and other disruptions to validate recovery mechanisms.

**Chaos Engineering on Availability Zone:**

*   Conduct chaos experiments across different AWS Availability Zones.
*   Test the impact of zone failures and ensure your systems can tolerate the loss of one or more Availability Zones.
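
One way to approximate a zone failure is to disrupt network connectivity for the subnets in a single Availability Zone. The sketch below builds such an AWS FIS template as a Python dictionary. It assumes the subnets for the zone under test carry a hypothetical `ChaosZone` tag; the helper name, ARNs, and tag are placeholders, and the action ID and parameter names should be checked against the current AWS FIS actions reference.

```python
def build_az_disruption_template(role_arn, alarm_arn, zone_tag_value, duration="PT10M"):
    """Return a dict shaped like an AWS FIS experiment template that
    disrupts connectivity for all subnets carrying a given tag value."""
    return {
        "description": f"Disrupt connectivity for subnets tagged ChaosZone={zone_tag_value}",
        "roleArn": role_arn,
        "targets": {
            "zoneSubnets": {
                "resourceType": "aws:ec2:subnet",
                # Hypothetical tag: assumes subnets were tagged per zone beforehand.
                "resourceTags": {"ChaosZone": zone_tag_value},
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "disruptConnectivity": {
                "actionId": "aws:network:disrupt-connectivity",
                "parameters": {"duration": duration, "scope": "all"},
                "targets": {"Subnets": "zoneSubnets"},
            }
        },
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
    }


az_template = build_az_disruption_template(
    role_arn="arn:aws:iam::111122223333:role/example-fis-role",  # hypothetical
    alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:example-availability",  # hypothetical
    zone_tag_value="us-east-1a",
)
print(az_template["description"])
```

A fuller zone-failure drill would typically combine this with compute and database actions in the same zone; this sketch covers only the network portion.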

**Target Audience:**

*   Developers interested in enhancing their systems’ resilience.
*   Site Reliability Engineers (SREs) focused on improving system reliability.
*   Cloud Engineers managing AWS environments.
*   Technical Support Engineers specializing in fault-tolerant systems.
*   Technical Leads overseeing cloud-native application projects.

This course, with its combination of theory, demonstrations, and real-world scenarios, will enable you to build resilient systems capable of withstanding and recovering from unexpected failures efficiently. Join us to master Chaos Engineering and innovate with confidence.

Introduction

Chaos Engineering Fundamentals

Building a Basic FIS Experiment

Introduction to Real-Life Application

Chaos Engineering on Compute - EC2

Chaos Engineering on Database - Aurora

Chaos Engineering on Serverless - Fargate

Chaos Engineering on Kubernetes - EKS

Chaos Engineering on Availability Zone

Conclusion

Chaos Engineering

Nasia is an Engineering Development and Integration Subject Matter Expert (SME) in Disaster Recovery (DR), Hybrid Cloud, Resilience, and Business Continuity. Her expertise spans Cloud Computing, Disaster Recovery as a Service, Infrastructure as Code, and Data Replication and Archiving. She excels in testing and integrating new technologies into existing infrastructures and develops robust technical solutions for Migration, Disaster Recovery, and Archiving tailored to business requirements. Her deep knowledge of multiple proprietary DR technologies is evident in her comprehensive Disaster Recovery testing, planning, and implementation efforts. Ensuring application resilience and business continuity is Nasia's primary focus, driving her daily efforts and strategic initiatives.

Nasia Ullas