In this course, you will learn how to monitor and observe your cloud and on-premises infrastructure, how well-architected principles inform your decisions about what to instrument, and how this instrumentation helps you meet your business objectives.
Course objectives
In this course, you will learn to:
- Identify key performance metrics from your existing workloads
- Understand why observability is important for modern operations
- Measure Mean Time to Resolve (MTTR) and Mean Time to Identification (MTTI), and understand how they relate to operational uptime
- Navigate the AWS services that can help reduce MTTI
- Understand logs, metrics, and traces, and how they form the foundation of cloud-native observability tools
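To make the MTTI and MTTR objective concrete, here is a minimal Python sketch of how both can be computed from incident records and tied to uptime. It assumes each incident has a start, identification, and resolution timestamp, and that both metrics are measured from the moment the incident began. Teams define these start points differently, so treat that choice as an assumption. The `Incident` class, the sample timestamps, and the 30-day window are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative only.
    started: datetime     # when the failure began
    identified: datetime  # when the cause was identified
    resolved: datetime    # when service was restored


def mean_duration(durations):
    """Average a sequence of timedeltas."""
    durations = list(durations)
    return sum(durations, timedelta()) / len(durations)


def mtti(incidents):
    """Mean Time to Identification, measured from incident start (assumption)."""
    return mean_duration(i.identified - i.started for i in incidents)


def mttr(incidents):
    """Mean Time to Resolve, measured from incident start (assumption)."""
    return mean_duration(i.resolved - i.started for i in incidents)


def availability(incidents, period):
    """Fraction of the period the service was up, treating every incident as full downtime."""
    downtime = sum((i.resolved - i.started for i in incidents), timedelta())
    return 1 - downtime / period


# Two hypothetical incidents in a 30-day window.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 20), datetime(2024, 1, 1, 11, 0)),
    Incident(datetime(2024, 1, 15, 8, 0), datetime(2024, 1, 15, 8, 10), datetime(2024, 1, 15, 8, 30)),
]

if __name__ == "__main__":
    print("MTTI:", mtti(incidents))  # 0:15:00
    print("MTTR:", mttr(incidents))  # 0:45:00
    print("Availability:", availability(incidents, timedelta(days=30)))
```

Because identification happens before resolution, every minute cut from MTTI also shortens MTTR. That shorter downtime is what raises the availability figure.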
Intended audience
This course is intended for:
- DevOps engineers
- Developers
- Site reliability engineers
- IT managers
Prerequisites
- None
Course outline
- Well-architected principles
- What is observability, and how it differs from monitoring
- Foundational datatypes for observability
- Mean Time to Identification (MTTI) and typical troubleshooting workflows
- Maturity of observability as an evolution of monitoring
- Synthetic canaries
- CloudWatch alarms, metrics, logs, and dashboards
- AWS X-Ray and application tracing
- Next steps