This course teaches industry best practices for alerting in complex distributed systems. The goal is to reduce alert fatigue by focusing alerts on symptoms associated with end-user pain. Students will learn about Adaptive Paging, an alert handler that uses tracing causality to page the team closest to the problem. The material is taught through real-world examples of alert handling and outage scenarios. The course is intended for professionals who work with distributed systems, particularly those facing challenges with alert management and observability.
Overview
Syllabus
Introduction
Monoliths
Ops dev silos
New roles
The solution
Adaptive Paging
Alert Handler Example
Outage Example
Challenges
Observability
Questions
Network Partial Outage
Taught by
USENIX