This course explores the intersection of Site Reliability Engineering (SRE) and Machine Learning (ML), and explains why SREs need to understand ML technologies. Topics include managing ML in production, the challenges of ML reliability, and the differences between ML and AI. The course also takes a critical look at the current state of using ML for automation. It is designed for SRE professionals who want to build their ML knowledge and skills in the context of distributed computing.
Syllabus
Introduction
Lambda
Dave
TMU
ML does matter
Pause and breathe
Managing ML in production
How hard is ML
Hype and Reality
Gartner Hype Cycle
ML vs AI
ML Ops
Model Quality
ML is data sensitive
I can ML
The future
Future predictions
Further reading
Taught by
USENIX