Overview
This course teaches how to achieve predictable performance in model serving systems for Deep Neural Networks (DNNs). Learners will understand why low latency matters in model serving, recognize that DNN inference execution times are predictable, learn a principled design methodology for building a distributed model serving system, and apply it to achieve predictable end-to-end performance. The course covers the design and implementation of a system like Clockwork, which supports multiple models while meeting latency targets and request-level service-level objectives (SLOs). The material follows a structured syllabus with topics such as High Tail Latencies, Predictable Worker, and Clockwork. The course is intended for professionals who work with machine learning inference or model serving systems, or who want to improve performance predictability in DNN applications.
Syllabus
Introduction
High Tail Latencies
Predictable Worker
Clockwork
Clockwork Example
Conclusion
Taught by
USENIX