Overview
This course introduces deep-learning model sparsity through a new abstraction called Tensor-with-Sparsity-Attribute (TeSA), which lets sparsity attributes and patterns be specified, propagated, and exploited across an entire deep learning model to generate highly efficient operators. Using the SparTA framework, learners can achieve significant inference-latency speedups over existing sparse solutions. Topics include computation capacity vs. DNN model size, evolving sparsity patterns, the SparTA system architecture, execution transformation, code specialization, and evaluation on a variety of sparsity patterns and models. The course is intended for anyone interested in deep learning, model optimization, and neural network efficiency.
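To make the core idea concrete, the following is a minimal sketch of a tensor paired with a sparsity attribute, and of propagating that attribute through a matrix multiplication. All names (`TeSA`, `propagate_matmul`) are illustrative placeholders, not SparTA's actual API; real TeSA attributes are far richer (block patterns, quantization, etc.) than the boolean mask used here.

```python
class TeSA:
    """Sketch of a Tensor-with-Sparsity-Attribute: a 2-D value matrix
    paired with a per-element boolean mask (True = entry is live).
    Illustrative only; not SparTA's real abstraction."""
    def __init__(self, values, mask=None):
        self.mask = mask if mask is not None else [[True] * len(r) for r in values]
        # Pruned entries are forced to zero so they never contribute.
        self.values = [[v if m else 0 for v, m in zip(vr, mr)]
                       for vr, mr in zip(values, self.mask)]

def propagate_matmul(a, b):
    """Propagate sparsity through C = A @ B: C[i][j] is provably zero
    whenever row i of A or column j of B is entirely pruned, so the
    output inherits a sparsity attribute without inspecting values."""
    n, k, m = len(a.values), len(b.values), len(b.values[0])
    row_live = [any(a.mask[i]) for i in range(n)]
    col_live = [any(b.mask[t][j] for t in range(k)) for j in range(m)]
    vals = [[sum(a.values[i][t] * b.values[t][j] for t in range(k))
             for j in range(m)] for i in range(n)]
    mask = [[row_live[i] and col_live[j] for j in range(m)] for i in range(n)]
    return TeSA(vals, mask)
```

In SparTA proper, such propagated attributes are then used downstream: a code specializer can skip the dead rows and columns entirely, which is where the latency savings come from.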
Syllabus
Intro
Computation Capacity vs DNN Model Size
Sparsity Commonly Exists
Evolving Sparsity Patterns
Obstacles to Sparsity Optimization
The Myth of Proxy Metrics
Across-Stack Innovations in Silos
SparTA: An End-to-End Approach to Model Sparsity
Core Abstraction: TeSA
System Architecture
Execution Transformation
Code Specialization
What SparTA Achieves
Evaluation on Various Patterns & Models
End-to-end Opportunity
Mixed Sparsity Evaluation
Real Latency for Algorithms
Conclusion
Taught by
USENIX