
High Performance Unstructured SpMM Computation Using Tensor Cores
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
Overview

This conference talk, "High Performance Unstructured SpMM Computation Using Tensor Cores," presents research on optimizing sparse matrix-matrix multiplication (SpMM). It shows how the SMaT (Sparse Matrix Matrix Tensor Core-accelerated) library enables efficient use of Tensor Cores for unstructured sparse matrices, overcoming the hardware limitations that typically constrain sparse computations. The library leverages the low-level CUDA MMA (matrix multiply-accumulate) API to maximize GPU performance and adds algorithmic optimizations such as sparse matrix permutation, which minimizes the number of non-zero blocks that must be processed. In the reported evaluation, SMaT outperforms state-of-the-art libraries by up to 125x (2.6x on average) on NVIDIA A100 GPUs. The talk introduces the problem, details the SMaT approach, describes the performance modeling methodology, presents comprehensive evaluation results, and concludes with implications for scientific computing as well as large-model training and inference. The 31-minute presentation from ETH Zurich's Scalable Parallel Computing Lab (SPCL) was delivered at SC '24, the International Conference for High Performance Computing, Networking, Storage, and Analysis.
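The "low-level CUDA MMA API" refers to the warp-level PTX mma.sync instructions that program Tensor Cores directly, below the higher-level WMMA C++ API. The following sketch is a minimal, self-contained illustration of that primitive, not code from the SMaT library: one warp multiplies a 16x8 half-precision tile (row-major) by an 8x8 tile (column-major) with fp32 accumulation, using the per-thread fragment layout defined in the PTX ISA for the m16n8k8 shape. It assumes an NVIDIA GPU with compute capability sm_75 or newer, such as the A100 used in the talk's evaluation.

```cuda
// Minimal warp-level mma.sync example (illustrative sketch, not SMaT code).
// Build: nvcc -arch=sm_80 mma_sketch.cu -o mma_sketch
#include <cuda_fp16.h>
#include <cstdio>

// One warp computes D(16x8, fp32) = A(16x8, fp16, row-major) * B(8x8, fp16, col-major).
__global__ void mma_16x8x8(const half* A, const half* B, float* D) {
    int lane  = threadIdx.x & 31;  // lane within the warp
    int group = lane >> 2;         // "groupID" in the PTX ISA fragment layout
    int tid   = lane & 3;          // thread index within the group of four

    // Each 32-bit register packs two consecutive fp16 values.
    const unsigned* A32 = reinterpret_cast<const unsigned*>(A);
    const unsigned* B32 = reinterpret_cast<const unsigned*>(B);

    // Per-thread fragments, following the PTX ISA layout for m16n8k8:
    //   a0: A[group][2*tid .. 2*tid+1],  a1: A[group+8][2*tid .. 2*tid+1]
    //   b0: B[2*tid .. 2*tid+1][group]   (col-major, so contiguous in memory)
    unsigned a0 = A32[(group * 8 + 2 * tid) / 2];
    unsigned a1 = A32[((group + 8) * 8 + 2 * tid) / 2];
    unsigned b0 = B32[(group * 8 + 2 * tid) / 2];

    float d[4] = {0.f, 0.f, 0.f, 0.f};  // fp32 accumulator fragment
    asm volatile(
        "mma.sync.aligned.m16n8k8.row.col.f32.f16.f16.f32 "
        "{%0,%1,%2,%3}, {%4,%5}, {%6}, {%0,%1,%2,%3};\n"
        : "+f"(d[0]), "+f"(d[1]), "+f"(d[2]), "+f"(d[3])
        : "r"(a0), "r"(a1), "r"(b0));

    // Scatter the result fragment back to row-major D.
    D[group * 8 + 2 * tid]           = d[0];
    D[group * 8 + 2 * tid + 1]       = d[1];
    D[(group + 8) * 8 + 2 * tid]     = d[2];
    D[(group + 8) * 8 + 2 * tid + 1] = d[3];
}

int main() {
    half hA[16 * 8], hB[8 * 8];
    for (int i = 0; i < 16 * 8; ++i) hA[i] = __float2half(1.0f);
    for (int i = 0; i < 8 * 8; ++i)  hB[i] = __float2half(2.0f);

    half *dA, *dB; float *dD;
    cudaMalloc(&dA, sizeof(hA)); cudaMalloc(&dB, sizeof(hB));
    cudaMalloc(&dD, 16 * 8 * sizeof(float));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    mma_16x8x8<<<1, 32>>>(dA, dB, dD);  // one warp, one 16x8x8 MMA

    float hD[16 * 8];
    cudaMemcpy(hD, dD, sizeof(hD), cudaMemcpyDeviceToHost);
    printf("D[0][0] = %.1f (expected %.1f)\n", hD[0], 8 * 1.0f * 2.0f);
    return 0;
}
```

A blocked sparse kernel in the spirit of the talk would issue one such MMA only for blocks containing non-zeros; the row permutation mentioned above serves to pack the non-zeros into fewer of these blocks.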
Syllabus
00:00 Introduction
03:45 SMaT: Sparse Matrix Matrix Tensor Core-accelerated
06:10 Performance Model
09:35 Evaluation
21:30 Conclusion
Taught by
Scalable Parallel Computing Lab, SPCL @ ETH Zurich