Explore strategies to optimize GPU efficiency in data centers in this conference talk. Discover how to improve the Model FLOPs Utilization (MFU) of AI accelerators by examining real-world production practices. Learn about training Large Language Models (LLMs) with billions of parameters on large-scale Kubernetes clusters, covering techniques such as model parallelism, switch-affinity scheduling, and checkpoint optimization. Gain insights into raising GPU utilization through GPU sharing, deploying training-inference hybrid solutions for tidal (peak/off-peak) workloads, and improving efficiency through node grouping and application matching. Understand how underutilized applications can monopolize GPUs, and explore methods to keep AI accelerators working efficiently around the clock.
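As a rough illustration of the switch-affinity scheduling idea mentioned above (co-locating training pods on nodes that share a network switch to reduce cross-switch traffic), a Kubernetes pod spec might use node affinity on a switch label. This is a hypothetical sketch, not the speakers' implementation; the label key `topology/switch-id` and its value are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-training-worker
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Assumed label: nodes are pre-labeled with the switch they hang off
              - key: topology/switch-id
                operator: In
                values: ["switch-a1"]
  containers:
    - name: trainer
      image: example.com/llm-trainer:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 8  # one full node's worth of GPUs per worker
```

In practice, such labels would be applied by an operator or a topology-discovery component, and a gang scheduler would place all workers of one job under the same switch.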
Is Your GPU Really Working Efficiently in the Data Center? N Ways to Improve GPU Usage
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Syllabus
Is Your GPU Really Working Efficiently in the Data Center? N Ways to... - Xiao Zhang & Wu Ying Jun