Is Your GPU Really Working Efficiently in the Data Center? N Ways to Improve GPU Usage

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore strategies for optimizing GPU efficiency in data centers in this conference talk. Discover how to improve the Model FLOPs Utilization (MFU) of AI accelerators by examining real-world production practices. Learn about training Large Language Models (LLMs) with billions of parameters on large-scale Kubernetes clusters, covering techniques such as model parallelism, switch-affinity scheduling, and checkpoint optimization. Gain insights into raising GPU utilization through GPU sharing technology, training-inference hybrid deployment for tidal scenarios, and node grouping with application matching. Understand the problem of underutilized applications monopolizing GPUs, and explore methods for keeping AI devices working efficiently around the clock.
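For readers unfamiliar with MFU, the metric can be sketched as the ratio of the FLOPs a training job actually achieves to the theoretical peak of the hardware it occupies. The snippet below is a minimal illustration, not the speakers' method: it assumes the widely used approximation of about 6 FLOPs per parameter per training token (which ignores attention FLOPs), and the function name and all input figures are hypothetical.

```python
def model_flops_utilization(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Estimate Model FLOPs Utilization (MFU) for a training job.

    Assumption: training costs roughly 6 FLOPs per parameter per token
    (forward plus backward pass), ignoring attention FLOPs.
    """
    if n_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("n_gpus and peak_flops_per_gpu must be positive")

    # FLOPs the model usefully performs each second across the whole job.
    achieved_flops = 6.0 * n_params * tokens_per_second
    # Theoretical peak FLOPs per second of all GPUs assigned to the job.
    peak_flops = n_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Hypothetical example: a 7B-parameter model processing 200,000 tokens/s
# on 64 GPUs, each with a peak of 312 TFLOPS.
mfu = model_flops_utilization(
    n_params=7e9,
    tokens_per_second=2e5,
    n_gpus=64,
    peak_flops_per_gpu=312e12,
)
print(f"Estimated MFU: {mfu:.1%}")
```

With these illustrative inputs the estimate comes out at roughly 42%, meaning more than half of the peak compute is idle or spent on work that is not counted as useful model FLOPs.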

Syllabus

Is Your GPU Really Working Efficiently in the Data Center? N Ways to... - Xiao Zhang & Wu Ying Jun

Taught by

CNCF [Cloud Native Computing Foundation]
