Accelerate Your AI/ML Workloads With Topology-Aware Scheduling in Kueue

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This conference talk explores how to improve AI/ML workload performance with Topology-Aware Scheduling (TAS) in Kueue. AI training and inference workloads exchange massive amounts of data between Pods, so network throughput can become a bottleneck, a problem that is especially acute in the era of Large Language Models. Kueue, a Job-level scheduler, uses cluster topology information, exposed through a proposed node labeling convention, to optimize Pod placement. The presenters, Michał Woźniak (Google) and Yuki Iwai (CyberAgent), explain the key concepts behind TAS, compare it with alternative approaches, and demonstrate how it significantly improves execution time for AI workloads, including how ordering Pods by index improves the performance of AI frameworks that use NCCL.
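
To make the idea concrete, here is a toy Python sketch of topology-aware placement. It is not Kueue's implementation or API. The label keys, the `place_pods` helper, and the node data are all hypothetical, chosen only to illustrate the two ideas the overview names: reading topology from node labels, and assigning Pods by index so that neighbouring indices land close together in the network.

```python
from collections import defaultdict

# Hypothetical label keys for illustration only; a real cluster would use
# whatever labeling convention its administrators have configured.
BLOCK = "example.com/topology-block"
RACK = "example.com/topology-rack"


def domain_key(node, levels):
    """Return the topology domain of a node for the given label levels."""
    return tuple(node["labels"][key] for key in levels)


def capacity(domain):
    """Total number of Pods the nodes in a domain can still accept."""
    return sum(node["free_slots"] for node in domain)


def place_pods(nodes, pod_count):
    """Assign pod_count Pods to nodes, preferring the narrowest domain.

    Returns a list where entry i is the node name for the Pod with index i,
    or None if the cluster cannot hold all Pods.
    """
    # Try a single rack first, then a single block, then the whole cluster.
    for levels in ([BLOCK, RACK], [BLOCK], []):
        domains = defaultdict(list)
        for node in nodes:
            domains[domain_key(node, levels)].append(node)
        fitting = [d for d in domains.values() if capacity(d) >= pod_count]
        if not fitting:
            continue
        # Best fit: the smallest domain that still holds every Pod.
        chosen = min(fitting, key=capacity)
        # Sort by topology so consecutive Pod indices share a node or rack.
        ordered = sorted(
            chosen, key=lambda n: (domain_key(n, [BLOCK, RACK]), n["name"])
        )
        assignment = []
        for node in ordered:
            take = min(node["free_slots"], pod_count - len(assignment))
            assignment.extend([node["name"]] * take)
        return assignment
    return None


def make_node(name, block, rack, free_slots):
    """Build a hypothetical node record with topology labels."""
    return {"name": name, "labels": {BLOCK: block, RACK: rack},
            "free_slots": free_slots}


if __name__ == "__main__":
    cluster = [
        make_node("n1", "b1", "r1", 2),
        make_node("n2", "b1", "r1", 2),
        make_node("n3", "b1", "r2", 2),
        make_node("n4", "b2", "r3", 4),
    ]
    print(place_pods(cluster, 4))  # fits inside one rack
    print(place_pods(cluster, 6))  # needs a whole block
```

In this sketch a 4-Pod job fits within a single rack, while a 6-Pod job falls back to a single block and still keeps adjacent Pod indices on the same node or rack. A real scheduler must also handle quotas, preemption, and resource types, all of which are omitted here.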

Syllabus

Accelerate Your AI/ML Workloads With Topology-Aware Scheduling in Kueue - Michał Woźniak & Yuki Iwai

Taught by

CNCF [Cloud Native Computing Foundation]
