CNCF [Cloud Native Computing Foundation]

Benchmarking Your Distributed ML Training on the K8s Platform

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This lightning talk explores how to benchmark distributed machine learning training on Kubernetes platforms. Discover the challenges of running ML training workloads on Kubernetes, including dynamic resource scaling, GPU scheduling, and efficient inter-node communication. Learn about recent advancements like KubeRay, Kubeflow, and Slurm integration that have expanded Kubernetes' capabilities for handling complex, large-scale ML training tasks. Explore the design and implementation of a benchmarking platform that provides actionable insights to improve throughput, scalability, and efficiency of distributed ML training workloads on Kubernetes.
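
The overview names throughput, scalability, and efficiency as the quantities a benchmark should report. The sketch below shows one common way to derive them from raw run measurements. It is not the platform presented in the talk: `RunResult`, `scaling_report`, and all field names are hypothetical, and the numbers in the usage section are illustrative rather than measured.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunResult:
    """One benchmark run (hypothetical record; fields are assumptions)."""
    num_gpus: int
    samples_processed: int
    wall_seconds: float

    @property
    def throughput(self) -> float:
        """Samples processed per second across the whole job."""
        return self.samples_processed / self.wall_seconds


def scaling_report(runs: List[RunResult]) -> List[Dict[str, float]]:
    """Compare each run against the run with the fewest GPUs.

    speedup    = throughput / baseline throughput
    efficiency = per-GPU throughput / baseline per-GPU throughput
                 (1.0 means perfectly linear scaling)
    """
    if not runs:
        raise ValueError("at least one run is required")
    ordered = sorted(runs, key=lambda r: r.num_gpus)
    base = ordered[0]
    base_per_gpu = base.throughput / base.num_gpus
    report = []
    for run in ordered:
        report.append({
            "num_gpus": run.num_gpus,
            "throughput": run.throughput,
            "speedup": run.throughput / base.throughput,
            "efficiency": (run.throughput / run.num_gpus) / base_per_gpu,
        })
    return report


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    example_runs = [
        RunResult(num_gpus=1, samples_processed=1000, wall_seconds=10.0),
        RunResult(num_gpus=4, samples_processed=3200, wall_seconds=10.0),
    ]
    for row in scaling_report(example_runs):
        print(row)
```

An efficiency well below 1.0 as GPU count grows usually points to communication or scheduling overhead, which is the kind of signal the overview describes as an actionable insight.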

Syllabus

Lightning Talk: Benchmarking Your Distributed ML Training on the K8s Platform - Liang Yan, CoreWeave

Taught by

CNCF [Cloud Native Computing Foundation]
