CNCF [Cloud Native Computing Foundation]

Slinky: Slurm in Kubernetes - Performant AI and HPC Workload Management in Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This conference talk introduces Slinky, a fully open-source toolset that integrates Slurm with Kubernetes for AI and HPC workload management. It covers how Kubernetes, originally designed for microservices, is adapting to support AI training and multi-node inference workloads, and introduces Slurm, the most widely used HPC workload manager, which has over two decades of development behind it and excels at gang scheduling, fair usage, job planning, and batch scheduling. It also covers the architecture of Slinky, which includes a Slurm operator, a client library, and a metrics exporter, and the challenges of achieving fine-grained control in Kubernetes for AI and HPC workloads. Presented by Tim Wickberg of SchedMD at a CNCF event, the 39-minute talk is aimed at those looking to improve performance and efficiency in AI clusters.
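For readers unfamiliar with gang scheduling, mentioned in the overview, the idea can be sketched in a few lines of Python: a multi-node job starts only when every node it needs is free at the same time, and otherwise it waits without holding any nodes. This is a toy illustration, not Slurm's or Slinky's actual algorithm. The `Job` class, the `gang_schedule` function, and the job names are all hypothetical, and real schedulers also weigh priorities, fair-share usage, and time limits.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes_needed: int  # the job is only useful if it gets all of these at once


def gang_schedule(jobs, free_nodes):
    """All-or-nothing placement (hypothetical helper, not a Slurm API).

    Jobs are considered in queue order. A job starts only if every node it
    needs is free; a job that does not fit is left waiting and holds nothing.
    Returns (started, waiting, remaining_free_nodes).
    """
    started, waiting = [], []
    for job in jobs:
        if job.nodes_needed <= free_nodes:
            free_nodes -= job.nodes_needed
            started.append(job.name)
        else:
            waiting.append(job.name)
    return started, waiting, free_nodes


queue = [Job("train-llm", 4), Job("inference", 3), Job("small-test", 1)]
started, waiting, free = gang_schedule(queue, free_nodes=5)
print(started, waiting, free)
# prints: ['train-llm', 'small-test'] ['inference'] 0
```

In this sketch the three-node job never receives a partial allocation of the single leftover node. It simply waits, which is the property that matters for tightly coupled training and multi-node inference jobs.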

Syllabus

Slinky: Slurm in Kubernetes, Performant AI and HPC Workload Management in Kubernetes - Tim Wickberg

Taught by

CNCF [Cloud Native Computing Foundation]
