
CNCF [Cloud Native Computing Foundation]

Advanced Model Serving Techniques with Ray on Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This 42-minute conference talk covers advanced techniques for serving large language models (LLMs) with Ray on Kubernetes. Speakers from Google and Anyscale walk through model composition, model multiplexing, and fractional GPU scheduling. They also discuss Ray's GPU-native communication initiatives and how these integrate with Kubernetes Dynamic Resource Allocation (DRA) to enable tensor parallelism across multiple GPUs. A live demonstration shows KubeRay applying these techniques to LLM deployments and highlights how Ray scales and orchestrates open-source models across different hardware accelerators and failure domains.

Syllabus

Advanced Model Serving Techniques with Ray on Kubernetes - Andrew Sy Kim & Kai-Hsun Chen

Taught by

CNCF [Cloud Native Computing Foundation]

