

Scaling LLM Batch Inference: Ray Data and vLLM for High Throughput

InfoQ via YouTube

Overview

This 48-minute InfoQ video explores the challenges of scaling Large Language Model (LLM) batch inference and demonstrates how to combine Ray Data with vLLM to achieve high throughput and cost-effective processing. Dive into techniques for leveraging heterogeneous computing resources, implementing fault tolerance for reliability, and optimizing inference pipelines for maximum efficiency. Examine real-world case studies that showcase significant performance improvements and cost reductions when processing large volumes of data through LLMs. Learn practical approaches to overcome common bottlenecks in batch inference workflows and implement scalable solutions for production environments.
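The core pattern the talk covers can be sketched in a few lines: Ray Data streams batches of prompts to a pool of actors, and each actor hosts its own vLLM engine on a dedicated GPU. The snippet below is a minimal, illustrative sketch of that pattern, not the speaker's exact code; the model name, input/output paths, batch size, and concurrency values are assumptions chosen for the example.

```python
# Minimal sketch: Ray Data fans prompt batches out to a pool of actors,
# each of which loads a vLLM engine once and reuses it across batches.
import ray
from vllm import LLM, SamplingParams


class VLLMPredictor:
    def __init__(self):
        # One vLLM engine per Ray actor; model name is an assumption.
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
        self.params = SamplingParams(temperature=0.0, max_tokens=256)

    def __call__(self, batch):
        # Batches arrive as dicts of NumPy arrays by default.
        prompts = batch["prompt"].tolist()
        outputs = self.llm.generate(prompts, self.params)
        batch["generated"] = [o.outputs[0].text for o in outputs]
        return batch


ds = ray.data.read_parquet("s3://my-bucket/prompts/")  # assumed input location
results = ds.map_batches(
    VLLMPredictor,
    batch_size=64,   # prompts handed to vLLM per call (illustrative)
    num_gpus=1,      # pin one GPU to each engine replica
    concurrency=4,   # size of the actor pool, i.e. number of replicas
)
results.write_parquet("s3://my-bucket/generations/")  # assumed output location
```

Because Ray Data executes this as a streaming pipeline, CPU-bound stages such as reading and preprocessing can run on cheap CPU nodes while only the `map_batches` stage consumes GPUs, and failed tasks are retried automatically, which is the heterogeneous-compute and fault-tolerance story the video walks through.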

Syllabus

Scaling LLM Batch Inference: Ray Data and vLLM for High Throughput

Taught by

InfoQ

