Accelerating LLM Inference with vLLM

Databricks via YouTube

Overview

Explore the cutting-edge advancements in LLM inference performance through this 36-minute conference talk by Cade Daniel and Zhuohan Li. Dive into the world of vLLM, an open-source engine developed at UC Berkeley that has revolutionized LLM inference and serving. Learn about key performance-enhancing techniques such as paged attention and continuous batching. Discover recent innovations in vLLM, including speculative decoding, prefix caching, disaggregated prefill, and multi-accelerator support. Gain insights from industry case studies and get a glimpse of vLLM's future roadmap. Understand how vLLM's focus on production-readiness and extensibility has led to new system insights and widespread community adoption, making it a state-of-the-art, accelerator-agnostic solution for LLM inference.
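For readers who want a concrete starting point beyond the talk, the following minimal sketch shows offline batch inference with vLLM's Python API. It is not taken from the talk itself; the model name is illustrative, and the flags shown (prefix caching, tensor parallelism) are assumptions about how the features mentioned above are typically enabled.

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM applies paged attention and continuous batching
# automatically under the hood. Model choice here is illustrative.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    enable_prefix_caching=True,   # reuse KV-cache blocks shared across prompts
    tensor_parallel_size=1,       # set >1 to shard the model across accelerators
)

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

prompts = [
    "Explain paged attention in one sentence.",
    "Why does continuous batching improve serving throughput?",
]

# Requests are batched continuously: as sequences finish, their KV-cache
# pages are freed and new requests can join the in-flight batch.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

Features such as speculative decoding and disaggregated prefill are configured through additional engine options that vary by vLLM version, so consult the project documentation for the release you are running.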

Syllabus

Accelerating LLM Inference with vLLM

Taught by

Databricks

