
Optimizing Metrics Collection and Serving When Autoscaling LLM Workloads

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore strategies for optimizing metrics collection and serving when autoscaling Large Language Model (LLM) workloads in this 35-minute conference talk by Vincent Hou of Bloomberg and Jiří Kremser of kedify.io, presented at a CNCF event. Learn how to balance resource provisioning for LLM workloads so that both cost efficiency and service quality are maintained using Kubernetes' horizontal autoscaling capabilities. The presentation covers the fundamentals of horizontal autoscaling in Kubernetes and the challenges unique to LLM workloads, compares existing Kubernetes autoscaling solutions for custom metrics along with their trade-offs, and examines techniques for improving scaling responsiveness through push-based metrics collection. The speakers demonstrate an integrated solution built on KServe, the OpenTelemetry Collector, and KEDA to show how LLM workload autoscaling can be optimized in cloud-native environments.
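The talk itself presents the authoritative setup; as a rough sketch of the kind of custom-metrics autoscaling it describes, a KEDA ScaledObject scaling an inference Deployment on a Prometheus-exposed metric might look like the following (the names, namespace, metric, query, and threshold are all illustrative assumptions, not taken from the talk):

```yaml
# Illustrative KEDA ScaledObject; all names and the query are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-scaler
  namespace: llm-serving
spec:
  scaleTargetRef:
    name: llm-inference          # hypothetical Deployment serving the model
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 120            # seconds to wait before scaling back in
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        # hypothetical queue-depth metric; long queues signal the need for more replicas
        query: sum(rate(llm_request_queue_length[1m]))
        threshold: "10"
```

Note that this sketch uses KEDA's pull-based Prometheus scaler for simplicity; in the push-based approach the speakers discuss, an OpenTelemetry Collector forwards metrics toward the autoscaler rather than having it poll a metrics server, which is what improves scaling responsiveness.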

Syllabus

Optimizing Metrics Collection & Serving When Autoscaling LLM Workloads - Vincent Hou & Jiří Kremser

Taught by

CNCF [Cloud Native Computing Foundation]
