
Dashboards and Dragons: Crafting SLOs to Tame the AI Platform Chaos
CNCF [Cloud Native Computing Foundation] via YouTube
Overview

This conference talk explores how to scale Kubernetes platforms effectively using Service Level Indicators (SLIs), Service Level Objectives (SLOs), and observability dashboards. Learn from Bloomberg engineers as they share their journey of managing multi-cluster platform complexity across cloud, on-premises, and hybrid environments. Discover practical strategies for defining meaningful metrics, designing actionable dashboards, and maintaining platform reliability at scale. The presentation offers real-life lessons and battle-tested approaches specifically focused on ensuring AI workloads run smoothly even during chaotic conditions. Gain valuable insights into platform observability design and best practices that can be applied to your own infrastructure challenges.
Syllabus
Dashboards & Dragons: Crafting SLOs To Tame the AI Platform Cha... - Alexa Griffith & Ankita Chaudhari
Taught by
CNCF [Cloud Native Computing Foundation]