
Beyond Benchmarks: Rethinking How We Evaluate LLMs in High-Stakes Environments

MLOps World: Machine Learning in Production via YouTube

Overview

This talk from MLOps World explores the complex challenges of evaluating Large Language Models (LLMs) in high-stakes environments where traditional metrics fall short. Discover why LLMs require assessment across multiple dimensions including factual accuracy, reasoning depth, coherence, safety, and ethical alignment. Learn about three core quality tuning challenges: the subjectivity of open-ended responses, the absence of definitive ground truth in generative tasks, and the context-dependent nature of correctness. Explore a compositional quality tuning framework that dynamically adapts scoring weights to balance trade-offs between factuality, creativity, and safety, including mechanisms for detecting hallucinations and quantifying output robustness. Gain practical insights from real-world implementations across different industries, showing how targeted quality tuning significantly improves model performance in specialized domains. Essential viewing for anyone working with LLMs in critical applications where performance evaluation goes beyond standard benchmarks.
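The talk does not publish its exact scoring formula, but a compositional quality-tuning framework of the kind described can be sketched as a weighted combination of per-dimension scores whose weights shift with the deployment context. The sketch below is an illustration under that assumption; the dimension names, weight profiles, and example scores are hypothetical, not the speaker's implementation.

```python
from dataclasses import dataclass

@dataclass
class QualityWeights:
    """Hypothetical weight profile over three quality dimensions."""
    factuality: float
    creativity: float
    safety: float

    def normalized(self) -> "QualityWeights":
        # Normalize so the weights always sum to 1.
        total = self.factuality + self.creativity + self.safety
        return QualityWeights(self.factuality / total,
                              self.creativity / total,
                              self.safety / total)

# Assumed context-dependent profiles: a high-stakes medical deployment
# prioritizes factuality and safety, while a brainstorming assistant
# trades some factuality for creativity.
CONTEXT_PROFILES = {
    "medical_qa": QualityWeights(factuality=0.5, creativity=0.1, safety=0.4),
    "brainstorming": QualityWeights(factuality=0.2, creativity=0.6, safety=0.2),
}

def composite_score(scores: dict[str, float], context: str) -> float:
    """Combine per-dimension scores (each in [0, 1]) using the weight
    profile for the given deployment context."""
    w = CONTEXT_PROFILES[context].normalized()
    return (w.factuality * scores["factuality"]
            + w.creativity * scores["creativity"]
            + w.safety * scores["safety"])

# The same response scores very differently depending on context:
response_scores = {"factuality": 0.9, "creativity": 0.3, "safety": 0.95}
print(composite_score(response_scores, "medical_qa"))    # 0.86
print(composite_score(response_scores, "brainstorming")) # 0.55
```

The point of the sketch is the trade-off the talk highlights: because no single number captures correctness in generative tasks, the same output can be acceptable in one context and unacceptable in another, and the evaluation framework must encode that context explicitly rather than rely on a fixed benchmark score.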

Syllabus

Beyond Benchmarks: Rethinking How We Evaluate LLMs in High-Stakes Environments

Taught by

MLOps World: Machine Learning in Production
