
This talk from MLOps World explores the challenges of evaluating Large Language Models (LLMs) in high-stakes environments where traditional metrics fall short. Discover why LLMs require assessment across multiple dimensions, including factual accuracy, reasoning depth, coherence, safety, and ethical alignment, and learn about three core quality tuning challenges: the subjectivity of open-ended responses, the absence of definitive ground truth in generative tasks, and the context-dependent nature of correctness.

Explore a compositional quality tuning framework that dynamically adapts scoring weights to balance trade-offs among factuality, creativity, and safety, including mechanisms for detecting hallucinations and quantifying output robustness. Gain practical insights from real-world implementations across different industries that show how targeted quality tuning improves model performance in specialized domains.

Essential viewing for anyone working with LLMs in critical applications where performance evaluation must go beyond standard benchmarks.
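To make the compositional idea concrete, here is a minimal sketch of context-dependent score weighting, not the speaker's actual framework: per-dimension quality scores are combined under weights that shift with the task profile. All names, weight profiles, and score values below are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class QualityScores:
    """Per-dimension scores in [0, 1], assumed to come from upstream
    evaluators (e.g., a fact checker, a diversity metric, a safety classifier)."""
    factuality: float
    creativity: float
    safety: float

def adapt_weights(task_type: str) -> dict[str, float]:
    """Return normalized scoring weights for a task context.

    These profiles are illustrative; a real system would configure or
    learn them per domain.
    """
    profiles = {
        "medical_qa":       {"factuality": 0.6, "creativity": 0.1, "safety": 0.3},
        "creative_writing": {"factuality": 0.2, "creativity": 0.6, "safety": 0.2},
        "general":          {"factuality": 0.4, "creativity": 0.3, "safety": 0.3},
    }
    weights = profiles.get(task_type, profiles["general"])
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def composite_score(scores: QualityScores, task_type: str) -> float:
    """Combine per-dimension scores using context-dependent weights."""
    w = adapt_weights(task_type)
    return (w["factuality"] * scores.factuality
            + w["creativity"] * scores.creativity
            + w["safety"] * scores.safety)

# The same raw scores rank differently under different task contexts:
scores = QualityScores(factuality=0.9, creativity=0.4, safety=0.95)
print(composite_score(scores, "medical_qa"))        # factuality-weighted
print(composite_score(scores, "creative_writing"))  # creativity-weighted
```

In this toy setup, a highly factual but unimaginative answer scores well for medical Q&A and poorly for creative writing, which is the trade-off behavior the talk's framework is described as tuning.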