Overview
In this Stanford University seminar, Sanmi Koyejo argues that AI evaluation needs more rigorous methods than static benchmarks, particularly in critical domains that require better ways to assess AI capabilities and safety. He presents a measurement framework that combines psychometric principles with modern AI evaluation needs, drawing on techniques from Item Response Theory, amortized computation, and predictability analysis. Case studies in safety assessment and capability measurement show how these methods can make AI evaluation more reliable, scalable, and meaningful. The presentation builds toward transforming AI evaluation from a collection of benchmarks into a rigorous measurement science that can guide research, deployment, and policy decisions. Recorded on March 19, 2025 at Stanford University, the seminar runs 73 minutes and is aimed at anyone interested in the future of AI assessment methodologies.
Syllabus
HAI Seminar with Sanmi Koyejo: Beyond Benchmarks – Building a Science of AI Measurement
Taught by
Stanford HAI