
Can We Get Asymptotic Safety Guarantees Based On Scalable Oversight?

Simons Institute via YouTube

Overview

This talk by Geoffrey Irving of the UK AI Safety Institute explores the theoretical foundations and practical applications of scalable oversight for AI alignment. Learn about recent advances in computational complexity, multi-agent training dynamics, and learning theory that aim to provide theoretical safety guarantees under simplified assumptions about human feedback. Discover the "prover-predictor game," a variant of debate that addresses the "obfuscated arguments" problem observed in earlier experiments while letting ML systems operate more efficiently via ML-checkable arguments. Examine how these methods might extend to more realistic human feedback scenarios and stronger solution requirements, drawing on untapped resources from theoretical computer science. Understand how these approaches, structured as zero-sum adversarial team games, might translate into practical, convergent training methods that offer asymptotic safety guarantees with real-world applicability.

Syllabus

Can We Get Asymptotic Safety Guarantees Based On Scalable Oversight?

Taught by

Simons Institute
