Controlling Untrusted AIs With Monitors

Simons Institute via YouTube

Overview

In this hour-long talk, Ethan Perez of Anthropic explores how untrusted AI systems can be controlled through monitoring. Perez discusses approaches for implementing safety guarantees for large language models (LLMs) using monitoring systems that detect and prevent potentially harmful outputs or behaviors. The talk covers techniques for maintaining control over increasingly powerful AI systems even when the underlying models cannot be fully trusted or verified as safe.

Syllabus

Controlling Untrusted AIs With Monitors

Taught by

Simons Institute
