
YouTube

Robustness of Jailbreaking Across Aligned LLMs, Reasoning Models and Agents

Simons Institute via YouTube

Overview

Explore a 59-minute lecture by Siva Reddy (IVADO, Mila, McGill University), presented at the Simons Institute, examining the vulnerability of aligned language models to jailbreaking. Investigate how these exploits transfer across different types of AI systems, including standard large language models, reasoning-enhanced models, and autonomous agents. The presentation, part of the Safety-Guaranteed LLMs series, offers insight into the robustness challenges facing AI safety mechanisms and their implications for building more secure AI systems.

Syllabus

Robustness of jailbreaking across aligned LLMs, reasoning models and agents

Taught by

Simons Institute
