Overview
In this one-hour lecture, Gauthier Gidel (IVADO - Mila - Université de Montréal) examines approaches for assessing and strengthening the robustness of Large Language Models' safety mechanisms. Discover the research from Gidel's lab on developing safety-guaranteed LLMs through adversarial training, and learn about methodologies for probing LLM vulnerabilities, hardening defenses against malicious prompts, and building AI systems that maintain their safety guardrails even under adversarial conditions. The presentation offers insights for researchers, AI safety specialists, and anyone interested in responsible AI development.
Syllabus
Adversarial Training For LLMs' Safety Robustness
Taught by
Simons Institute