Can Diffusion Model Disentangle? A Theoretical Perspective
Massachusetts Institute of Technology via YouTube

Overview
This 22-minute talk by Liming Wang from MIT explores the theoretical foundations of how diffusion models can learn disentangled representations. Discover a novel theoretical framework that establishes identifiability conditions for general disentangled latent variable models, analyzes training dynamics, and derives sample complexity bounds for disentangled latent subspace models. Examine experimental validations across diverse tasks and modalities, including subspace recovery in latent subspace Gaussian mixture models, image colorization, image denoising, and voice conversion for speech classification. Learn how training strategies inspired by this theoretical approach, such as style guidance regularization, consistently enhance disentanglement performance. Wang, a postdoctoral associate in the Spoken Language Systems Group at MIT CSAIL, focuses his research on practical and theoretical aspects of self-supervised speech processing and multimodal learning to improve the accessibility and inclusivity of speech and language technology.

Syllabus
Liming Wang, Can Diffusion Model Disentangle? A Theoretical Perspective

Taught by
MIT Embodied Intelligence