

Theoretical Limitations of Multi-Layer Transformers

Google TechTalks via YouTube

Overview

Explore groundbreaking research on the theoretical limitations of multi-layer transformers in this technical Google TechTalk presented by Stanford University's Binghui Peng. Dive into the first unconditional lower bound against multi-layer decoder-only transformers, showing that L-layer decoder-only transformers require model dimension polynomial in the sequence length n to perform sequential composition of L functions over n tokens. Learn about significant findings, including depth-size trade-offs in multi-layer transformers, an unconditional separation between encoder and decoder architectures, and provable advantages of chain-of-thought reasoning. Understand the novel multi-party autoregressive communication model that captures decoder-only transformer computation, along with new proof techniques for establishing lower bounds. Gain insights from Peng's collaborative work with Lijie Chen and Hongxun Wu, which draws on his background in learning theory, game theory, and large language models, developed through his time at Stanford University, the Simons Institute, and Columbia University.
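To make the composition task concrete, here is a rough sketch in informal notation (the talk's formal setup may differ): given functions f_1, ..., f_L, each mapping tokens to tokens, and an input x drawn from n tokens, the task is to output the sequential composition

    f_L(f_{L-1}( ... f_1(x) ... ))

The lower bound states that any L-layer decoder-only transformer computing this composition needs model dimension n^Ω(1), that is, polynomial in n; allowing more layers or chain-of-thought steps changes this trade-off, which is what the depth-size and chain-of-thought results address.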

Syllabus

Theoretical Limitations of Multi-Layer Transformers

Taught by

Google TechTalks

