Overview
Explore groundbreaking research on the theoretical limitations of multi-layer transformers in this technical Google TechTalk presented by Stanford University's Binghui Peng. Dive into the first unconditional lower bound against multi-layer decoder-only transformers, which shows that L-layer decoder-only transformers require a polynomial model dimension to perform sequential composition of L functions over n tokens. Learn about significant findings, including depth-size trade-offs in multi-layer transformers, an unconditional separation between encoder and decoder architectures, and the provable advantages of chain-of-thought reasoning. Understand the novel multi-party autoregressive communication model that captures decoder-only transformer computation, along with new proof techniques for establishing lower bounds. Gain insights from Peng's collaborative work with Lijie Chen and Hongxun Wu, drawing on his background in learning theory, game theory, and large language models developed through his experiences at Stanford University, the Simons Institute, and Columbia University.
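To make the sequential composition task concrete, here is a minimal illustrative Python sketch (not taken from the talk or paper): each of the L functions is represented as a lookup table over an alphabet of n tokens, and the goal is to compute f_L(...f_2(f_1(x))...). The function and variable names below are hypothetical, chosen only to illustrate the task the lower bound is stated for.

```python
import random


def make_instance(n, L, seed=0):
    """Build a random L-fold sequential composition instance.

    Each f_i maps {0, ..., n-1} to itself and is given explicitly
    as a lookup table; x is the starting token.
    (Illustrative encoding only; the talk's formal task definition
    may differ in details.)
    """
    rng = random.Random(seed)
    tables = [[rng.randrange(n) for _ in range(n)] for _ in range(L)]
    x = rng.randrange(n)
    return tables, x


def compose(tables, x):
    """Apply f_1, then f_2, ..., then f_L to the starting token x."""
    for table in tables:
        x = table[x]
    return x


if __name__ == "__main__":
    tables, x = make_instance(n=8, L=3)
    print("f_3(f_2(f_1(x))) =", compose(tables, x))
```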
Syllabus
Theoretical Limitations of Multi-layer Transformers
Taught by
Google TechTalks