This talk by Zhiyuan Li from TTIC explores PENCIL, a novel method for improving memory efficiency in large language models (LLMs) during reasoning tasks. Learn how PENCIL incorporates a reduction mechanism into autoregressive generation, allowing the model to actively discard obsolete tokens and perform space-efficient computation. Discover how this approach enables models to simulate Turing machines so that the maximal context length matches the space complexity and the total number of generated tokens matches the time complexity. The presentation demonstrates PENCIL's practical advantages, including how a 25M-parameter transformer with a context length of just 2048 tokens achieved 97% accuracy on the challenging 5×5 Einstein's puzzle. This efficiency gain addresses a key limitation of standard chain-of-thought (CoT) approaches, which typically require a context length equal to the full CoT length, making PENCIL particularly valuable for complex reasoning tasks that would otherwise demand excessive memory.
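To make the reduction idea concrete, here is a minimal Python sketch (not the authors' implementation) of how discarding obsolete tokens during autoregressive generation might look. It assumes special [CALL]/[SEP]/[RETURN] markers and a rewrite rule that keeps only the answer span once a sub-computation finishes; the marker names, the `reduce_context` and `generate` helpers, and the `model_step` callable are illustrative assumptions, and the actual mechanism in the talk may differ.

```python
# Toy illustration of a PENCIL-style reduction step (a sketch, not the
# authors' code). Assumed convention: the model emits
#   C [CALL] T [SEP] A [RETURN]
# where T is intermediate "thought" text that becomes obsolete once the
# answer A is produced, so the rule rewrites the sequence to  C A.

CALL, SEP, RET = "[CALL]", "[SEP]", "[RETURN]"

def reduce_context(tokens: list[str]) -> list[str]:
    """Apply C [CALL] T [SEP] A [RETURN] -> C A to the most recent
    completed call (assumes a well-formed marker sequence)."""
    if not tokens or tokens[-1] != RET:
        return tokens                       # no completed call to reduce yet
    sep = max(i for i, t in enumerate(tokens) if t == SEP)
    call = max(i for i, t in enumerate(tokens[:sep]) if t == CALL)
    # Keep the prefix C and the answer A; drop the thoughts T and the markers.
    return tokens[:call] + tokens[sep + 1:-1]

def generate(model_step, prompt: list[str], max_steps: int = 100) -> list[str]:
    """Autoregressive loop that shrinks the context after every [RETURN],
    so memory tracks only the 'live' tokens rather than all tokens ever
    generated. `model_step(ctx)` is a hypothetical next-token predictor."""
    ctx = list(prompt)
    for _ in range(max_steps):
        ctx.append(model_step(ctx))         # ordinary next-token prediction
        ctx = reduce_context(ctx)           # discard obsolete tokens
    return ctx
```

Under this kind of rule, the context only ever holds the prefix plus the answers of finished sub-computations, which is why the maximal context length can track space complexity while the total tokens generated track time complexity.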
Syllabus
Pencil: Long Thoughts with Short Memory
Taught by
Simons Institute