Overview

This course explores FNet, which replaces the attention mechanism in Transformer encoder architectures with Fourier transforms. By using simple, parameter-free linear transformations to "mix" input tokens, FNet achieves performance comparable to attention-based Transformers while significantly reducing parameter count and computational cost. The course covers the FNet architecture, the role of the Fourier transform in token mixing, experimental results, and broader implications. It is intended for machine learning researchers, practitioners, and enthusiasts interested in efficient models for text classification tasks.
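To make the token-mixing idea concrete, here is a minimal PyTorch sketch of an FNet-style encoder block, assuming the mixing step is a 2D Fourier transform (over the sequence and hidden dimensions) whose real part replaces self-attention. The module names and layer sizes are illustrative assumptions, not the course's or the paper's reference implementation.

```python
import torch
import torch.nn as nn

class FourierMixing(nn.Module):
    """Parameter-free token mixing: FFT over hidden and sequence dims, keep the real part."""
    def forward(self, x):
        # x: (batch, seq_len, hidden)
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

class FNetEncoderBlock(nn.Module):
    """One encoder block where Fourier mixing stands in for self-attention (sizes are assumptions)."""
    def __init__(self, hidden=256, ff=1024):
        super().__init__()
        self.mixing = FourierMixing()
        self.norm1 = nn.LayerNorm(hidden)
        self.ff = nn.Sequential(nn.Linear(hidden, ff), nn.GELU(), nn.Linear(ff, hidden))
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        x = self.norm1(x + self.mixing(x))  # Fourier mixing in place of the attention sublayer
        x = self.norm2(x + self.ff(x))      # standard Transformer feed-forward sublayer
        return x

# Usage: mix a batch of token embeddings
x = torch.randn(2, 128, 256)
out = FNetEncoderBlock()(x)
print(out.shape)  # torch.Size([2, 128, 256])
```

Because the mixing step has no learned weights, the block's only parameters sit in the layer norms and the feed-forward sublayer, which is where the parameter and compute savings over attention come from.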
Syllabus
- Intro & Overview
- Giving up on Attention
- FNet Architecture
- Going deeper into the Fourier Transform
- The Importance of Mixing
- Experimental Results
- Conclusions & Comments
Taught by
Yannic Kilcher