Overview
This course explores a paper that introduces a linear attention mechanism for transformers, reducing the compute and memory cost of self-attention from quadratic to linear in sequence length while revealing a connection between autoregressive transformers and RNNs. Learning outcomes include understanding the formulation of linear attention, its impact on transformer performance, and its relationship to RNNs, as well as softmax attention, the quadratic complexity it incurs, kernel functions, and the paper's experiments. The teaching method is a detailed walkthrough of the paper's concepts and findings. The course is intended for anyone interested in deep learning, transformer models, attention mechanisms, and the optimization of neural networks.
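The core trick the course covers can be sketched in a few lines: replacing the softmax similarity exp(q·k) with a kernel feature map φ(q)·φ(k) lets the attention product be reassociated so the N×N attention matrix is never formed. The sketch below is illustrative, not the authors' implementation; the elu(x)+1 feature map is one choice discussed in this setting.

```python
import numpy as np

def feature_map(x):
    # A simple positive feature map phi(x); elu(x) + 1 is one common choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Linear attention: O(N) in sequence length N.

    Softmax attention computes normalize(Q K^T) V via an N x N matrix.
    With phi(q) . phi(k) in place of exp(q . k), the product reassociates
    as phi(Q) (phi(K)^T V), so cost is linear in N.
    """
    Qp, Kp = feature_map(Q), feature_map(K)       # (N, d) feature-mapped
    KV = Kp.T @ V                                 # (d, d_v) key/value summary
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T      # (N, 1) normalizer
    return (Qp @ KV) / Z

# Usage: toy random queries, keys, and values.
N, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Note that the output matches the explicit (N×N) kernel attention exactly; only the order of matrix multiplications changes.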
Syllabus
- Intro & Overview
- Softmax Attention & Transformers
- Quadratic Complexity of Softmax Attention
- Generalized Attention Mechanism
- Kernels
- Linear Attention
- Experiments
- Intuition on Linear Attention
- Connecting Autoregressive Transformers and RNNs
- Caveats with the RNN connection
- More Results & Conclusion
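The syllabus item on connecting autoregressive transformers and RNNs refers to the causal form of linear attention, which can be evaluated as a recurrence over a fixed-size state. A hedged sketch of that idea (not the authors' code; the elu(x)+1 feature map is an assumption carried over from the non-causal sketch):

```python
import numpy as np

def feature_map(x):
    # Positive feature map phi(x); elu(x) + 1 is one common choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    """Causal linear attention computed as an RNN-style recurrence.

    The state S accumulates outer products phi(k_t) v_t^T and z accumulates
    phi(k_t); each output attends only to past positions, so autoregressive
    generation needs constant memory per step instead of a growing cache.
    """
    N, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))    # running sum of key-value outer products
    z = np.zeros(d)           # running sum of key features (normalizer)
    out = np.zeros((N, d_v))
    for t in range(N):
        q, k, v = feature_map(Q[t]), feature_map(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (q @ S) / (q @ z)
    return out

# Usage: toy causal sequence.
N, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
out = causal_linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

The per-step update of (S, z) is exactly an RNN hidden-state update, which is the transformer–RNN connection the course discusses, along with its caveats.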
Taught by
Yannic Kilcher