
YouTube

Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention

Yannic Kilcher via YouTube

Overview

This course explores the paper "Transformers are RNNs," which introduces a linear attention mechanism that reduces the compute and memory requirements of transformers while revealing a connection between autoregressive transformers and recurrent neural networks (RNNs). Learning outcomes include understanding the formulation of linear attention, its impact on transformer performance, and its relationship to RNNs. The course covers softmax attention and its quadratic complexity, kernel functions, and the paper's experiments. The teaching method is a detailed walkthrough of the paper's concepts and findings. The course is intended for anyone interested in deep learning, transformer models, attention mechanisms, and the optimization of neural networks.
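To make the core idea concrete, here is a minimal NumPy sketch (not the authors' implementation) contrasting standard softmax attention with kernelized linear attention, and showing how the causal case can be computed like an RNN with a running state. The elu(x) + 1 feature map and the toy tensor shapes are illustrative assumptions.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, a simple positive feature map (assumed here for illustration)
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(Q, K, V):
    # Standard attention: builds an N x N weight matrix, O(N^2) in sequence length
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized attention: regroup (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V),
    # so the cost is linear in sequence length.
    Qp, Kp = elu_feature_map(Q), elu_feature_map(K)
    KV = Kp.T @ V                        # (d, d_v) summary of all keys and values
    Z = Qp @ Kp.sum(axis=0)              # per-query normalizer
    return (Qp @ KV) / (Z[:, None] + eps)

def recurrent_linear_attention(Q, K, V, eps=1e-6):
    # Causal version written as an RNN: carry a matrix state S and a
    # normalizer z, updated once per time step.
    Qp, Kp = elu_feature_map(Q), elu_feature_map(K)
    S = np.zeros((Q.shape[-1], V.shape[-1]))
    z = np.zeros(Q.shape[-1])
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(Kp[t], V[t])       # accumulate key-value outer products
        z += Kp[t]                       # accumulate key features
        out[t] = (Qp[t] @ S) / (Qp[t] @ z + eps)
    return out

# Toy check: at the last time step the recurrent state has seen every
# key/value pair, so it matches the non-causal linear attention there.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(8, 4)), rng.normal(size=(8, 4)), rng.normal(size=(8, 3))
print(np.allclose(linear_attention(Q, K, V)[-1],
                  recurrent_linear_attention(Q, K, V)[-1]))  # True
```

The regrouping is what drops the cost from quadratic to linear in sequence length, and the loop makes the RNN view explicit: autoregressive generation only needs the running state S and normalizer z, not the full history of keys and values.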

Syllabus

- Intro & Overview
- Softmax Attention & Transformers
- Quadratic Complexity of Softmax Attention
- Generalized Attention Mechanism
- Kernels
- Linear Attention
- Experiments
- Intuition on Linear Attention
- Connecting Autoregressive Transformers and RNNs
- Caveats with the RNN connection
- More Results & Conclusion

Taught by

Yannic Kilcher
