Overview
This video features a deep dive into RWKV-7 "Goose" with paper author Eugene Cheah, exploring how the model combines an RNN architecture with transformer-like capabilities. Learn why RWKV-7 is generating excitement in the AI community, how to run it efficiently, and how its architecture works at a fundamental level. The discussion covers how RWKV addresses the limitations of traditional RNNs, examines the original paper "RWKV: Reinventing RNNs for the Transformer Era," and features direct insights from Eugene Cheah, who explains the intuition behind each model layer. Discover the parallelization techniques used during training, review benchmark performance, see live evaluations, pick up fine-tuning tips, and learn why the team developed the World Tokenizer. Perfect for AI researchers, developers, and enthusiasts interested in state-of-the-art language models that offer linear-time inference with constant memory per token.
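For the quick-run portion of the video, here is a minimal sketch of local inference using the community `rwkv` pip package (`pip install rwkv`). The checkpoint path, model size, and strategy string below are placeholders to adapt to your own download, not the exact commands shown in the video:

```python
import os
# Newer `rwkv` package releases gate RWKV-7 support behind this flag
# (see the package README for the version you have installed).
os.environ["RWKV_V7_ON"] = "1"

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Placeholder checkpoint path: point at a downloaded RWKV-7 "World"
# .pth file (BlinkDL's examples pass the path without the extension).
# The strategy string picks device and precision, e.g. "cuda fp16"
# or "cpu fp32".
model = RWKV(model="/path/to/RWKV-7-World-1.5B", strategy="cuda fp16")

# Load the World Tokenizer vocab that ships with the package.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

# Because the recurrent state is fixed-size, each generated token costs
# the same compute and memory no matter how long the context grows.
print(pipeline.generate("The RWKV architecture is", token_count=64))
```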
Syllabus
0:00 Why is RWKV-7 Goose interesting
2:53 How to quickly run RWKV-7 Goose
4:04 What is RWKV-7
10:20 RNNs forget things
12:33 First paper: Reinventing RNNs for the Transformer Era
24:22 Paper author Eugene Cheah joins the dive
36:43 The intuition behind each model layer
47:57 Parallelization during training
53:01 How well did RWKV-7 do on benchmarks?
56:50 Live evals on RWKV-7 and fine-tuning tips
1:00:38 Why they made the World Tokenizer
Taught by
Oxen