Overview
Syllabus
0:00 Preview
0:31 Intro to Arxiv Dives
3:42 Why is R1 Important?
6:38 What is a Reasoning Model?
8:55 What are DeepSeek R1’s Contributions?
12:27 How DeepSeek-V3 Works
16:01 What Hardware Do You Need?
16:50 How DeepSeek-R1-Zero Works
17:23 How GRPO Works
25:30 DeepSeek’s Aha Moment
29:06 R1 on ARC-AGI Benchmark
30:20 Self-Hosting DeepSeek
31:38 How DeepSeek-R1 Works
34:05 What Was the Cold Start Data?
36:58 Rejection Sampling and Supervised Fine-Tuning
38:30 Helpfulness and Harmlessness Reinforcement Learning
39:45 Distilling Smaller Models
41:25 Distillation vs. Reinforcement Learning
Taught by
Oxen