Understanding DeepSeek R1 and GRPO - A Technical Deep Dive

Oxen via YouTube

Overview

Explore the inner workings of DeepSeek models in this 45-minute technical video, with a particular focus on R1 and GRPO (Group Relative Policy Optimization). Learn why reasoning models matter, what DeepSeek-R1 contributes, and how DeepSeek-V3 and R1-Zero work. Topics include hardware requirements, cold start data, rejection sampling, supervised fine-tuning, and the differences between distillation and reinforcement learning. See how R1 performs on the ARC-AGI benchmark and pick up practical pointers for self-hosting DeepSeek. Later sections cover reinforcement learning for helpfulness and harmlessness and methods for distilling smaller models, making the video useful for AI researchers, engineers, and technical enthusiasts interested in large language model development.

Syllabus

0:00 Preview
0:31 Intro to Arxiv Dives
3:42 Why is R1 Important?
6:38 What is a Reasoning Model?
8:55 What are DeepSeek R1’s Contributions?
12:27 How DeepSeek-v3 Works
16:01 What Hardware do You Need?
16:50 How DeepSeek-R1-Zero Works
17:23 How GRPO works
25:30 DeepSeek’s Aha Moment
29:06 R1 on ARC-AGI Benchmark
30:20 Self-Hosting DeepSeek
31:38 How DeepSeek-R1 Works
34:05 What was the Cold Start Data?
36:58 Rejection Sampling and Supervised Fine Tuning
38:30 Helpfulness and Harmlessness Reinforcement Learning
39:45 Distilling Smaller Models
41:25 Distillation vs. Reinforcement Learning

Taught by

Oxen
