
YouTube

Training LLMs to Think - Understanding o1 and DeepSeek-R1 Models

Shaw Talebi via YouTube

Overview

Explore a detailed technical video lecture examining the advanced reasoning capabilities of large language models, focusing on the o1 and DeepSeek-R1 models trained with large-scale reinforcement learning. Delve into key concepts including test-time compute, "thinking" tokens, and the reinforcement learning techniques behind them. Learn about the R1-Zero methodology, covering its prompt template, reward design, and the technical details of GRPO (Group Relative Policy Optimization). Understand DeepSeek-R1's four-step training process, from supervised fine-tuning on Chain-of-Thought data through the final reinforcement learning and RLHF stages. Gain practical guidance on accessing DeepSeek models, along with a sense of the broader implications of these advances for AI reasoning.
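
For reference, the GRPO objective that the lecture walks through can be sketched as follows. This is a summary of the formulation published in the DeepSeek papers, not a transcription from the video; here G is the group size, pi_ref the frozen reference policy, and epsilon and beta the clipping and KL coefficients.

\mathcal{J}_{\mathrm{GRPO}}(\theta)
  = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
      \left(\min\!\left(\rho_{i,t}\hat{A}_{i,t},\
      \mathrm{clip}\!\left(\rho_{i,t},\,1-\epsilon,\,1+\epsilon\right)\hat{A}_{i,t}\right)
      - \beta\,\mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right]\right)\right]

where

\rho_{i,t} = \frac{\pi_\theta(o_{i,t}\mid q,\,o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t}\mid q,\,o_{i,<t})},
\qquad
\hat{A}_{i,t} = \frac{r_i - \mathrm{mean}(\{r_1,\dots,r_G\})}{\mathrm{std}(\{r_1,\dots,r_G\})}.

The key design choice is that each sampled response's advantage is computed by normalizing its reward against the group of G responses to the same prompt, which removes the need for a separately trained critic (value) model.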

Syllabus

Intro - 0:00
OpenAI's o1 - 0:33
Test-time Compute - 1:33
"Thinking" Tokens - 3:50
DeepSeek Paper - 5:58
Reinforcement Learning - 7:22
R1-Zero: Prompt Template - 9:28
R1-Zero: Reward - 10:53
R1-Zero: GRPO technical - 12:53
R1-Zero: Results - 20:00
DeepSeek R1 - 23:32
Step 1: SFT with CoT - 24:47
Step 2: R1-Zero Style RL - 26:14
Step 3: SFT with Mixed Data - 27:03
Step 4: RL & RLHF - 28:26
Accessing DeepSeek Models - 29:18
Conclusions - 30:10
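
As background for the "Accessing DeepSeek Models" segment, here is a minimal sketch of calling the hosted DeepSeek-R1 model through DeepSeek's OpenAI-compatible API. The endpoint, the "deepseek-reasoner" model name, and the reasoning_content field reflect DeepSeek's public API documentation; the DEEPSEEK_API_KEY variable name is an assumption, and the video may demonstrate a different access path (for example, running distilled weights locally).

import os
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; "deepseek-reasoner"
# is the hosted R1 reasoning model per DeepSeek's public docs.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed environment variable name
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 50?"}],
)

msg = response.choices[0].message
print(msg.reasoning_content)  # the model's "thinking" tokens (R1-specific field)
print(msg.content)            # the final answer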

Taught by

Shaw Talebi

