Understanding DeepSeek R1 Reward Modeling and Verifiers for AI Training

Chris Hay via YouTube

Overview

Learn how to implement DeepSeek- and o1-style rule-based reward modeling with verifiers in this 45-minute video tutorial. Build several kinds of verifier, including format, accuracy, boxed, and limerick verifiers, to shape AI-generated outputs. Compare sampling strategies such as greedy and top-p sampling, and see how to generate custom verifier datasets. Follow the process of collecting prompts from a DeepSeek teacher model, using the same techniques described in the DeepSeek paper to evoke long chains of thought. Then fine-tune a model with SFT training on the collected prompts and evaluate its performance and chain-of-thought quality. Companion GitHub repositories for fine-tuning, verifiers, and synthetic math data generation let you build your own reasoning-based model locally. Suitable for both AI enthusiasts and experienced practitioners who want a better understanding of reward strategies and verifier implementation.
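To make the idea of a rule-based verifier concrete, here is a minimal sketch of a format verifier and a boxed-answer accuracy verifier. It is not the code from the video or its companion repositories: the function names are hypothetical, and it assumes the `<think>...</think><answer>...</answer>` output template described in the DeepSeek R1 paper, with final answers wrapped in `\boxed{...}`.

```python
import re
from typing import Optional

# Assumed template: reasoning inside <think>, final answer inside <answer>.
FORMAT_RE = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)
TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed template, else 0.0."""
    # Each tag must appear exactly once, in the expected order.
    if any(completion.count(tag) != 1 for tag in TAGS):
        return 0.0
    return 1.0 if FORMAT_RE.fullmatch(completion) else 0.0


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, allowing nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def accuracy_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the boxed answer exactly matches the expected string."""
    answer = extract_boxed(completion)
    return 1.0 if answer is not None and answer == expected.strip() else 0.0


if __name__ == "__main__":
    sample = "<think>2 + 2 = 4</think>\n<answer>\\boxed{4}</answer>"
    print(format_reward(sample), accuracy_reward(sample, "4"))
```

Exact string comparison is the simplest possible accuracy check; a practical verifier would likely normalize equivalent answers (for example `0.5` and `\frac{1}{2}`) before comparing.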

Syllabus

00:00 - intro
00:53 - DeepSeek reward modelling
03:20 - format reward verifier
07:31 - accuracy reward verifier
09:43 - boxed reward verifier
12:11 - verifier answer verifier
13:39 - limerick verifier
16:25 - LLM verifiers
18:29 - evoking outputs DeepSeek style
19:07 - greedy sampling
23:10 - top-p sampling
30:00 - generating verifier datasets
33:00 - collecting prompts from teacher model DeepSeek
37:00 - SFT training on collected prompts
37:33 - inferring from trained model
38:50 - chain of thought quality
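
The greedy and top-p sampling segments listed above contrast two ways of picking the next token. The sketch below illustrates the difference over a plain list of logits. It is a generic illustration, not the tutorial's code; the function names and the default `p=0.9` are assumptions.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_sample(logits):
    """Greedy decoding: always pick the single highest-scoring token index."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_p_sample(logits, p=0.9, rng=random):
    """Top-p (nucleus) sampling.

    Keep the smallest set of most-probable tokens whose cumulative
    probability reaches p, then sample from that set in proportion
    to the original probabilities.
    """
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    weights = [probs[i] for i in nucleus]
    # random.choices renormalizes the weights internally.
    return rng.choices(nucleus, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1, -3.0]
    print("greedy:", greedy_sample(logits))
    print("top-p :", [top_p_sample(logits, p=0.9) for _ in range(10)])
```

Greedy decoding is deterministic, so repeated calls return the same token, while top-p sampling varies between calls but never selects tokens outside the nucleus. That variation is what makes it possible to draw several different candidate completions from one prompt.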

Taught by

Chris Hay
