
YouTube

KL Divergence Implementation in DeepSeek R1 - A Deep Learning Tutorial

Yacine Mahdid via YouTube

Overview

Dive deep into the mathematical foundations of KL divergence implementation in DeepSeek R1's GRPO through this 20-minute technical tutorial. Learn the key differences between GRPO and PPO's KL divergence approaches, starting with a comprehensive refresher on the concept. Follow along with detailed explanations of Monte Carlo estimation and explore three key formulations: logarithmic ratio (k1), squared logarithmic ratio (k2), and the difference-based approach (k3). Examine practical benchmarking results and gain valuable insights from Schulman's influential blog post on KL approximation. Perfect for machine learning practitioners seeking to understand the mathematical underpinnings of modern deep learning algorithms.
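To preview the Monte Carlo estimation the video walks through, here is a minimal sketch, assuming Schulman's formulations: draw samples from q, set r = p(x)/q(x), and compare the three estimators (k1 = -log r, k2 = 0.5*(log r)^2, k3 = r - 1 - log r) on a pair of unit-variance Gaussians where the true KL is known in closed form. The distributions and sample size are illustrative choices, not from the video:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sampling distribution q = N(0, 1); target p = N(0.5, 1).
# For unit-variance Gaussians, KL(q || p) = (mu_q - mu_p)^2 / 2.
mu_p = 0.5
true_kl = 0.5 * mu_p**2  # = 0.125

x = rng.normal(0.0, 1.0, size=500_000)          # samples from q
log_r = (-(x - mu_p) ** 2 / 2) - (-(x**2) / 2)  # log p(x) - log q(x)
r = np.exp(log_r)

k1 = -log_r               # unbiased, high variance, can go negative
k2 = 0.5 * log_r**2       # biased, lower variance, always >= 0
k3 = (r - 1.0) - log_r    # unbiased, low variance, always >= 0

for name, k in [("k1", k1), ("k2", k2), ("k3", k3)]:
    print(f"{name}: mean={k.mean():.4f}  std={k.std():.4f}")
print(f"true KL = {true_kl:.4f}")
```

Running this makes the benchmarking takeaway concrete: k1 and k3 both average to the true KL, but k3 does so with much lower variance, which is one reason the k3 form appears in GRPO.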

Syllabus

- Introduction: 0:00
- KL Divergence in GRPO vs PPO: 1:00
- KL Divergence refresher: 2:30
- Monte Carlo estimation of KL divergence: 6:42
- Schulman blog: 7:58
- k1 = log(q/p): 8:55
- k2 = 0.5 * (log(p/q))^2: 11:23
- k3 = p/q - 1 - log(p/q): 13:35
- Benchmarking: 15:58
- Takeaways: 18:43
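For context on where the k3 formulation from the syllabus lands in practice: GRPO's per-token KL penalty against the reference policy takes exactly this form, with the ratio taken as pi_ref / pi_theta. The sketch below is illustrative (variable names, shapes, and the random log-probabilities are assumptions, not DeepSeek's actual code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-token log-probabilities of the sampled tokens under the
# current policy and a frozen reference model (shape: [batch, seq_len]).
logp_policy = rng.normal(-2.0, 0.5, size=(2, 5))
logp_ref = logp_policy + 0.1 * rng.normal(size=(2, 5))

# k3 estimator with r = pi_ref / pi_theta:
#   k3 = r - 1 - log r, which is non-negative and unbiased.
log_ratio = logp_ref - logp_policy
kl_per_token = np.exp(log_ratio) - log_ratio - 1.0

# Averaged and added to the GRPO objective, scaled by a coefficient beta.
kl_penalty = kl_per_token.mean()
```

Because e^x >= 1 + x for all x, every per-token term is non-negative, so the penalty never rewards the policy for drifting from the reference in either direction.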

Taught by

Yacine Mahdid
