Reinforcement Learning from Human Feedback - From Zero to ChatGPT

Hugging Face via YouTube

Classroom Contents

  1. Introduction
  2. Recent breakthroughs
  3. What is RL
  4. History of RL
  5. Example of RL
  6. ChatGPT
  7. Technical details
  8. Three conceptual parts
  9. NLP Pretraining
  10. Supervised Finetuning
  11. Reward Model Training
  12. Input and Output Pairs
  13. Reward Model
  14. KL Divergence
  15. Scaling Factor
  16. RL Optimizer
  17. PPO
  18. Conceptual Questions
  19. Prompts and Responses
  20. Anthropic
  21. BlenderBot
  22. Thumbs up and thumbs down
  23. ChatGPT example
  24. ChatGPT vs. Anthropic
  25. Open areas of investigation
  26. Wrap up
  27. Q&A
  28. Open Source Community
  29. Reinforcement Learning from Email
  30. Paper Release
