Overview
This course covers the following learning outcomes: understanding Error, Risk, and Minimum Risk Training in Neural Networks for NLP; grasping the fundamentals of Reinforcement Learning; learning about Policy Gradient, REINFORCE, and Value-based Reinforcement Learning; and exploring methods for stabilizing Reinforcement Learning.
The course teaches practical skills such as implementing Policy Gradient/REINFORCE, assigning credit for rewards, adding and calculating baselines, and estimating value functions, all in the context of Neural Networks for NLP.
The course is taught through lectures that combine theoretical explanations of Error, Risk, Minimum Risk Training, and Reinforcement Learning with practical applications and examples.
The course is intended for students and professionals interested in Neural Networks for Natural Language Processing, particularly those who want to deepen their understanding of Minimum Risk Training and Reinforcement Learning techniques in this field.
Syllabus
Intro
Problem 1: Exposure Bias
Problem 2: Disregard for Evaluation Metrics
Error
Problem: Argmax is Non-differentiable
Sampling for Risk
Adding Temperature
What is Reinforcement Learning?
Why Reinforcement Learning in NLP?
Supervised MLE
Self Training
Policy Gradient/REINFORCE
Credit Assignment for Rewards
Problems w/ Reinforcement Learning
Adding a Baseline
Calculating Baselines
Increasing Batch Size
Warm-start
When to Use Reinforcement Learning?
Action-Value Function
Estimating Value Functions
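Several syllabus topics above (Adding Temperature, Policy Gradient/REINFORCE, Adding a Baseline, Calculating Baselines) can be sketched together in a few lines of code. The example below is not taken from the course materials; it is a minimal, self-contained illustration on a toy bandit problem rather than an NLP task, and all function names and hyperparameters are illustrative assumptions:

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Higher temperature flattens the distribution; lower sharpens it
    # (the "Adding Temperature" idea for controlling sampling).
    z = [l / temperature for l in logits]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    # Draw an action index from a categorical distribution.
    u = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if u < cum:
            return i
    return len(probs) - 1

def reinforce_bandit(rewards, steps=5000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on a toy bandit.

    rewards[a] is the (deterministic) reward for action a; the policy
    is a softmax over one logit per action.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = sample(probs, rng)
        r = rewards[a]
        advantage = r - baseline           # baseline reduces gradient variance
        baseline += 0.05 * (r - baseline)  # running average of observed rewards
        # Gradient of log pi(a) w.r.t. logit_i is (1[i == a] - probs[i]).
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)

# The learned policy should concentrate on the highest-reward action (index 2).
probs = reinforce_bandit([0.1, 0.3, 1.0])
```

In a sequence-generation setting, the single reward here would be replaced by a sentence-level score (e.g. an evaluation metric on the sampled output), and the baseline by a learned value estimate, as in the later syllabus items.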
Taught by
Graham Neubig