Overview
This lecture explores DDPG (Deep Deterministic Policy Gradient) as a pioneering deterministic policy gradient method that addresses the limitations of deep Q-learning in continuous action spaces. Learn how to approximate the maximizing action with a mu (policy) network, implement minibatch sampling for approximately independent and identically distributed data, and use target networks for stable training. Understand the policy update, in which the gradient is computed through the Q-function, and examine Polyak averaging for target network updates, noting its practical success despite open theoretical questions. Discover practical implementation challenges such as preventing the policy's outputs from exploding, and explore solutions including regularization, tanh activations, and gradient squashing. Compare exploration noise approaches (Gaussian vs. Ornstein-Uhlenbeck) and analyze the tradeoffs between discrete and continuous action spaces, with special attention to multimodality. Examine how the Q-function's sensitivity to the policy's actions can be mitigated by adding noise to the target actions (target policy smoothing, as in TD3) to improve robustness and performance. Question the continued reliance on simple environments such as the inverted pendulum for algorithm evaluation, and explore recent research on scaling deep Q-learning methods through mixtures of experts, layer normalization, and novel network structures.
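The sketch below illustrates the update loop the overview describes: a deterministic mu network, a critic regressed toward a bootstrapped target, Polyak-averaged target networks, and optional TD3-style noise on the target action. It is a minimal sketch assuming a PyTorch setup; the class names (Actor, Critic), layer sizes, and hyperparameters are illustrative placeholders, not code from the lecture.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """The 'mu' network: maps a state to a single deterministic action."""
    def __init__(self, obs_dim, act_dim, act_limit=1.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim))
        self.act_limit = act_limit

    def forward(self, obs):
        # tanh bounds the action, one of the fixes mentioned for exploding policy outputs
        return self.act_limit * torch.tanh(self.net(obs))

class Critic(nn.Module):
    """Q(s, a): scores a state-action pair."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def ddpg_update(batch, actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, gamma=0.99, polyak=0.995,
                target_noise=0.0, noise_clip=0.5, act_limit=1.0):
    # batch: an i.i.d.-ish minibatch sampled from a replay buffer
    obs, act, rew, next_obs, done = batch

    # Critic update: regress Q(s, a) toward the bootstrapped target.
    with torch.no_grad():
        next_act = actor_targ(next_obs)
        if target_noise > 0:  # optional TD3-style target policy smoothing
            eps = torch.clamp(target_noise * torch.randn_like(next_act),
                              -noise_clip, noise_clip)
            next_act = torch.clamp(next_act + eps, -act_limit, act_limit)
        target = rew + gamma * (1.0 - done) * critic_targ(next_obs, next_act)
    critic_loss = ((critic(obs, act) - target) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: ascend Q by backpropagating through the critic into mu.
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak averaging: slowly track the online networks with the targets.
    with torch.no_grad():
        for p, p_targ in zip(actor.parameters(), actor_targ.parameters()):
            p_targ.mul_(polyak).add_((1.0 - polyak) * p)
        for p, p_targ in zip(critic.parameters(), critic_targ.parameters()):
            p_targ.mul_(polyak).add_((1.0 - polyak) * p)
```

In this kind of setup the target networks would typically be initialized as deep copies of the online networks (e.g. with copy.deepcopy), and setting target_noise to zero recovers the plain DDPG target while a positive value gives the TD3-style smoothed target discussed in the overview.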
Syllabus
Robot Learning: Scaling Continuous Deep Q-Learning (Part 2)
Taught by
Montreal Robotics