Overview
This lecture explores the evolution of Deterministic Policy Gradient methods for continuous action spaces in reinforcement learning, beginning with DDPG as a transition from deep Q-learning. Learn how continuous actions are handled by approximating the greedy action with a deterministic policy (mu) model, the importance of minibatch sampling for approximately independent and identically distributed data, and the role of target networks in training stability. Discover the mechanics of policy updates through gradients computed via the Q-function and the empirical success of Polyak averaging for target network updates. Examine practical implementation challenges, including preventing policy output explosion through regularization techniques, tanh activations, and gradient squashing. Compare exploration strategies using Gaussian versus Ornstein-Uhlenbeck noise, and understand the critical considerations when choosing between discrete and continuous action spaces. The lecture also covers recent advances in making deep Q-learning more scalable through mixtures of experts, layer normalization, and novel network structures, while questioning the continued use of simple environments such as the inverted pendulum for algorithm evaluation instead of more complex, real-world-relevant tasks.
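The core DDPG update loop described above can be summarized in a minimal sketch. This is an illustrative outline only, assuming PyTorch and hypothetical pre-built actor/critic networks (mu, q), their target copies, optimizers, and a replay buffer with a sample() method; it is not the lecture's own code.

```python
# Minimal DDPG-style update sketch (assumptions: PyTorch; mu, q, mu_target,
# q_target, mu_opt, q_opt, and replay_buffer are hypothetical, defined elsewhere).
import torch
import torch.nn.functional as F

def ddpg_update(mu, q, mu_target, q_target, mu_opt, q_opt, replay_buffer,
                gamma=0.99, tau=0.005, batch_size=256):
    # Sample a minibatch from the replay buffer for approximately i.i.d. data.
    s, a, r, s_next, done = replay_buffer.sample(batch_size)

    # Critic update: regress Q(s, a) toward a bootstrapped target computed
    # with the slowly-moving target networks for training stability.
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_target(s_next, mu_target(s_next))
    q_loss = F.mse_loss(q(s, a), target)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Actor update: backpropagate through the critic into the policy,
    # moving mu(s) in the direction that increases Q.
    pi_loss = -q(s, mu(s)).mean()
    mu_opt.zero_grad(); pi_loss.backward(); mu_opt.step()

    # Polyak averaging of the target networks toward the online networks.
    with torch.no_grad():
        for p, p_t in zip(mu.parameters(), mu_target.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
        for p, p_t in zip(q.parameters(), q_target.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
```

For exploration during data collection, actions from mu are typically perturbed with Gaussian or Ornstein-Uhlenbeck noise and then clipped or tanh-squashed to stay within the valid action range, as discussed in the lecture.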
Syllabus
Robot Learning: Scaling Continuous Deep Q-Learning Part 1
Taught by
Montreal Robotics