Scaling Continuous Deep Q-Learning - Part 2

Montreal Robotics via YouTube

Overview

This lecture explores DDPG (Deep Deterministic Policy Gradient), a pioneering deterministic policy gradient method that addresses the limitations of deep Q-learning in continuous action spaces. Learn how to approximate the maximizing action with a deterministic policy network (mu), sample minibatches from a replay buffer for approximately independent and identically distributed data, and use target networks for stable training. Understand the policy update, in which gradients flow through the Q-function into the policy, and examine Polyak averaging for target network updates, noting its practical success despite open theoretical questions. Discover practical implementation challenges such as preventing policy outputs from exploding, and explore remedies including regularization, tanh activations, and gradient squashing. Compare exploration noise schemes (Gaussian vs. Ornstein-Uhlenbeck) and analyze the tradeoffs between discrete and continuous action spaces, with special attention to multimodality. Examine how the Q-function's sensitivity to its policy-generated action input can be mitigated by adding noise to the target action (as in TD3) to improve robustness and performance. Question the continued reliance on simple environments such as the inverted pendulum for algorithm evaluation, and explore recent research on scaling deep Q-learning through mixtures of experts, layer normalization, and novel network architectures.
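To make the mechanics named above concrete, here is a minimal PyTorch sketch of the DDPG update: critic regression against a target network, a policy update through the Q-function, Polyak averaging, and optional TD3-style noise on the target action. This is an illustration, not the lecture's implementation; all names, network sizes, and hyperparameters (mu, q, tau, target_noise_std, and so on) are assumptions.

import copy
import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1      # e.g. an inverted-pendulum-sized problem
gamma, tau = 0.99, 0.005     # discount factor and Polyak coefficient

# mu: the deterministic policy; the final tanh bounds its outputs,
# one of the fixes mentioned for preventing policy output explosion.
mu = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                   nn.Linear(64, act_dim), nn.Tanh())
# q: the critic, mapping a (state, action) pair to a scalar value.
q = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                  nn.Linear(64, 1))

# Target networks start as copies and then trail the online networks.
mu_targ, q_targ = copy.deepcopy(mu), copy.deepcopy(q)
mu_opt = torch.optim.Adam(mu.parameters(), lr=1e-3)
q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)

def update(batch, target_noise_std=0.2):
    # batch: an i.i.d.-style minibatch sampled from a replay buffer.
    s, a, r, s2, done = batch
    with torch.no_grad():
        a2 = mu_targ(s2)
        # TD3-style smoothing: perturbing the target action makes the
        # bootstrapped target less sensitive to the policy input.
        a2 = (a2 + target_noise_std * torch.randn_like(a2)).clamp(-1.0, 1.0)
        y = r + gamma * (1.0 - done) * q_targ(torch.cat([s2, a2], -1)).squeeze(-1)

    # Critic step: regress Q(s, a) toward the bootstrapped target y.
    q_loss = ((q(torch.cat([s, a], -1)).squeeze(-1) - y) ** 2).mean()
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Actor step: ascend the Q-function through mu, i.e. the policy
    # update via gradients computed through the Q-function.
    pi_loss = -q(torch.cat([s, mu(s)], -1)).mean()
    mu_opt.zero_grad(); pi_loss.backward(); mu_opt.step()

    # Polyak averaging of the target networks.
    with torch.no_grad():
        for p, p_t in zip(list(mu.parameters()) + list(q.parameters()),
                          list(mu_targ.parameters()) + list(q_targ.parameters())):
            p_t.mul_(1.0 - tau).add_(tau * p)

# Smoke test on random data, just to show the expected batch shapes.
B = 32
update((torch.randn(B, obs_dim), torch.rand(B, act_dim) * 2 - 1,
        torch.randn(B), torch.randn(B, obs_dim), torch.zeros(B)))

Note that the actor step deliberately backpropagates through the critic; the stray critic gradients it leaves behind are cleared by the next zero_grad call, which is the usual pattern in simple implementations.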

Syllabus

Robot Learning: Scaling Continuous Deep Q-Learning - Part 2

Taught by

Montreal Robotics
