
YouTube

Scaling Continuous Deep QLearning - Part 1

Montreal Robotics via YouTube

Overview

This lecture explores the evolution of Deterministic Policy Gradient methods for continuous action spaces in reinforcement learning, beginning with DDPG as a transition from deep Q-learning. Learn how continuous actions are handled by approximating the greedy action with a policy (μ) model, why batch sampling matters for approximately independent and identically distributed data, and the role of target networks in training stability. Discover the mechanics of policy updates through gradients computed via the Q-function, and the empirical success of Polyak averaging for target network updates. Examine practical implementation challenges, including preventing the policy output from exploding through regularization techniques, tanh activations, and gradient squashing. Compare exploration strategies using Gaussian versus Ornstein-Uhlenbeck noise, and understand the critical considerations when choosing between discrete and continuous action spaces. The lecture also covers recent advances in making deep Q-learning more scalable through mixtures of experts, layer normalization, and novel network structures, while questioning the continued use of simple environments such as the inverted pendulum for algorithm evaluation instead of more complex, real-world-relevant tasks.
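To make the pieces above concrete, here is a minimal, illustrative PyTorch sketch (not taken from the lecture) of the DDPG-style actor update through the Q-function, Polyak averaging of target networks, a tanh-bounded policy, and Gaussian exploration noise. The network sizes, batch size, and the coefficient `tau` are assumptions chosen for readability, and the critic/TD update is omitted for brevity.

```python
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, tau = 8, 2, 0.005  # assumed dimensions and Polyak coefficient

# Actor mu(s): deterministic policy whose output is squashed by tanh to stay bounded.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
# Critic Q(s, a): scores state-action pairs.
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

# Slowly-updated target copies, kept for training stability.
actor_target = copy.deepcopy(actor)
critic_target = copy.deepcopy(critic)

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

# --- Actor update: ascend Q(s, mu(s)) by backpropagating through the critic ---
states = torch.randn(128, obs_dim)  # stand-in for an i.i.d. batch from a replay buffer
actions = actor(states)
actor_loss = -critic(torch.cat([states, actions], dim=1)).mean()
actor_opt.zero_grad()
actor_loss.backward()   # the critic's own TD update would happen separately
actor_opt.step()

# --- Polyak averaging: target <- tau * online + (1 - tau) * target ---
with torch.no_grad():
    for p, p_targ in zip(actor.parameters(), actor_target.parameters()):
        p_targ.mul_(1 - tau).add_(tau * p)
    for p, p_targ in zip(critic.parameters(), critic_target.parameters()):
        p_targ.mul_(1 - tau).add_(tau * p)

# --- Exploration: add Gaussian noise to the deterministic action, then clip ---
with torch.no_grad():
    noisy_action = (actor(states[:1]) + 0.1 * torch.randn(1, act_dim)).clamp(-1.0, 1.0)
```

The same update loop could swap the Gaussian noise for Ornstein-Uhlenbeck noise; the lecture contrasts the two as exploration strategies.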

Syllabus

RobotLearning: Scaling Continuous Deep QLearning Part1

Taught by

Montreal Robotics

