Class Central is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Online Course

Sample-based Learning Methods

University of Alberta and Alberta Machine Intelligence Institute via Coursera

(14)
  • Provider Coursera
  • Cost Free Online Course (Audit)
  • Session Upcoming
  • Language English
  • Certificate Paid Certificate Available
  • Duration 5 weeks long
  • Learn more about MOOCs

Taken this course? Share your experience with other students. Write review

Overview

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment---learning from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) and temporal difference updates to radically accelerate learning.

By the end of this course you will be able to:

- Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
- Understand the connections between Monte Carlo, Dynamic Programming, and TD
- Implement and apply the TD algorithm for estimating value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classic planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna

Syllabus

Welcome to the Course!
-Welcome to the second course in the Reinforcement Learning Specialization: Sample-Based Learning Methods, brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

Monte Carlo Methods for Prediction & Control
-This week you will learn how to estimate value functions and optimal policies, using only sampled experience from the environment. This module represents our first step toward incremental learning methods that learn from the agent’s own interaction with the world, rather than a model of the world. You will learn about on-policy and off-policy methods for prediction and control, using Monte Carlo methods---methods that use sampled returns. You will also be reintroduced to the exploration problem, but more generally in RL, beyond bandits.
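As a rough illustration of the prediction side of this module (a sketch for intuition, not code from the course's assignments), first-visit Monte Carlo estimates a state's value by averaging the sampled returns observed from its first visit in each episode:

```python
def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo prediction: average sampled returns per state.

    `episodes` is a list of trajectories, each a list of (state, reward)
    pairs, where `reward` is received on leaving that state.
    """
    returns = {}  # state -> list of sampled returns
    for episode in episodes:
        G = 0.0
        first_return = {}
        # Walk backwards, accumulating the discounted return; the last
        # write for each state is its earliest (first) visit.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = gamma * G + reward
            first_return[state] = G
        for state, g in first_return.items():
            returns.setdefault(state, []).append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Two toy episodes over states A -> B -> terminal.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 3.0)],
]
values = first_visit_mc(episodes)
print(values)  # {'B': 2.0, 'A': 2.0}
```

Note that no model of the environment appears anywhere: the estimate is built purely from sampled returns, which is exactly what distinguishes these methods from the dynamic programming sweeps of the previous course.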

Temporal Difference Learning Methods for Prediction
-This week, you will learn about one of the most fundamental concepts in reinforcement learning: temporal difference (TD) learning. TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods. TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world, and do not require knowledge of the model. TD methods are similar to DP methods in that they bootstrap, and thus can learn online---no waiting until the end of an episode. You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping. For this module, we first focus on TD for prediction, and discuss TD for control in the next module. This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
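To make the bootstrapping idea concrete (a minimal sketch, not the course's simulated domain), a TD(0) update moves V(s) toward the target r + γV(s'), so learning proceeds one transition at a time instead of waiting for the full return:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """One TD(0) step: nudge V(s) toward the bootstrapped target r + gamma * V(s')."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

# Deterministic episode A -(r=0)-> B -(r=1)-> T, replayed many times.
V = {"A": 0.0, "B": 0.0, "T": 0.0}  # T is terminal; its value stays 0
for _ in range(1000):
    td0_update(V, "A", 0.0, "B")
    td0_update(V, "B", 1.0, "T")
# Both V["A"] and V["B"] approach 1.0, updated online after every
# transition rather than at episode end.
```

The update for A bootstraps off the current estimate of B, which is what lets value information flow backwards through the chain without ever computing a complete sampled return.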

Temporal Difference Learning Methods for Control
-This week, you will learn about using temporal difference learning for control, as a generalized policy iteration strategy. You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa, Q-learning and Expected Sarsa. You will see some of the differences between the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both. You will implement Expected Sarsa and Q-learning, on Cliff World.
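The contrast between the two off-policy-capable targets can be sketched as follows (a toy example with invented states and constants, not the Cliff World assignment): Q-learning bootstraps from the max over next actions, while Expected Sarsa bootstraps from the expectation under an ε-greedy policy:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy TD control: bootstrap from the best (max) next action."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def expected_sarsa_update(Q, s, a, r, s_next, actions,
                          alpha=0.5, gamma=0.9, epsilon=0.2):
    """Bootstrap from the expected next value under an epsilon-greedy policy."""
    qs = [Q[(s_next, a2)] for a2 in actions]
    probs = [epsilon / len(actions)] * len(actions)
    probs[qs.index(max(qs))] += 1.0 - epsilon  # extra mass on the greedy action
    target = r + gamma * sum(p * v for p, v in zip(probs, qs))
    Q[(s, a)] += alpha * (target - Q[(s, a)])

actions = ["left", "right"]
Q0 = {(s, a): 0.0 for s in ["s0", "s1"] for a in actions}
Q0[("s1", "left")] = 1.0  # suppose 'left' already looks good in s1

Q1 = dict(Q0)
q_learning_update(Q1, "s0", "right", 1.0, "s1", actions)
# target = 1 + 0.9 * max(1.0, 0.0) = 1.9  ->  Q1[("s0", "right")] = 0.95

Q2 = dict(Q0)
expected_sarsa_update(Q2, "s0", "right", 1.0, "s1", actions)
# expected next value = 0.9 * 1.0 + 0.1 * 0.0 = 0.9
# target = 1 + 0.9 * 0.9 = 1.81  ->  Q2[("s0", "right")] = 0.905
```

With ε = 0 the Expected Sarsa target collapses to the Q-learning target, which is one way to see Expected Sarsa as a unifying algorithm for on- and off-policy control.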

Planning, Learning & Acting
-Up until now, you might think that learning with and without a model are two distinct, and in some ways competing, strategies: planning with Dynamic Programming versus sample-based learning via TD methods. This week we unify these two strategies with the Dyna architecture. You will learn how to estimate the model from data and then use this model to generate hypothetical experience (a bit like dreaming) to dramatically improve sample efficiency compared to sample-based methods like Q-learning. In addition, you will learn how to design learning systems that are robust to inaccurate models.
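A minimal sketch of the Dyna idea (illustrative only; the corridor environment, constants, and `dyna_q` helper here are invented for this example): each real step both updates Q directly and records a transition in a table-based model, and the model is then replayed for extra simulated updates:

```python
import random

def dyna_q(transitions, start, goal, actions, steps=300, n_plan=5,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q: direct RL + model learning + planning from the model."""
    rng = random.Random(seed)
    Q, model = {}, {}  # model maps (s, a) -> (reward, next_state)

    def q(s, a):
        return Q.get((s, a), 0.0)

    def update(s, a, r, s2):  # standard Q-learning backup
        best = max(q(s2, a2) for a2 in actions)
        Q[(s, a)] = q(s, a) + alpha * (r + gamma * best - q(s, a))

    s = start
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:  # greedy with random tie-breaking
            best = max(q(s, a2) for a2 in actions)
            a = rng.choice([a2 for a2 in actions if q(s, a2) == best])
        r, s2 = transitions[(s, a)]
        update(s, a, r, s2)       # direct RL from real experience
        model[(s, a)] = (r, s2)   # model learning
        for _ in range(n_plan):   # planning: replay simulated experience
            ps, pa = rng.choice(list(model))
            update(ps, pa, *model[(ps, pa)])
        s = start if s2 == goal else s2
    return Q

# Toy corridor 0-1-2-3; moving right from state 2 reaches the goal (reward 1).
actions = ["L", "R"]
transitions = {}
for st in range(4):
    transitions[(st, "L")] = (0.0, max(st - 1, 0))
    transitions[(st, "R")] = (1.0 if st + 1 == 3 else 0.0, min(st + 1, 3))
Q = dyna_q(transitions, start=0, goal=3, actions=actions)
# After a few hundred steps the greedy action in every corridor state is "R".
```

The planning loop is the "dreaming" step described above: once the goal is reached even once, the replayed model transitions propagate value back through the corridor far faster than Q-learning's real steps alone would.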

Taught by

Martha White and Adam White


Reviews for Coursera's Sample-based Learning Methods (based on 14 reviews)

  • 5 stars 71%
  • 4 stars 29%
  • 3 stars 0%
  • 2 stars 0%
  • 1 star 0%

Stewart A
Stewart completed this course, spending 5 hours a week on it and found the course difficulty to be medium.
To be brief, this is a great course on Reinforcement Learning (RL) and I thoroughly recommend it. This is the second course in the four course Reinforcement Learning specialization from the Alberta Machine Intelligence Institute (AMII) at University of Alberta. The course builds upon the knowledge and...
Anonymous
Anonymous completed this course.
Overall the course seems to me very well structured, and the videos help you understand the book content. The only drawback, for which I gave only 4 stars out of 5, is the submission limit on programming assignments. You can submit an assignment only 5 times, after which you are blocked for 4 months. This does not feel right; you should be able to submit it as many times as you need until you are successful, since you pay for the course. Only in this way would you be encouraged to think of more and more possible solutions.
Anonymous
Anonymous completed this course.
1) Material is highly relevant

2) Programming assignments are unmanageable - even if you manage to write code that passes the unit tests, the grader that evaluates the graphs generated during experiments works like black magic, and if your submission is rejected you'll never know why

3) Regarding the lectures - personally, for me they could be replaced by reading the relevant chapters of the RL book
Anonymous
Anonymous completed this course.
The course is fairly well detailed and contains a good deal of topics. If I have any complaints, it would be that the lectures could be a bit longer and dive into topics a little more. To be fair to the course, they do give you a weekly reading list from the prescribed textbook, but it would be nice...
Anonymous
Anonymous completed this course.
The course is overall very good: lectures are very clear, quizzes are challenging and the course relies on a text book, provided when you enroll. The only weak point, but not a serious issue, is that most of the lectures do not add content to what is in the book. Since studying the book is in fact mandatory, they could have used the lectures to better explain some concepts, assuming people read the book. Sometimes they do, but not so often.
Anonymous
Anonymous completed this course.
I really enjoyed this second course of the specialisation.

The content and explanations are very helpful in building your intuition around quite complex concepts of sample-based RL. Quizzes and programming exercises are challenging enough to help you grasp necessary concepts and get hands-on experience. Looking forward to the next course in the specialisation.
Anonymous
Anonymous completed this course.
This course is in the middle of the RL path, but the main thing you learn here is some algorithms that could be used for most practical problems (I won't name these algorithms :D). The learning and teaching method is also fantastic. The course is carefully designed for people who really want to dedicate their time and effort to learning.
Anonymous
Anonymous completed this course.
The specialization is a great way to get into RL. The book is excellent but doing the programming assignments from the book can be a little tedious. My approach was to read the book carefully and do the assignments provided as part of the course. I had a lot of fun.

The course does not, however, require several weeks of effort.
Anonymous
Anonymous completed this course.
Great class. The instructors are knowledgeable and communicate well. The curriculum is well thought out. Course material gives a good blend of practical and theoretical knowledge. Also includes access to the definitive textbook, which gives the option to dive deeper into related topics.
Anonymous
Anonymous completed this course.
The course is well built and gives some theoretical basis, building up more and more complex concepts. The hands-on exercises are well made too, and I could use them to run my own experiments; the only issue is that rl-glue seems a bit outdated now? I enjoyed the course and learned a lot. Thanks!
Anonymous
Anonymous completed this course.
Quizzes and Python assignments help a lot to understand the subjects. It could be helpful to have some additional references in which the methods and algorithms presented in the course are applied possibly in real cases.
Anonymous
Anonymous completed this course.
Great course! Strongly based on the textbook by Sutton and Barto, but I think they complement each other greatly. Explanations from the instructors are easy to follow and the online exercises are concrete and illustrative.
Anonymous
Anonymous completed this course.
Excellent introduction to Reinforcement Learning. It would really help to be fairly proficient in Python before taking this course. It is also important to do at least some of the reading in their textbook.
Luiz C
Luiz completed this course, spending 2 hours a week on it and found the course difficulty to be medium.
Great Course. Every aspect of this course is top notch: videos, presentations, tests and notebooks. It is a good complement to the RL Book
