Sample-based Learning Methods

University of Alberta and Alberta Machine Intelligence Institute via Coursera

Details

Provider: Coursera
Pricing: Free Online Course (Audit)
Languages: English
Certificate: Paid Certificate Available
Duration & workload: 21 hours 31 minutes
Sessions: On-Demand
Level: Intermediate
Subtitles: Arabic, French, Portuguese, Italian, German, Russian, English, Spanish, Thai, Indonesian, Kazakh, Hindi, Swedish, Korean, Greek, Chinese, Ukrainian, Japanese, Polish, Dutch, Turkish, Hungarian, Bengali, Pashto, Urdu, Azerbaijani, Farsi

Found in

Part of

Reinforcement Learning

3.5

Overview

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that can combine model-based planning (similar to dynamic programming) and temporal difference updates to radically accelerate learning.

By the end of this course you will be able to:

- Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
- Understand the connections between Monte Carlo, Dynamic Programming, and TD
- Implement and apply the TD algorithm for estimating value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classic planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna

Syllabus

Welcome to the Course!

Welcome to the second course in the Reinforcement Learning Specialization: Sample-Based Learning Methods, brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

Monte Carlo Methods for Prediction & Control

This week you will learn how to estimate value functions and optimal policies, using only sampled experience from the environment. This module represents our first step toward incremental learning methods that learn from the agent’s own interaction with the world, rather than a model of the world. You will learn about on-policy and off-policy methods for prediction and control, using Monte Carlo methods---methods that use sampled returns. You will also be reintroduced to the exploration problem, but more generally in RL, beyond bandits.
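As a rough illustration of learning from sampled returns, here is a minimal sketch of first-visit Monte Carlo prediction. It is not taken from the course materials: the function name, the episode format (a list of (state, reward) pairs), and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) as the average first-visit return over sampled episodes.

    Assumed (hypothetical) format: each episode is a list of (state, reward)
    pairs, where reward is the reward received after leaving that state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the first time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        g = 0.0
        # Walk backwards so the return G_t is accumulated incrementally.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = reward + gamma * g
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Two toy episodes: A -> B -> terminal, with reward 1 or 0 on the last step.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 0.0)],
]
values = first_visit_mc_prediction(episodes)
```

Note that the estimate for a state can only be updated once an episode has finished, because the full return is needed.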

Temporal Difference Learning Methods for Prediction

This week, you will learn about one of the most fundamental concepts in reinforcement learning: temporal difference (TD) learning. TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods. TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world, and do not require knowledge of the model. TD methods are similar to DP methods in that they bootstrap, and thus can learn online---no waiting until the end of an episode. You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping. For this module, we first focus on TD for prediction, and discuss TD for control in the next module. This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
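The bootstrapping idea can be sketched as a single tabular TD(0) update. This is an illustrative sketch only, not the course's assignment code: the function name, the dictionary-based value table, and the step size are assumptions.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one tabular TD(0) update to the value table V in place.

    The target bootstraps from the current estimate of the next state,
    so the update can be made after every step instead of at episode end.
    Returns the TD error.
    """
    next_value = 0.0 if terminal else V.get(next_state, 0.0)
    td_error = reward + gamma * next_value - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * td_error
    return td_error


# Hypothetical two-state example: A leads to B, which already has value 1.0.
V = {"A": 0.0, "B": 1.0}
delta = td0_update(V, "A", reward=0.0, next_state="B", alpha=0.5)
```

Here the estimate for A moves halfway toward the bootstrapped target as soon as the transition is observed.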

Temporal Difference Learning Methods for Control

This week, you will learn about using temporal difference learning for control, as a generalized policy iteration strategy. You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa, Q-learning and Expected Sarsa. You will see some of the differences between the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both. You will implement Expected Sarsa and Q-learning, on Cliff World.
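To make the relationship between the methods concrete, the sketch below compares the update targets of Q-learning and Expected Sarsa for one transition. It assumes an epsilon-greedy target policy and a plain list of next-state action values; the function names and numbers are hypothetical and not from the course.

```python
def q_learning_target(q_next, reward, gamma=1.0):
    """Q-learning bootstraps from the maximum next-state action value."""
    return reward + gamma * max(q_next)


def expected_sarsa_target(q_next, reward, epsilon=0.1, gamma=1.0):
    """Expected Sarsa bootstraps from the expected next-state action value
    under the target policy, assumed here to be epsilon-greedy."""
    n = len(q_next)
    best = max(q_next)
    greedy = [a for a, q in enumerate(q_next) if q == best]
    expected = 0.0
    for a, q in enumerate(q_next):
        prob = epsilon / n  # exploration probability shared by all actions
        if a in greedy:
            prob += (1.0 - epsilon) / len(greedy)  # greedy mass, ties split evenly
        expected += prob * q
    return reward + gamma * expected


q_next = [0.0, 1.0]
target_q = q_learning_target(q_next, reward=0.0)
target_es = expected_sarsa_target(q_next, reward=0.0, epsilon=0.5)
target_es_greedy = expected_sarsa_target(q_next, reward=0.0, epsilon=0.0)
```

With epsilon set to 0 the target policy is greedy and the Expected Sarsa target coincides with the Q-learning target, which is one way to see why Expected Sarsa can cover both the on-policy and off-policy cases.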

Planning, Learning & Acting

Up until now, you might think that learning with and without a model are two distinct, and in some ways, competing strategies: planning with Dynamic Programming versus sample-based learning via TD methods. This week we unify these two strategies with the Dyna architecture. You will learn how to estimate the model from data and then use this model to generate hypothetical experience (a bit like dreaming) to dramatically improve sample efficiency compared to sample-based methods like Q-learning. In addition, you will learn how to design learning systems that are robust to inaccurate models.
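The Dyna idea of mixing real and simulated experience can be sketched as a small tabular Dyna-Q agent. This is a minimal sketch under stated assumptions, not the course's implementation: the class name, the deterministic model (one stored outcome per state-action pair), and all parameter values are hypothetical.

```python
import random


class DynaQ:
    """Tabular Dyna-Q sketch: direct RL, model learning, then planning."""

    def __init__(self, actions, alpha=0.5, gamma=0.95, planning_steps=5, seed=0):
        self.actions = list(actions)
        self.alpha = alpha
        self.gamma = gamma
        self.planning_steps = planning_steps
        self.q = {}      # (state, action) -> estimated action value
        self.model = {}  # (state, action) -> (reward, next_state, done)
        self.rng = random.Random(seed)

    def _q_update(self, s, a, r, s2, done):
        # Standard Q-learning update, used for real and simulated transitions.
        best_next = 0.0 if done else max(self.q.get((s2, b), 0.0) for b in self.actions)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (r + self.gamma * best_next - old)

    def step(self, s, a, r, s2, done):
        self._q_update(s, a, r, s2, done)      # learn from the real transition
        self.model[(s, a)] = (r, s2, done)     # assumes a deterministic environment
        for _ in range(self.planning_steps):   # replay simulated transitions
            ps, pa = self.rng.choice(list(self.model))
            pr, ps2, pdone = self.model[(ps, pa)]
            self._q_update(ps, pa, pr, ps2, pdone)


# One real terminal transition with reward 1, followed by two planning updates.
agent = DynaQ(actions=[0, 1], alpha=0.5, planning_steps=2)
agent.step("s", 0, 1.0, "end", True)
```

With planning_steps set to 0 the same transition would move the estimate to 0.5; the two extra simulated updates push it to 0.875 without any further interaction with the environment, which is the sample-efficiency gain the paragraph describes.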

Taught by

Martha White and Adam White

Reviews

4.6 rating, based on 41 Class Central reviews

4.8 rating at Coursera based on 1217 ratings

Anonymous

This is a somewhat enjoyable class, but it could be much, much better.

First, as other people noted, the programming content is not ideal. The course would have benefited from including an introduction to RLGlue and also if possible, a brief survey of Python/R libraries used to set up RL. This would be useful for everyone as I assume people take this class with practical intentions. Then, there is an art of passing the autograder, whose output is cryptic.

My bigger grudge is that the lectures are but a cute addendum to the book, which in my view violates the principle of Coursera, or at least this is not what I personally expect from Coursera. Reading books carefully and attentively takes time, and I am taking Coursera classes to speed up the process. A good video lecture can condense an hour of reading the book into 10 minutes. The authors say it themselves: read the book before lectures, and you will understand our lectures better. I spent years teaching math at a big US state university and my opinion is the opposite: make lectures of such good quality that you can then refer to a book for additional material. The book should be for those who want to go more slowly and more thoroughly over the same material.

The lecturers (as charming a couple as they are) follow the book very, very closely, seemingly just reciting it section by section in a fairly monotonous fashion. This feels very student-ey, so they come across as a couple of students who got an A-grade last year and now teach the one year younger crowd. There are occasional extra diagrams in these lectures that are indeed helpful, but you feel that they won't or can't go deeper than the book. In short, you do not feel the caliber of instructors. Big people generally have "vision" and can point to "the road ahead".

The book is good though, and a large part of what makes this class decent is the fact that the lectures follow the good book so closely.

Another very good thing is that there are some videos with interviews with experts, and this is interesting and fun.

Overall, taking this class will still be a worthwhile investment of your time. At the same time I hope the second iteration of this class would have more polish, as this is not the reference reinforcement learning online class yet.
Anonymous

Overall, it is a very good course. The professors do all they can to keep it simple by using examples and, in my opinion, it works. The only issue is the auto-grader that doesn't always work perfectly and may slow down the examination.
Stewart Adamson

To be brief, this is a great course on Reinforcement Learning (RL) and I thoroughly recommend it. This is the second course in the four-course Reinforcement Learning specialization from the Alberta Machine Intelligence Institute (AMII) at University of Alberta. The course builds upon the knowledge and skills gained from the first course (Fundamentals of Reinforcement Learning), and unless you are already very familiar with this field, you should definitely take that course first. AMII is the "home" of Rich Sutton and Andy Barto, the authors of Reinforcement Learning: An Introduction, which is the standard text on RL and is the basis for all the courses in the specialization (it is available as a free PDF as part of the course material). Sutton & Barto 2018 is also used by Stanford and DeepMind in their RL courses. As with the first course, you get to implement RL algorithms in Jupyter notebooks in Python as weekly programming assignments, so you emerge with practical knowledge at the end of the course. You can check out the syllabus on Coursera.org for details of this course and the other courses in the specialization.
James Singleton

I highly rate this course; it's the second one in the series and I also completed the first. Its content is deep yet accessible. There have been some "wow" moments for me in the course work, where I have been astounded by what RL can achieve. The Dyna-Q+ algorithm at the end, for navigating a maze which changes over time, is very impressive. It's not just the RL theory: the quality of the videos of the lecturers and their guests is very strong as well. I'd recommend it to anyone with an interest in getting off the well-trodden track of supervised learning.
Anonymous

A great course with excellent visualizations in lectures and detailed programming assignments, which not only give hands-on RL coding experience but also teach the intricacies of the material.
Giving 4 stars because the lectures are short and not comprehensive enough. The lectures serve to fine-tune one's understanding of the material from the book and other sources. Definitely not for beginners in RL or ML, and REQUIRES SOME PREVIOUS EXPOSURE.
Anonymous

Overall the course seems to me very well structured, and the videos help you to understand the book content. The only drawback, for which I gave only 4 stars out of 5, is the submission limit on programming assignments. You can submit an assignment only 5 times; afterwards you are blocked for 4 months. This does not feel right: you should be able to submit it as many times as you need until you are successful, since you pay for the course. Only in this way would you be encouraged to think of more and more possible solutions.
Anonymous

1) Material is highly relevant
2) Programming assignments are unmanageable: even if you manage to create code that passes the unit tests, the grader that attempts to evaluate graphs generated during experiments works like black magic, and if your submission is rejected, you'll never know why
3) Regarding the lectures: personally, for me they can be replaced by reading the relevant chapters of the RL book
Anonymous

I enjoyed the explanations, and even more the useful, basic assignments and the quizzes, which gave me a deep understanding of reinforcement learning topics.
Anonymous

Great pointers to the reading materials, and amazing animations in the short (but still sufficient) videos, so that you can verify your understanding of what you have read.

Assignments are fast to work on and at the same time they give you the complete gist of the implementation that you work on. They have already coded the testing and visualization parts and only expect you to understand the algorithm well enough to fill in its important empty functions.
Anonymous

It is a very comprehensive course based on Sutton's book. Programming exercises are good and interesting, with a lot of visualisations (based on examples from the book).

The subject is complex and my advice would be to put some notes from generalisation and summary chapters at the beginning in order to have better anticipation of the course.

There were some animations that were extremely helpful for understanding. This is not at all an easy topic to teach.
Anonymous

I really have been enjoying the classes in this specialization. I have had the Sutton and Barto book for years but was never as engaged with learning the material as I have been with this class. I find the videos insightful and the programming exercises really make you think through the equations/algorithms
Anonymous

The course is fairly well detailed and contains a good deal of topics. If I have any complaints, it would be that the lectures could be a bit longer and dive into topics a little more. To be fair to the course, they do give you a weekly reading list from the prescribed textbook, but it would be nice if the lectures also covered the same topics with the same detail.
The programming assignments are well done and thought out, but the grader is a bit of an issue. I see a lot of other students had similar issues where the implementation is more or less correct but does not get points because it does not match the expected output exactly. While that is understandable to some extent, there is no way to debug the issue, since the output matches all test cases within the notebook, just not the final test case run in the background, which is treated like a black box. Hopefully they get that fixed soon. But in the meantime, the discussion forums do have some great content to help debug.
Anonymous

Great course. The explanations are to the point, the exercises take care of the irrelevant code and only let you do the important stuff. And I like the guest lectures that are sprinkled throughout this specialization!
Anonymous

4 stars because coding (some of) the course assignments requires more effort than the lectures prepare you for.

You have to have previous knowledge of the context or dig deeper.

Overall I am happy and recommend it.
Anonymous

The course is carefully organised and really pedagogical. The programming assignments are also well designed and it is really interesting to see how the algorithms work in practice.
Anonymous

The readings and videos are very informative and are easy to learn from. However, the quizzes and programming assignments can be a pain sometimes.
Anonymous

This course is absolutely worth the time and effort. The instructors are really great and the programming assignments help a lot.
Anonymous

The subject matter is fully fledged and lays a strong, holistic foundation. The organization of the course is well thought through, which gives the learner a clear understanding with less confusion. This course has materials ranging from assignments and quizzes to discussions and videos from industry experts; everything is perfectly linked, and that makes it very interesting to explore. Thank you very much for this course. I loved every minute of it and am looking forward to the rest of the courses in the RL specialization. Thank you.
Anonymous

Enjoyed the course, really getting a sense of the breadth of the field and the potential applications, and enough knowledge to start using in the real world.

In my opinion the programming assignments were useful, and struck a good balance of making you think about how to implement the key concepts, without having to waste a lot of time programming all the other stuff. Found them really helpful in cementing my understanding of the material.

Enjoyed it enough, that I plan to finish the specialization.
Anonymous

Really a great course and a great specialization. I like the structure of the videos, where they start with what you will learn in the video and end by summarizing what you have learned. In general, the course is a great combination of theoretical RL (math and so on) and practical RL (where you get to use an RL framework to implement agents and environments). The topics are clearly described and presented. In almost all cases you won't need to consult an article or another video to understand the topic.

Tags

Decision Making and Reinforcement Learning

Prediction and Control with Function Approximation

Deep Reinforcement Learning

Reinforcement Learning

Fundamentals of Reinforcement Learning

Q-Learning - Model Free Reinforcement Learning and Temporal Difference Learning