Formal Languages and Automata for Reward Function Specification and Efficient Reinforcement Learning

Overview

This course teaches how to use formal languages and automata to specify reward functions and to make reinforcement learning more efficient. Learning outcomes include understanding the challenges of real-world reinforcement learning, defining reward functions with reward machines, and applying Q-learning for reward machines (QRM). Students learn to construct reward machines from formal languages, generate reward machines with a symbolic planner, and learn reward machines for partially observable reinforcement learning. The teaching combines theoretical concepts with practical examples. The course is intended for anyone interested in reinforcement learning, artificial intelligence, and machine learning.
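To make the idea of a reward machine concrete, here is a minimal sketch in Python. It treats a reward machine as a finite-state machine whose transitions fire on high-level events and emit rewards. The task ("coffee", then "office"), the state names, the `RewardMachine` class, and the `run` helper are all hypothetical illustrations and are not taken from the course material.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine that maps observed events to rewards (illustrative only)."""
    initial: str
    # Maps (state, event) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        # Assumption: any (state, event) pair not listed self-loops with zero reward.
        return self.transitions.get((state, event), (state, 0.0))


# Hypothetical task: observe "coffee" first, then "office", to earn a reward of 1.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("done", 1.0),
    },
)


def run(machine, events):
    """Feed a sequence of events through the machine; return (final_state, total_reward)."""
    state, total = machine.initial, 0.0
    for event in events:
        state, reward = machine.step(state, event)
        total += reward
    return state, total
```

In this sketch, reaching "office" before "coffee" earns nothing, because the reward depends on the machine's state and not only on the latest event. That history dependence is the kind of structure a learning method can exploit.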

Syllabus

Intro
Acknowledgements
Reinforcement Learning (RL)
Challenges of Real-World RL
Goals and Preferences
Linear Temporal Logic (LTL): A compelling logic to express temporal properties of traces
Challenges to RL
Toy Problem Disclaimer
Running Example
Decoupling Transition and Reward Functions
The Rest of the Talk
Define a Reward Function using a Reward Machine
Reward Function Vocabulary
Simple Reward Machine
Reward Machines in Action
Other Reward Machines
Q-Learning Baseline
Option-Based Hierarchical RL (HRL)
HRL with RM-Based Pruning (HRL-RM)
HRL Methods Can Find Suboptimal Policies
Q-Learning for Reward Machines (QRM)
QRM In Action
Recall: Methods for Exploiting RM Structure
QRM + Reward Shaping (QRM + RS)
Test Domains
Test in Discrete Domains
Office World Experiments
Minecraft World Experiments
Function Approximation with QRM
Water World Experiments
Creating Reward Machines
Reward Specification: one size does not fit all
Construct Reward Machine from Formal Languages
Generate RM using a Symbolic Planner
Learn RMs for Partially-Observable RL