Monte Carlo And Off-Policy Methods | Reinforcement Learning Part 3

Part three of a six-part series on Reinforcement Learning. It covers the Monte Carlo approach to solving a Markov Decision Process with mere samples. At the end, we touch on off-policy methods, which enable RL when the data was generated by a different agent.

SOCIAL MEDIA

Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: https://www.patreon.com/MutualInformation

Twitter: https://twitter.com/DuaneJRich

SOURCES

[1] R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd Ed). MIT Press, 2018.

[2] H. van Hasselt, et al. RL Lecture Series, DeepMind and UCL, 2021, https://www.youtube.com/playlist?list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm

SOURCE NOTES

The video covers topics from chapters 5 and 7 from [1]. The whole series teaches from [1]. [2] has been a useful secondary resource.

TIMESTAMPS
0:00 What We’ll Learn
0:33 Review of Previous Topics
2:50 Monte Carlo Methods
3:35 Model-Free vs Model-Based Methods
4:59 Monte Carlo Evaluation
9:30 MC Evaluation Example
11:48 MC Control
13:01 The Exploration-Exploitation Trade-Off
15:01 The Rules of Blackjack and its MDP
16:55 Constant-alpha MC Applied to Blackjack
21:55 Off-Policy Methods
24:32 Off-Policy Blackjack
26:43 Watch the next video!

NOTES

Link to Constant-alpha MC applied to Blackjack: https://github.com/Duane321/mutual_information/tree/main/videos/monte_carlo_for_RL_and_off_policy_methods

The off-policy method you see at 25:00 is different from the rule you’ll see in the textbook at eq. 7.9 (which becomes MC as n goes to infinity). That’s because they are showing re-weighted importance sampling (IS) and I’m showing plain (high-variance) IS.
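
To make the plain vs. re-weighted distinction concrete, here is a minimal Python sketch of the two batch estimators as they are usually defined (ordinary and weighted importance sampling, chapter 5 of [1]). It is not the code from the video or the linked repository, and the function names and toy numbers are made up for illustration. Each episode contributes a return G and a ratio rho, the product of target-policy over behavior-policy action probabilities along that episode.

```python
def importance_ratio(target_probs, behavior_probs):
    """Product of pi(a|s) / b(a|s) over the actions taken in one episode."""
    rho = 1.0
    for p_target, p_behavior in zip(target_probs, behavior_probs):
        rho *= p_target / p_behavior
    return rho


def ordinary_is(returns, ratios):
    """Plain IS: average of ratio-scaled returns. Unbiased, but high variance."""
    scaled = [rho * g for rho, g in zip(ratios, returns)]
    return sum(scaled) / len(scaled)


def weighted_is(returns, ratios):
    """Re-weighted IS: normalize by the sum of ratios. Biased, lower variance."""
    total = sum(ratios)
    if total == 0.0:
        return 0.0  # convention used here when no episode has nonzero weight
    return sum(rho * g for rho, g in zip(ratios, returns)) / total


# Hypothetical toy data: three episode returns and their importance ratios.
returns = [1.0, -1.0, 1.0]
ratios = [2.0, 0.0, 0.5]

plain_estimate = ordinary_is(returns, ratios)      # (2.0 + 0.0 + 0.5) / 3
weighted_estimate = weighted_is(returns, ratios)   # 2.5 / 2.5
```

With the plain estimator, a single episode with a large ratio can move the estimate well outside the range of observed returns, which is where the high variance comes from. The re-weighted estimator always stays within that range because it is a weighted average of the returns.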