Policy Gradient Methods | Reinforcement Learning Part 6

Policy Gradient Methods are among the most effective techniques in Reinforcement Learning. In this video, we'll motivate their design, observe their behavior, and work through the theory behind them.

SOCIAL MEDIA

Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: https://www.patreon.com/MutualInformation

Twitter : https://twitter.com/DuaneJRich

SOURCES FOR THE FULL SERIES

[1] R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd Ed). MIT Press, 2018.

[2] H. van Hasselt et al. RL Lecture Series, DeepMind and UCL, 2021. https://youtu.be/TCCjZe0y4Qc

[3] J. Achiam. Spinning Up in Deep Reinforcement Learning. OpenAI, 2018.

ADDITIONAL SOURCES FOR THIS VIDEO

[4] J. Achiam. Spinning Up in Deep Reinforcement Learning: Intro to Policy Optimization. OpenAI, 2018. https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html

[5] D. Silver. Lecture 7: Policy Gradient Methods. DeepMind, 2015. https://youtu.be/KHZVXao4qXs

TIMESTAMPS
0:00 Introduction
0:50 Basic Idea of Policy Gradient Methods
2:30 A Familiar Shape
4:23 Motivating the Update Rule
10:51 Fixing the Update Rule
12:55 Example: Windy Highway
16:47 A Problem with Naive PGMs
19:43 REINFORCE with Baseline
21:42 The Policy Gradient Theorem
25:20 General Comments
28:02 Thanking The Sources

LINKS

Windy Highway: https://github.com/Duane321/mutual_information/tree/main/videos/policy_gradient_methods

NOTES

[1] When motivating the update rule with the animation of protopoints and theta bars, I don't specify alpha. That's because the lengths of the gradient arrows can only be interpreted on a relative basis. Their absolute numeric values can't be deduced from the animation, since some unmentioned scaling was applied to make the animation look natural. Mentioning alpha would have made that calculation possible to attempt, so I avoided it.
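
For context, the update in question follows the standard gradient ascent form. A minimal sketch in LaTeX notation (the objective J and the gradient estimate are the ones defined in the video; alpha is the step size the animation leaves unspecified):

\theta_{t+1} = \theta_t + \alpha \, \widehat{\nabla_\theta J}(\theta_t)

Rescaling the arrows in the animation is equivalent to absorbing an arbitrary constant into alpha, which is why only their relative lengths carry meaning.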