
Policy Gradient Theorem Explained – Reinforcement Learning

In this video, I explain the policy gradient theorem used in reinforcement learning (RL). Instead of showing the typical mathematical derivation of the proof, I explain the resulting formula by walking through an example of playing a game and figuring out how we can estimate the policy gradient of the expected return by sampling episodes from the environment. Along the way, I show some graph visualizations that give an intuition for how the partial derivatives with respect to the action probabilities are backpropagated to get the correct policy gradient within the limited action space (where all the probabilities have to sum to 1), and I explain how we can use the log probabilities instead of the direct probabilities (the log-derivative trick) for improved computational efficiency. I then walk through some pseudocode (Python / PyTorch inspired) of the derived policy gradient algorithm, which is a variant of the REINFORCE algorithm, and show how we can reduce the variance of the gradient estimate by normalizing the future returns and by dividing by the number of steps instead of the number of episodes.
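For anyone who wants to see these ideas in code, here is a rough PyTorch sketch of the REINFORCE-style update described above. It is not the exact code from the video: the Gymnasium-style "env", the network sizes, and the hyperparameters are placeholder assumptions, chosen just to illustrate the log-probability loss and the variance-reduction steps (normalizing the future returns and averaging over steps rather than episodes).

import torch
import torch.nn as nn

# Small policy network for a discrete-action environment (e.g. CartPole-like sizes).
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99  # discount factor

def run_episode(env):
    # Sample one episode, recording log π(a|s) for each action taken (log-derivative trick).
    log_probs, rewards = [], []
    obs, _ = env.reset()
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated
        rewards.append(reward)
    return log_probs, rewards

def future_returns(rewards):
    # Discounted sum of future rewards at each time step.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def update(episodes):
    # episodes: a list of (log_probs, rewards) pairs from sampled episodes.
    all_log_probs, all_returns = [], []
    for log_probs, rewards in episodes:
        all_log_probs += log_probs
        all_returns += future_returns(rewards)
    returns = torch.tensor(all_returns)
    # Variance reduction: normalize the future returns to zero mean and unit variance.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Averaging with .mean() divides by the total number of steps, not the number of episodes.
    loss = -(torch.stack(all_log_probs) * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Maximizing the expected return is done here by minimizing the negative of the sum of log-probabilities weighted by the (normalized) future returns, which gives the same gradient as the sampled policy gradient estimate discussed in the video.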

Policy gradient methods are used in many of the current state-of-the-art reinforcement learning algorithms, and I think it is likely that they will continue to play an important role in advancing the field of RL. I’m excited to keep exploring this field and sharing what I learn along the way.

Join our Discord community:
💬 https://discord.gg/cdQhRgw

Connect with me:
🐦 Twitter – https://twitter.com/elliotwaite
📷 Instagram – https://www.instagram.com/elliotwaite
👱 Facebook – https://www.facebook.com/elliotwaite
💼 LinkedIn – https://www.linkedin.com/in/elliotwaite

🎵 Kazukii – Return
→ https://soundcloud.com/ohthatkazuki
→ https://open.spotify.com/artist/5d07MpiIaNmmEMTq79KAga
→ https://www.youtube.com/user/OfficialKazuki