Reinforcement Learning with Neural Networks: Mathematical Details
Here we go through the math required to update a parameter in a neural network using reinforcement learning, and we do it one step at a time. We show how the derivatives are calculated (BAM!), then updated (DOUBLE BAM!!), and then used to optimize the parameters (TRIPLE BAM!!!).
If you'd like to support StatQuest, please consider...
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying a book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
PayPal: https://www.paypal.me/statquest
Venmo: @JoshStarmer
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
4:09 Calculating a derivative
12:16 Updating the derivative with a reward
15:39 Updating a parameter in the neural network
16:28 A second example
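
As a rough sketch of the three steps listed in the chapters above (calculate a derivative, update it with a reward, then update the parameter), here is a minimal Python example. It is not the exact network from the video: the one-input sigmoid model, the fixed weight, the cross-entropy loss, and the names policy_gradient_step, bias, reward, and learning_rate are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def policy_gradient_step(bias, x, action, reward, weight=1.0, learning_rate=0.1):
    """One reinforcement-learning update of a single bias (illustrative only).

    Assumed model: p = sigmoid(weight * x + bias) is the probability of
    taking action 1; action is the 0 or 1 that was actually taken.
    """
    p = sigmoid(weight * x + bias)

    # Step 1: calculate the derivative of the cross-entropy loss with
    # respect to the bias, pretending the action taken was the right one.
    # For a sigmoid output this simplifies to (p - action).
    derivative = p - action

    # Step 2: update the derivative with the reward. A positive reward
    # keeps its direction; a negative reward flips it.
    updated_derivative = reward * derivative

    # Step 3: use the updated derivative to take a gradient descent step.
    new_bias = bias - learning_rate * updated_derivative
    return new_bias, derivative, updated_derivative


# Action 1 was taken and it was rewarded, so the bias moves up.
rewarded = policy_gradient_step(bias=0.0, x=0.0, action=1, reward=1.0)

# Same action, but it was punished, so the bias moves down.
punished = policy_gradient_step(bias=0.0, x=0.0, action=1, reward=-1.0)

print(rewarded)
print(punished)
```

With a bias of 0 and an input of 0, the model starts at p = 0.5, so the derivative is -0.5 either way. The reward alone decides whether the step makes the action that was taken more likely (reward of +1) or less likely (reward of -1).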
#StatQuest