How AI Learns to Reason with Reinforcement Learning

Reinforcement learning algorithms are the key driving force behind training reasoning LLMs (e.g., DeepSeek-R1, Google’s Gemini Pro, OpenAI’s o1/o3).

This video provides an overview of the key ideas behind these algorithms, tracing their development from REINFORCE through value function estimation, actor-critic methods, Generalized Advantage Estimation (GAE), TRPO, and PPO to GRPO.

00:00 Introduction
00:43 Notation
02:41 Policy gradient
05:11 Decomposing trajectory into states and actions
07:05 Baseline subtraction
07:58 Value function estimation
08:31 Advantage estimation
11:11 Actor-critic methods
12:16 Trust region policy optimization
16:48 Proximal policy optimization
19:55 Group Relative Policy Optimization
21:58 Dr. GRPO

=== Resources ===
Three excellent resources I found particularly useful (if you are interested in learning more).
– Foundations of Deep RL — 6-lecture series by Pieter Abbeel https://www.youtube.com/playlist?list=PLwRJQ4m4UJjNymuBM9RdmB3Z9N5-0IlY0

– DeepMind x UCL | Introduction to Reinforcement Learning by David Silver https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ

– Reinforcement Learning: An Introduction http://www.incompleteideas.net/book/the-book-2nd.html

=== References ===
– REINFORCE: https://link.springer.com/content/pdf/10.1007/BF00992696.pdf
– Actor-critic: https://arxiv.org/abs/1602.01783
– GAE: https://arxiv.org/abs/1506.02438
– TRPO: https://arxiv.org/abs/1502.05477
– PPO: https://arxiv.org/abs/1707.06347
– GRPO: https://arxiv.org/pdf/2402.03300
– DeepSeek-R1: https://arxiv.org/abs/2501.12948
– Dr. GRPO: https://arxiv.org/abs/2503.20783
