Reinforcement Learning with LLMs: a new era of AI agents
🤝 Work with me: https://www.shawhintalebi.com
🚀 Ship AI apps in weeks, not months: https://aibuilder.academy/yt/slJqu3N16Xc
This is the 2nd video in a larger series on reinforcement learning (RL) with LLMs. Here, I discuss 3 ways people are using RL to train modern LLMs and AI agents.
▶️ Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosU_UY8NtZAMaraz74sMHo2W
References
[1] https://youtu.be/3vFISl7qMFI
[2] arXiv:2203.02155 [cs.CL]
[3] https://youtu.be/6yIMb0K-aS4
[4] https://youtu.be/uaZ3yRdYg8A
[5] arXiv:2509.16679 [cs.CL]
[6] arXiv:2509.04501 [cs.CL]
[7] https://youtu.be/7xTGNNLPyMI
[8] arXiv:2212.08073 [cs.CL]
[9] arXiv:2501.12948 [cs.CL]
[10] https://youtu.be/gEDl9C8s_-4
Introduction - 0:00
Reinforcement Learning (RL) - 0:17
RL with LLMs - 1:29
How LLMs are Trained - 3:25
3 Ways to RL with LLMs - 6:15
Way 1: RLHF - 6:43
Way 2: RLAIF - 9:40
Way 3: RLVR - 13:28
Limitations - 18:19
What's Next? - 19:58