Reinforcement Learning from Human Feedback: From Zero to chatGPT

In this talk, we will cover the basics of Reinforcement Learning from Human Feedback (RLHF) and how this technology is being used to enable state-of-the-art ML tools like ChatGPT. Most of the talk will be an overview of the interconnected ML models involved, along with the basics of Natural Language Processing and RL needed to understand how RLHF is applied to large language models. It will conclude with open questions in RLHF.

RLHF Blogpost: https://huggingface.co/blog/rlhf
The Deep RL Course: https://hf.co/deep-rl-course
Slides from this talk: https://docs.google.com/presentation/d/1eI9PqRJTCFOIVihkig1voRM4MHDpLpCicX9lX1J2fqk/edit?usp=sharing
Nathan Twitter: https://twitter.com/natolambert
Thomas Twitter: https://twitter.com/thomassimonini

Nathan Lambert is a Research Scientist at Hugging Face. He received his PhD from the University of California, Berkeley, where he worked at the intersection of machine learning and robotics. He was advised by Professor Kristofer Pister in the Berkeley Autonomous Microsystems Lab and by Roberto Calandra at Meta AI Research. During his PhD, he was lucky to intern at Facebook AI and DeepMind. Nathan was awarded the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism for his efforts to improve community norms.
