THE FUTURE IS HERE

791: Reinforcement Learning from Human Feedback (RLHF) — with Dr. Nathan Lambert

#ReinforcementLearning #RLHF #GenerativeAI

Reinforcement learning from human feedback (RLHF) has come a long way. In this episode, research scientist Nathan Lambert talks to @JonKrohnLearns about the technique’s origins. He also walks through other ways to fine-tune LLMs and explains how he believes generative AI might democratize education.

This episode is brought to you by AWS Inferentia (https://go.aws/3zWS0au) and AWS Trainium (https://go.aws/3ycV6K0), and Crawlbase (https://crawlbase.com), the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit https://passionfroot.me/superdatascience for sponsorship information.

In this episode you will learn:
• [00:00:00] Introduction
• [00:01:52] Why it is important that AI is open
• [00:06:11] The efficacy and scalability of direct preference optimization
• [00:13:12] Robotics and LLMs
• [00:21:36] The challenges to aligning reward models with human preferences
• [00:27:35] How to make sure AI’s decision-making on preferences reflects desirable behavior
• [00:36:27] Why Nathan believes AI is closer to alchemy than science

Additional materials: https://www.superdatascience.com/791