What does it really mean when GPT-5 “thinks”? In this conversation, OpenAI’s VP of Research Jerry Tworek explains how modern reasoning models work in practice: why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen “thinking” actually does, and when extra test-time compute helps (or doesn’t). We trace the evolution from o1 (a tech demo good at puzzles) to o3 (the tool-use shift) to GPT-5 (which Jerry calls “o3.1-ish”), and talk through verifiers, reward design, and the real trade-offs behind “auto” reasoning modes.
We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don’t), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI?
This is the MAD Podcast: AI for the 99%. If you’re curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.
OpenAI
Website – https://openai.com
X/Twitter – https://x.com/OpenAI
Jerry Tworek
LinkedIn – https://www.linkedin.com/in/jerry-tworek-b5b9aa56
X/Twitter – https://x.com/millionint
FIRSTMARK
Website – https://firstmark.com
X/Twitter – https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn – https://www.linkedin.com/in/turck/
X/Twitter – https://twitter.com/mattturck
LISTEN ON:
Spotify – https://open.spotify.com/show/7yLATDSaFvgJG80ACcRJtq
Apple – https://podcasts.apple.com/us/podcast/the-mad-podcast-with-matt-turck/id1686238724
00:00 – Intro
01:01 – What Reasoning Actually Means in AI
02:32 – Chain of Thought: Models Thinking in Words
05:25 – How Models Decide Thinking Time
07:24 – Evolution from o1 to o3 to GPT-5
11:00 – Before OpenAI: Growing up in Poland, Dropping out of School, Trading
20:32 – Working on Robotics and Rubik’s Cube Solving
23:02 – A Day in the Life: Talking to Researchers
24:06 – How Research Priorities Are Determined
26:53 – Collaboration vs IP Protection at OpenAI
29:32 – Shipping Fast While Doing Deep Research
31:52 – Using OpenAI’s Own Tools Daily
32:43 – Pre-Training Plus RL: The Modern AI Stack
35:10 – Reinforcement Learning 101: Training Dogs
40:17 – The Evolution of Deep Reinforcement Learning
42:09 – When GPT-4 Seemed Underwhelming at First
45:39 – How RLHF Made GPT-4 Actually Useful
48:02 – Unsupervised vs Supervised Learning
49:59 – GRPO and How DeepSeek Accelerated US Research
53:05 – What It Takes to Scale Reinforcement Learning
55:36 – Agentic AI and Long-Horizon Thinking
59:19 – Alignment as an RL Problem
1:01:11 – Winning ICPC World Finals Without Specific Training
1:05:53 – Applying RL Beyond Math and Coding
1:09:15 – The Path from Here to AGI
1:12:23 – Pure RL vs Language Models