THE FUTURE IS HERE

How OpenAI made o1 "think" – Here is what we think and already know about o1 reinforcement learning

Here is what we think about the training procedure of OpenAI o1. We speculate, based on all the breadcrumbs we could find, on how exactly reinforcement learning (RL) helped train the model to “think” by producing private Chain-of-Thought tokens before answering.
This video as a blog post: 📖 https://aicoffeebreakwl.substack.com/p/how-openai-made-o1-think

AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/

📚 OpenAI o1 technical research post: https://openai.com/index/learning-to-reason-with-llms/
📃 “Let’s Verify Step by Step” paper: Lightman, Hunter, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. “Let’s Verify Step by Step.” (2023) https://arxiv.org/abs/2305.20050
🐦 Subbarao Kambhampati’s tweet: https://x.com/rao2z/status/1834354533931385203

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael, Sunny Dhiana, Andy Ma

Outline:
00:00 New model from OpenAI
01:35 What “Thinking” means
02:12 How o1 works
03:15 Training OpenAI o1
06:55 Inference-time CoT
07:38 How good is o1?
08:31 How good is it really?

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
Join this channel as a Bean Member to get access to perks:
https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/join
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter / X: https://twitter.com/AICoffeeBreak
LinkedIn: https://www.linkedin.com/in/letitia-parcalabescu/
Threads: https://www.threads.net/@ai.coffee.break
Bluesky: https://bsky.app/profile/aicoffeebreak.bsky.social
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
Substack: https://aicoffeebreakwl.substack.com/

#o1 #AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Music 🎵 : Just Breathing (Instrumental) – NEFFEX