THE FUTURE IS HERE

Title: RL’s Razor: Why Online Reinforcement Learning Forgets Less (Sep 2025)
Link: http://arxiv.org/abs/2509.04259v1
Date: September 2025

Summary:
This paper compares fine-tuning models with reinforcement learning (RL) and supervised fine-tuning (SFT). It finds that RL preserves prior knowledge better even when both methods reach similar performance on the new task. The degree of forgetting is predicted by the KL divergence between the fine-tuned and base policies, measured on the new task. Because it trains on on-policy samples, RL is implicitly biased towards KL-minimal solutions among those that solve the new task, a bias SFT lacks. This is validated through experiments with large language models and robotic foundation models. The authors term the principle RL's Razor.
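
The forgetting law above is stated in terms of the KL divergence between the fine-tuned and base policies, evaluated on new-task inputs. As a rough illustration only (not the authors' code), the Python sketch below estimates that quantity for two causal language models by sampling completions from the fine-tuned model on new-task prompts and accumulating the token-level log-probability gap against the base model; the model IDs and prompts are placeholders.

# Minimal sketch: Monte-Carlo estimate of KL(pi_finetuned || pi_base)
# on new-task prompts. Model IDs and prompts are placeholders, not from the paper.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "base-model-id"            # placeholder: checkpoint before fine-tuning
FINETUNED_ID = "finetuned-model-id"  # placeholder: RL- or SFT-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID).eval()
tuned = AutoModelForCausalLM.from_pretrained(FINETUNED_ID).eval()

@torch.no_grad()
def forward_kl_on_prompt(prompt: str, max_new_tokens: int = 64) -> float:
    # Sample a completion from the fine-tuned policy (on-policy w.r.t. pi_tuned),
    # then sum log pi_tuned(y_t | prefix) - log pi_base(y_t | prefix) over the
    # sampled tokens: a single-sample estimate of the sequence-level KL.
    inputs = tokenizer(prompt, return_tensors="pt")
    full = tuned.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens)
    prompt_len = inputs["input_ids"].shape[1]

    def token_logprobs(model):
        logits = model(full).logits[:, :-1, :]       # predict token t+1 from prefix
        logp = F.log_softmax(logits, dim=-1)
        targets = full[:, 1:].unsqueeze(-1)
        return logp.gather(-1, targets).squeeze(-1)  # log-prob of each realized token

    lp_tuned = token_logprobs(tuned)[:, prompt_len - 1:]
    lp_base = token_logprobs(base)[:, prompt_len - 1:]
    return (lp_tuned - lp_base).sum().item()

# Average over a batch of new-task prompts (placeholders) to get the
# policy-level estimate the forgetting law is stated in terms of.
prompts = ["<new-task prompt 1>", "<new-task prompt 2>"]
kl = sum(forward_kl_on_prompt(p) for p in prompts) / len(prompts)
print(f"Estimated KL(fine-tuned || base) on new task: {kl:.3f}")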

Key Topics:
– Reinforcement Learning (RL)
– Supervised Fine-Tuning (SFT)
– Catastrophic Forgetting
– KL Divergence
– Online Learning
– Foundation Models
– Policy Gradient Methods

Chapters:
00:00 – Intro to Catastrophic Forgetting
00:05 – AI Catastrophic Forgetting Problem
00:19 – Catastrophic Forgetting Definition
00:36 – RL’s Razor: The Paper
00:52 – RL vs SFT
01:10 – RL’s Razor Discovery
01:27 – Learning New Tricks
01:38 – Memory Retention
01:53 – Long-lived Adaptable Agents
02:18 – Static vs Adaptable Models
02:38 – Catastrophic Forgetting Explained
03:17 – Focus on SFT and RL
03:42 – Core Empirical Finding
04:05 – Pareto Frontiers
04:46 – Tested on Actual Models
05:25 – Related Skills
05:40 – Why is RL Better?
06:18 – Systematic Approach
06:46 – Empirical Forgetting Law
07:04 – KL Divergence Defined
07:37 – Shifting Perspectives
07:46 – ParityMNIST
08:21 – KL Divergence Connection
09:02 – Oracle SFT
09:39 – RL’s Implicit Tendency
09:58 – Cause or Correlation?
10:16 – Training Objectives
10:39 – Target Outputs
11:06 – Critical Distinctions
11:36 – Negative Feedback Experiments
12:16 – Clear Cut Results
12:46 – On-Policy Sampling
13:25 – Theoretical Justification
14:06 – Landscape Leap
14:45 – Minimal Projection
15:05 – Ruling Things Out
15:35 – Weight Changes
16:14 – Representation
16:58 – Consequence of RL
17:16 – Sparsity or Lower Rank?
17:53 – Alternative Distances
18:37 – New Way to Think
19:07 – New Design Axis
19:35 – Actionable Principle
19:50 – Learning for Life
20:14 – Open Questions
20:47 – Scaling Questions
21:10 – Off-Policy Methods
21:39 – Critical New Perspective
22:00 – Recap: RL’s Razor
22:43 – Provocative Thought