Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published Oct 29 • 45
Memory Retrieval and Consolidation in Large Language Models through Function Tokens Paper • 2510.08203 • Published Oct 9 • 9
The Three Regimes of Offline-to-Online Reinforcement Learning Paper • 2510.01460 • Published Oct 1 • 1
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems Paper • 2510.02263 • Published Oct 2 • 8
Generalized Parallel Scaling with Interdependent Generations Paper • 2510.01143 • Published Oct 1 • 4
Mem-α: Learning Memory Construction via Reinforcement Learning Paper • 2509.25911 • Published Sep 30 • 14
Scalable Reinforcement Post-Training Beyond Static Human Prompts: Evolving Alignment via Asymmetric Self-Play Paper • 2411.00062 • Published Oct 31, 2024 • 1
Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling Paper • 2509.01649 • Published Sep 1 • 2
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published Apr 22 • 21
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design Paper • 2508.21184 • Published Aug 28 • 2
Reinforcement Learning for Machine Learning Engineering Agents Paper • 2509.01684 • Published Sep 1 • 1
Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First Paper • 2509.00997 • Published Aug 31 • 2
Differentiable Entropy Regularization for Geometry and Neural Networks Paper • 2509.03733 • Published Sep 3 • 1