π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 4 days ago • 89
Stabilizing Rubric Integration Training via Decoupled Advantage Normalization Paper • 2603.26535 • Published Mar 27 • 3
Stabilizing Rubric Integration Training via Decoupled Advantage Normalization Paper • 2603.26535 • Published Mar 27 • 3
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning Paper • 2509.25300 • Published Sep 29, 2025 • 8
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning Paper • 2509.25300 • Published Sep 29, 2025 • 8
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 238
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 238