Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 49
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 98
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published Jun 23 • 31