RelayLLM: Efficient Reasoning via Collaborative Decoding Paper • 2601.05167 • Published 21 days ago • 29
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published Dec 17, 2025 • 20
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing Paper • 2512.10284 • Published Dec 11, 2025 • 26
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values Paper • 2510.20187 • Published Oct 23, 2025 • 19
VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning Paper • 2510.01444 • Published Oct 1, 2025 • 20
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published Oct 2, 2025 • 28
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30, 2025 • 55
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published Sep 18, 2025 • 33
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models Paper • 2509.09675 • Published Sep 11, 2025 • 28
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9, 2025 • 102
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3, 2025 • 23
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback Paper • 2310.11550 • Published Oct 17, 2023 • 1