hsvgbkhgbv's Collections: LLM papers
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper
• 2510.03222
• Published
• 75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
• 2510.05592
• Published
• 107
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published
• 31
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Paper
• 2509.22576
• Published
• 135
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published
• 190
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper
• 2509.08755
• Published
• 57
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published
• 90
Agentic Entropy-Balanced Policy Optimization
Paper
• 2510.14545
• Published
• 106
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
• 2510.09577
• Published
• 8
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Paper
• 2512.15687
• Published
• 21
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Paper
• 2512.13607
• Published
• 36
Paper
• 2512.16301
• Published
• 106
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Paper
• 2512.13874
• Published
• 17
Recursive Language Models
Paper
• 2512.24601
• Published
• 89
Token-Level LLM Collaboration via FusionRoute
Paper
• 2601.05106
• Published
• 40
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Paper
• 2601.09667
• Published
• 91
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Paper
• 2601.15165
• Published
• 72
Behavior Knowledge Merge in Reinforced Agentic Models
Paper
• 2601.13572
• Published
• 24
Learning to Discover at Test Time
Paper
• 2601.16175
• Published
• 42
Endless Terminals: Scaling RL Environments for Terminal Agents
Paper
• 2601.16443
• Published
• 17
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published
• 40
Linear representations in language models can change dramatically over a conversation
Paper
• 2601.20834
• Published
• 21
Self-Distillation Enables Continual Learning
Paper
• 2601.19897
• Published
• 26
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
Paper
• 2602.02488
• Published
• 32
Self-Hinting Language Models Enhance Reinforcement Learning
Paper
• 2602.03143
• Published
• 29
Experiential Reinforcement Learning
Paper
• 2602.13949
• Published
• 67
Multi-agent cooperation through in-context co-player inference
Paper
• 2602.16301
• Published
• 23