hsvgbkhgbv's Collections
LLM papers
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper
•
2510.03222
•
Published
•
75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
•
2510.05592
•
Published
•
106
Less is More: Recursive Reasoning with Tiny Networks
Paper
•
2510.04871
•
Published
•
501
Multi-Agent Tool-Integrated Policy Optimization
Paper
•
2510.04678
•
Published
•
30
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Paper
•
2509.22576
•
Published
•
134
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
•
2509.08827
•
Published
•
190
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper
•
2509.08755
•
Published
•
56
GEM: A Gym for Agentic LLMs
Paper
•
2510.01051
•
Published
•
89
Agentic Entropy-Balanced Policy Optimization
Paper
•
2510.14545
•
Published
•
104
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
•
2510.09577
•
Published
•
7
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Paper
•
2512.15687
•
Published
•
17
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Paper
•
2512.13607
•
Published
•
27
Paper
•
2512.16301
•
Published
•
98
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Paper
•
2512.13874
•
Published
•
16