From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks • Paper • 2512.02580 • Published 7 days ago • 27
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs • Paper • 2510.11696 • Published Oct 13 • 176
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe • Paper • 2509.18154 • Published Sep 16 • 51
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities • Paper • 2505.15692 • Published May 21 • 14