MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 136
From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature Paper • 2509.16591 • Published Sep 20 • 2
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding Paper • 2504.09925 • Published Apr 14 • 38
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models Paper • 2407.20756 • Published Jul 30, 2024 • 1