Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models • Paper • 2512.01949 • Published 7 days ago • 8
Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization • Paper • 2511.22586 • Published 11 days ago • 6
Monet: Reasoning in Latent Visual Space Beyond Images and Language • Paper • 2511.21395 • Published 12 days ago • 15
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens • Paper • 2511.19418 • Published 14 days ago • 26
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models • Paper • 2511.14582 • Published 20 days ago • 17
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation • Paper • 2511.09611 • Published 26 days ago • 68
P1: Mastering Physics Olympiads with Reinforcement Learning • Paper • 2511.13612 • Published 21 days ago • 132
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data • Paper • 2511.12609 • Published 22 days ago • 102
Depth Anything 3: Recovering the Visual Space from Any Views • Paper • 2511.10647 • Published 25 days ago • 93
Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs • Paper • 2511.05933 • Published about 1 month ago • 7
Music Flamingo: Scaling Music Understanding in Audio Language Models • Paper • 2511.10289 • Published 25 days ago • 10