LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published 15 days ago • 150
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published 19 days ago • 24
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published 21 days ago • 91
Large Language Models Do NOT Really Know What They Don't Know Paper • 2510.09033 • Published Oct 10 • 16
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published Oct 13 • 100
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness Paper • 2510.00536 • Published Oct 1 • 6
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30 • 43
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25 • 103
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning Paper • 2509.17437 • Published Sep 22 • 17