ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published Nov 3 • 31
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published 27 days ago • 44
AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies Paper • 2508.08113 • Published Aug 11 • 11
From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens Paper • 2510.02292 • Published Oct 2 • 1
Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry Paper • 2510.25595 • Published Oct 29
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation Paper • 2506.21876 • Published Jun 27 • 28