HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 109
WorldAgents: Can Foundation Image Models be Agents for 3D World Models? Paper • 2603.19708 • Published about 1 month ago • 13
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data Paper • 2603.25319 • Published 24 days ago • 32
ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions Paper • 2603.25791 • Published 24 days ago • 5
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? Paper • 2604.03016 • Published 17 days ago • 37
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 18 days ago • 142
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published 14 days ago • 200