LoMo: Local Modality Substitution for Deeper Vision-Language Fusion Paper • 2605.30265 • Published 5 days ago • 20
LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation Paper • 2510.11063 • Published Oct 13, 2025 • 1
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing Paper • 2602.02437 • Published Feb 2 • 80
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published Feb 12 • 83
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 22 days ago • 46
SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction Paper • 2605.20110 • Published 14 days ago • 3
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published Aug 27, 2025 • 37
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published Feb 18, 2025 • 41
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models Paper • 2501.01428 • Published Jan 2, 2025
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Paper • 2507.15852 • Published Jul 21, 2025 • 38