MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Paper • 2408.10605
Computer Vision
RIVER: A Real-Time Interaction Benchmark for Video LLMs
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision