Collections
Discover the best community collections!
Collections including paper arxiv:2603.28032
-
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 133 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 176
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
Paper • 2603.19835 • Published • 338 -
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
Paper • 2603.28032 • Published • 340 -
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward
Paper • 2603.26599 • Published • 64
-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106 -
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 121 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66
-
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
Paper • 2603.19835 • Published • 338 -
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
Paper • 2603.28032 • Published • 340 -
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward
Paper • 2603.26599 • Published • 64
-
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 133 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 176
-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106 -
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 121 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22