- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25
Collections including paper arxiv:2503.23461
- MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
  Paper • 2504.00999 • Published • 96
- Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
  Paper • 2503.24379 • Published • 76
- Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
  Paper • 2503.24376 • Published • 38
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
  Paper • 2503.21614 • Published • 43
- K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
  Paper • 2502.18461 • Published • 17
- Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
  Paper • 2410.10792 • Published • 31
- Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
  Paper • 2503.13070 • Published • 10
- DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
  Paper • 2503.12885 • Published • 43
- MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
  Paper • 2501.02955 • Published • 44
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 109
- MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
  Paper • 2501.12380 • Published • 84
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
  Paper • 2501.09781 • Published • 27
- MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
  Paper • 2412.05355 • Published • 8
- SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
  Paper • 2412.04301 • Published • 40
- PanoDreamer: 3D Panorama Synthesis from a Single Image
  Paper • 2412.04827 • Published • 10
- Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
  Paper • 2412.06781 • Published • 23
- Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
  Paper • 2503.18446 • Published • 12
- Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
  Paper • 2503.20240 • Published • 22
- BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
  Paper • 2503.20672 • Published • 14
- Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
  Paper • 2503.20198 • Published • 4
- Causal Diffusion Transformers for Generative Modeling
  Paper • 2412.12095 • Published • 23
- SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
  Paper • 2412.09619 • Published • 30
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
  Paper • 2412.07589 • Published • 48
- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 28
- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 34
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 27
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 22