video mllm
• VideoAgent: Long-form Video Understanding with Large Language Model as Agent (arXiv:2403.10517)
• VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (arXiv:2403.11481)
• VideoMamba: State Space Model for Efficient Video Understanding (arXiv:2403.06977)
• MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies (arXiv:2403.01422)
• Video as the New Language for Real-World Decision Making (arXiv:2402.17139)
• VideoPrism: A Foundational Visual Encoder for Video Understanding (arXiv:2402.13217)
• Memory Consolidation Enables Long-Context Video Understanding (arXiv:2402.05861)
• InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding (arXiv:2403.15377)
• VidLA: Video-Language Alignment at Scale (arXiv:2403.14870)
• MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
• Koala: Key frame-conditioned long video-LLM (arXiv:2404.04346)
• MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens (arXiv:2404.03413)
• Pegasus-v1 Technical Report (arXiv:2404.14687)
• PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (arXiv:2404.16994)
• ShareGPT4Video: Improving Video Understanding and Generation with Better Captions (arXiv:2406.04325)
• LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (arXiv:2410.21264)