video mllm
• VideoAgent: Long-form Video Understanding with Large Language Model as Agent (arXiv:2403.10517)
• VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (arXiv:2403.11481)
• VideoMamba: State Space Model for Efficient Video Understanding (arXiv:2403.06977)
• MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies (arXiv:2403.01422)
• Video as the New Language for Real-World Decision Making (arXiv:2402.17139)
• VideoPrism: A Foundational Visual Encoder for Video Understanding (arXiv:2402.13217)
• Memory Consolidation Enables Long-Context Video Understanding (arXiv:2402.05861)
• InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding (arXiv:2403.15377)
• VidLA: Video-Language Alignment at Scale (arXiv:2403.14870)
• MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
• Koala: Key frame-conditioned long video-LLM (arXiv:2404.04346)
• MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens (arXiv:2404.03413)
• Pegasus-v1 Technical Report (arXiv:2404.14687)
• PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (arXiv:2404.16994)
• ShareGPT4Video: Improving Video Understanding and Generation with Better Captions (arXiv:2406.04325)
• LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (arXiv:2410.21264)