An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling • Paper • 2209.01540 • Published Sep 4, 2022