Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models Paper • 2511.19526 • Published 22 days ago • 1
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering Paper • 2406.00622 • Published Jun 2, 2024
3D-Aware Visual Question Answering about Parts, Poses and Occlusions Paper • 2310.17914 • Published Oct 27, 2023
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning Paper • 2212.00259 • Published Dec 1, 2022
PulseCheck457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models Paper • 2502.08636 • Published Feb 12, 2025
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning Paper • 2504.20024 • Published Apr 28, 2025
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models Paper • 2510.15148 • Published Oct 16, 2025 • 2
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation Paper • 2504.09656 • Published Apr 13, 2025