Xingrui Wang PRO

RyanWW

https://xingruiwang.github.io/

AI & ML interests

Computer Vision, 3D vision, Multimodal Learning

Recent Activity

upvoted a paper 28 days ago

PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

authored a paper 13 days ago

Captain Safari: A World Engine

Paper • 2511.22815 • Published 18 days ago • 9

authored a paper 19 days ago

Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models

Paper • 2511.19526 • Published 22 days ago • 1

authored 7 papers about 2 months ago

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

Paper • 2406.00622 • Published Jun 2, 2024

3D-Aware Visual Question Answering about Parts, Poses and Occlusions

Paper • 2310.17914 • Published Oct 27, 2023

Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning

Paper • 2212.00259 • Published Dec 1, 2022

PulseCheck457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models

Paper • 2502.08636 • Published Feb 12

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

Paper • 2504.20024 • Published Apr 28

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

Paper • 2510.15148 • Published Oct 16 • 2

KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation

Paper • 2504.09656 • Published Apr 13