---
license: mit
datasets:
- lmms-lab/GQA
- dmarsili/Omni3D-Bench
- cambridgeltl/vsr_random
- snowclipsed/TallyQA
language:
- en
base_model:
- ShilongLiu/GroundingDINO
pipeline_tag: object-detection
tags:
- object-detection
- computer-vision
---

# Model Card for VALOR-GroundingDINO

This is the verified-tuned GroundingDINO model from the paper: [No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers](https://glab-caltech.github.io/valor/)

For further information, please refer to the [project webpage](https://glab-caltech.github.io/valor/), [paper](https://arxiv.org/abs/2512.08889), and [repository](https://github.com/damianomarsili/VALOR).

## Citation

If you use VALOR in your research, please consider citing our work:

**BibTeX:**

```
@misc{marsili2025labelsproblemtrainingvisual,
      title={No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers},
      author={Damiano Marsili and Georgia Gkioxari},
      year={2025},
      eprint={2512.08889},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.08889},
}
```
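
## Usage

The GroundingDINO architecture is supported in Hugging Face `transformers` via `GroundingDinoForObjectDetection`. The sketch below is a minimal open-vocabulary detection example, assuming this checkpoint is published in a transformers-compatible format; the repo id and the example image URL are illustrative, not confirmed by this card. If the weights instead follow the original GroundingDINO repository format, load them with that codebase (see the [VALOR repository](https://github.com/damianomarsili/VALOR)).

```python
# Minimal sketch. ASSUMPTIONS: the checkpoint is transformers-compatible and
# the repo id below is hypothetical -- adjust to the actual VALOR release.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, GroundingDinoForObjectDetection

model_id = "dmarsili/VALOR-GroundingDINO"  # hypothetical repo id
processor = AutoProcessor.from_pretrained(model_id)
model = GroundingDinoForObjectDetection.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# GroundingDINO expects lower-cased text queries, each ending with a period.
text = "a cat. a remote control."

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits to thresholded boxes in pixel coordinates
# (default score thresholds are used here).
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids, target_sizes=[image.size[::-1]]
)[0]
for score, box in zip(results["scores"], results["boxes"]):
    print(f"score={score:.2f} box={[round(v, 1) for v in box.tolist()]}")
```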