shail-2512's Collections

VLMs

updated Dec 2, 2024

  • HuggingFaceTB/SmolVLM-Instruct

    Image-Text-to-Text • 2B • Updated Apr 8, 2025 • 28.3k • 585

  • microsoft/OmniParser

    Image-Text-to-Text • Updated Dec 2, 2024 • 263 • 1.71k

  • vidore/colsmolvlm-v0.1

    Visual Document Retrieval • Updated Mar 14, 2025 • 12 • 55

  • meta-llama/Llama-3.2-11B-Vision-Instruct

    Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 147k • 1.59k

  • Qwen/Qwen2-VL-7B-Instruct

    Image-Text-to-Text • 8B • Updated Feb 6, 2025 • 3.04M • 1.27k

  • mistral-experimental/pixtral-12b

    Image-Text-to-Text • 13B • Updated Jan 27, 2025 • 126k • 104

  • HuggingFaceM4/Idefics3-8B-Llama3

    Image-Text-to-Text • Updated Dec 2, 2024 • 335k • 304

  • allenai/Molmo-7B-O-0924

    Image-Text-to-Text • 8B • Updated Oct 9, 2025 • 776 • 163