Image-Text-to-Text · Transformers · TensorBoard · Safetensors · feature-extraction · conversational · custom_code
xiangan committed · verified
Commit 75a1758 · 1 Parent(s): 59bcd19

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -17,9 +17,6 @@ pipeline_tag: image-text-to-text
 
 
 <p>
-  <a href="https://huggingface.co/spaces/lmms-lab/LLaVA-OneVision-1.5">
-    <img alt="Project Page" src="https://img.shields.io/badge/Project%20Page-1f6feb?style=for-the-badge&logo=huggingface&logoColor=white">
-  </a>
   <a href="https://huggingface.co/papers/2509.23661">
     <img alt="Paper" src="https://img.shields.io/badge/Paper-b31b1b?style=for-the-badge&logo=arXiv&logoColor=white">
   </a>
@@ -34,9 +31,8 @@ pipeline_tag: image-text-to-text
 
 ## Introduction
 
-Built to democratize multimodal training, LLaVA-OneVision-1.5 is a fully open-source family of vision-language models trained on native-resolution images to achieve state-of-the-art performance at significantly lower cost. The project releases high-quality pretraining and SFT data, a complete and efficient training framework with recipes and configs, and comprehensive logs for transparent, reproducible research.
-**LLaVA-OneVision-1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.
-
+Copilot said: LLaVA-OneVision-1.5 is a fully open-source family of
+LLaVA-OneVision-1.5 is a fully open-source family of large multimodal models (LMMs) built to democratize multimodal training. Trained on native-resolution images, it delivers state-of-the-art performance at substantially lower cost. The project also releases high-quality pretraining and SFT data, a complete and efficient training framework with recipes and configs, and comprehensive logs to support transparent, reproducible research.
 #### **Superior Performance**
 - The model leads on multiple multimodal benchmarks and generally surpasses Qwen2.5-VL.
 - Training on native-resolution images significantly improves its visual understanding.
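
The card metadata visible in this commit (the `image-text-to-text` pipeline tag together with the Transformers and `custom_code` tags) suggests the checkpoint is meant to be driven through the Transformers image-text-to-text pipeline. The snippet below is a minimal sketch of that call pattern, not an official usage example: the repository id, image URL, and generation settings are placeholders rather than values taken from this diff.

```python
# Minimal inference sketch for an image-text-to-text checkpoint served via Transformers.
# The repository id and image URL are placeholders, not taken from this commit.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",                               # matches the card's pipeline_tag
    model="lmms-lab/LLaVA-OneVision-1.5-8B-Instruct",   # placeholder repo id (assumption)
    trust_remote_code=True,                             # hedged: the card carries a custom_code tag
)

# Chat-style input lets the pipeline apply the model's own chat template.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # placeholder image URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```

`trust_remote_code=True` is included only because of the `custom_code` tag shown above; depending on how the checkpoint is packaged, it may be unnecessary.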