Image-Text-to-Text · Transformers · TensorBoard · Safetensors · feature-extraction · conversational · custom_code
xiangan committed · verified
Commit 75a1758 · 1 Parent(s): 59bcd19

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -17,9 +17,6 @@ pipeline_tag: image-text-to-text
 
 
 <p>
-  <a href="https://huggingface.co/spaces/lmms-lab/LLaVA-OneVision-1.5">
-    <img alt="Project Page" src="https://img.shields.io/badge/Project%20Page-1f6feb?style=for-the-badge&logo=huggingface&logoColor=white">
-  </a>
   <a href="https://huggingface.co/papers/2509.23661">
     <img alt="Paper" src="https://img.shields.io/badge/Paper-b31b1b?style=for-the-badge&logo=arXiv&logoColor=white">
   </a>
@@ -34,9 +31,8 @@ pipeline_tag: image-text-to-text
 
 ## Introduction
 
-Built to democratize multimodal training, LLaVA-OneVision-1.5 is a fully open-source family of vision-language models trained on native-resolution images to achieve state-of-the-art performance at significantly lower cost. The project releases high-quality pretraining and SFT data, a complete and efficient training framework with recipes and configs, and comprehensive logs for transparent, reproducible research.
-**LLaVA-OneVision-1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.
-
+Copilot said: LLaVA-OneVision-1.5 is a fully open-source family of
+LLaVA-OneVision-1.5 is a fully open-source family of large multimodal models (LMMs) built to democratize multimodal training. Trained on native-resolution images, it delivers state-of-the-art performance at substantially lower cost. The project also releases high-quality pretraining and SFT data, a complete and efficient training framework with recipes and configs, and comprehensive logs to support transparent, reproducible research.
 #### **Superior Performance**
 - The model leads on multiple multimodal benchmarks and generally surpasses Qwen2.5-VL.
 - Training on native-resolution images significantly improves its visual understanding.
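
The card metadata visible in this commit (the `image-text-to-text` pipeline tag together with the Transformers and `custom_code` tags) suggests the checkpoint is meant to be driven through the Transformers image-text-to-text pipeline. The snippet below is a minimal sketch of that call pattern, not an official usage example: the repository id, image URL, and generation settings are placeholders rather than values taken from this diff.

```python
# Minimal inference sketch for an image-text-to-text checkpoint served via Transformers.
# The repository id and image URL are placeholders, not taken from this commit.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",                               # matches the card's pipeline_tag
    model="lmms-lab/LLaVA-OneVision-1.5-8B-Instruct",   # placeholder repo id (assumption)
    trust_remote_code=True,                             # hedged: the card carries a custom_code tag
)

# Chat-style input lets the pipeline apply the model's own chat template.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # placeholder image URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```

`trust_remote_code=True` is included only because of the `custom_code` tag shown above; depending on how the checkpoint is packaged, it may be unnecessary.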