Image-Text-to-Text
Transformers
TensorBoard
Safetensors
feature-extraction
conversational
custom_code
xiangan committed
Commit ec07908 · verified · 1 Parent(s): 731bf0b

Update README.md

Files changed (1)
  1. README.md +42 -25
README.md CHANGED
@@ -11,40 +11,57 @@ library_name: transformers
11
  ---
12
  # LLaVA-OneVision-1.5: Fully Open-Source State-of-the-Art VLM
13
 
14
- # ✨ Key Features
 
15
 
16
- **LLaVA-OneVision-1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.
17
 
18
- 1. **Superior Performance**
19
- A family of fully open-source large multimodal models demonstrating **superior performance** across multiple multimodal benchmarks, **outperforming Qwen2.5-VL** in most evaluation tasks.
 
 
20
 
21
- 2. **High-Quality Data at Scale**
22
- Meticulously curated **mid-training and SFT data** with rigorous filtering and quality control.
23
- - Concept-balanced, highly diverse, high-quality caption data
24
- - Comprehensive instruction fine-tuning data covering a wide range of tasks
 
25
 
26
- 3. **Ultra-Efficient Training Framework**
27
- Complete end-to-end training framework designed for maximum efficiency:
28
- - **$16K total budget** for full model training
29
- - **45% HFU efficiency** on A100 GPUs ($0.6 per GPU/Hour)
30
- - Built on **MegatronLM** with support for **MoE**, **FP8**, and **long sequence parallelization**
31
- - Optimized codebase for cost-effective scaling
32
 
33
- 4. **Fully Open Framework** for community access and reproducibility:
34
- - High-quality mid-training & SFT data
35
- - Complete training framework & code
36
- - Training recipes & configurations
37
- - Base & instruct model checkpoints
38
- - ✅ Comprehensive training logs & metrics
39
 
40
  ## Code
41
  This model is trained using a fully open-source, end-to-end training framework, with all code available at [EvolvingLMMs-Lab/LLaVA-OneVision-1.5](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5).
42
 
43
- ## Dataset
44
- | Description | Link |
45
- |-------------|------|
46
- | Mid-training data for LLaVA-OneVision-1.5 | [🤗 Download (Uploading!)](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M) |
47
- | SFT data for LLaVA-OneVision-1.5 | [🤗 Download (Uploading!)](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Insturct-Data) |
48
 
49
  ## Evaluation Results
50
  All evaluations were conducted using [lmms_eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).
 
11
  ---
12
  # LLaVA-OneVision-1.5: Fully Open-Source State-of-the-Art VLM
13
 
14
+ ## Introduction
15
+ **LLaVA-OneVision-1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** at substantially **lower cost** through training on **native-resolution** images.
16
 
17
+ - **Superior Performance**
18
+ A family of fully open-source large multimodal models demonstrating:
19
+ - Superior performance across multiple multimodal benchmarks
20
+ - Results that outperform **Qwen2.5-VL** on most evaluation tasks
21
 
22
+ - **High-Quality Data at Scale**
23
+ Meticulously curated **pre-training and SFT data** with rigorous filtering and quality control, achieving **superior data efficiency** with only **64B tokens**.
24
+ - Concept-balanced, highly diverse, high-quality caption data
25
+ - Comprehensive instruction fine-tuning data covering a wide range of tasks
26
 
27
+ - **Ultra-Efficient Training Framework**: a complete end-to-end training framework designed for maximum efficiency:
28
+ - $16,000 total budget for full model training on A100 GPUs ($0.60 per GPU-hour)
29
+ - 45% hardware FLOPs utilization (HFU) at 8K context length
30
+ - Built on **MegatronLM** with support for **MoE**, **FP8**, and **long sequence parallelization**
31
+ - Optimized codebase for cost-effective scaling
32
 
33
 
34
+ - **Fully Open Framework** for community access and reproducibility:
35
+ - High-quality pre-training & SFT data
36
+ - Complete training framework & code
37
+ - Training recipes & configurations
38
+ - Comprehensive training logs & metrics
39
+
40
+
41
+ ## Models
42
+
43
+ | Model | HF Link | Training Log |
44
+ |--------------------------|--------------------------------------------------------------------------------------------------------|-------------|
45
+ | LLaVA-OV-1.5-4B-Instruct | [🤗 HF / 4B-Instruct](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct) | Uploading… |
46
+ | LLaVA-OV-1.5-8B-Instruct | [🤗 HF / 8B-Instruct](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct) | [📈 Tensorboard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct/tensorboard) |
47
+
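A minimal loading-and-inference sketch for the instruct checkpoints listed above. This is not an official snippet from the card: it assumes the repo's custom code (note the `custom_code` tag) registers the model with `AutoModelForCausalLM` and `AutoProcessor` and that the processor supports the standard chat-template interface for image inputs; the image URL is a placeholder.

```python
# Hedged inference sketch -- class names, chat-template support, and the image URL are
# assumptions; check the model repo's custom code for the exact interface.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "lmms-lab/LLaVA-OneVision-1.5-8B-Instruct"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# One image plus one question, phrased as a single user turn.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/sample.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```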
48
+ ## Datasets
49
+
50
+ ![Dataset Visualization](asset/dataset.jpg)
51
+ <p align="left">
52
+ <strong>(a)</strong> The vocabulary coverage proportion in the LLaVA-OneVision-1.5 Mid-Training dataset before and after concept balancing.
53
+ <strong>(b)</strong> Distribution of data sources within the LLaVA-OneVision-1.5 Mid-Training dataset.
54
+ <strong>(c)</strong> Distribution of data sources within the LLaVA-OneVision-1.5 Instruct dataset.
55
+ </p>
56
+
57
+ | Description | Link | Status |
58
+ |--------------------|--------------------------------------------------------------------------------------------------------|-------------|
59
+ | OV-1.5-Mid-Training-85M | [🤗HF/85M](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M) | Uploading… |
60
+ | OV-1.5-Instruct | [🤗HF/Inst](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Insturct-Data) | Uploading… |
61
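A sketch for pulling the datasets with 🤗 `datasets` once the uploads finish. The split name, streaming access, and schema inspection below are assumptions rather than details documented in this card.

```python
# Hedged loading sketch -- split names and record fields are assumptions; inspect the
# schema before building a dataloader.
from datasets import load_dataset

mid_training = load_dataset(
    "lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M",
    split="train",
    streaming=True,  # tens of millions of samples: stream instead of downloading everything
)
instruct = load_dataset(
    "lmms-lab/LLaVA-OneVision-1.5-Insturct-Data",
    split="train",
    streaming=True,
)

for example in mid_training.take(3):
    print(sorted(example.keys()))  # print the actual field names of the first few records
```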
 
62
  ## Code
63
  This model is trained using a fully open-source, end-to-end training framework, with all code available at [EvolvingLMMs-Lab/LLaVA-OneVision-1.5](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5).
64
 
65
 
66
  ## Evaluation Results
67
  All evaluations were conducted using [lmms_eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).
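A hedged sketch of how such an lmms_eval run could be launched; the model registry name, task list, and output path below are placeholders rather than values taken from this card, so consult the lmms-eval documentation for the identifiers that actually correspond to LLaVA-OneVision-1.5.

```python
# Hedged evaluation launcher -- "llava_onevision1_5" is a hypothetical lmms-eval model name
# and the task list is illustrative; only the generic CLI flags are assumed to exist.
import subprocess
import sys

cmd = [
    sys.executable, "-m", "lmms_eval",
    "--model", "llava_onevision1_5",  # placeholder: look up the real name in the lmms-eval registry
    "--model_args", "pretrained=lmms-lab/LLaVA-OneVision-1.5-8B-Instruct",
    "--tasks", "mme,mmmu_val",        # illustrative benchmark selection
    "--batch_size", "1",
    "--log_samples",
    "--output_path", "./logs/llava_ov_1_5_8b",
]
subprocess.run(cmd, check=True)
```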