Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 VILA is a visual language model (VLM) pretrained at scale on interleaved image-text data, enabling multi-image VLM capabilities. VILA is deployable on the edge, including Jetson Orin and laptops, via AWQ 4-bit quantization through the TinyChat framework. We find that (1) image-text pairs alone are not enough, and interleaved image-text data is essential; (2) unfreezing the LLM during interleaved image-text pre-training enables in-context learning; and (3) re-blending text-only instruction data is crucial for boosting both VLM and text-only performance. VILA unveils appealing capabilities, including multi-image reasoning, in-context learning, visual chain-of-thought, and better world knowledge.
 
 **Model date:**
-
+VILA1.5-3b-s2 was trained in May 2024.
 
 **Paper or resources for more information:**
 https://github.com/Efficient-Large-Model/VILA
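The card above notes that VILA runs on edge devices through AWQ 4-bit weight quantization served by TinyChat. As a loose illustration of what 4-bit weight quantization involves, here is a minimal sketch of plain group-wise quantization and dequantization of a weight matrix in PyTorch. This is not VILA or TinyChat code, it omits the activation-aware scale search that distinguishes AWQ, and all function names here are illustrative.

```python
# Illustrative only: group-wise 4-bit weight quantization/dequantization.
# Not VILA/TinyChat code; AWQ's activation-aware scale search is omitted.
import torch

def quantize_4bit_groupwise(w: torch.Tensor, group_size: int = 128):
    """Quantize a 2-D float weight matrix to unsigned 4-bit codes per group."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    w_min = groups.amin(dim=-1, keepdim=True)
    w_max = groups.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0   # 16 levels for 4 bits
    zero = (-w_min / scale).round()
    codes = (groups / scale + zero).round().clamp(0, 15).to(torch.uint8)
    return codes, scale, zero

def dequantize_4bit_groupwise(codes, scale, zero, shape):
    """Reconstruct an approximate weight matrix from the 4-bit codes."""
    return ((codes.float() - zero) * scale).reshape(shape)

w = torch.randn(256, 512)
codes, scale, zero = quantize_4bit_groupwise(w)
w_hat = dequantize_4bit_groupwise(codes, scale, zero, w.shape)
print("mean abs quantization error:", (w - w_hat).abs().mean().item())
```

In AWQ proper, the per-group scales are additionally chosen with the help of activation statistics so that the most salient weights lose less precision, and TinyChat supplies fused 4-bit kernels so the quantized model runs efficiently on devices like Jetson Orin.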