Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Paper: arXiv:2406.14544
Model details
PrismCaptioners are open-source captioners built on the LLaVA architecture and finetuned on ALLaVA, a GPT-4V-assisted dataset. We have released PrismCaptioner-7B and PrismCaptioner-2B.
PrismCaptioner-7B details
For more information, see the paper and codebase: [Paper] [Code]
Intended uses
Model usage
Clone the Prism repo and complete the preparation steps described there. You can then use PrismCaptioners following the usage example or demo below.
# In the Prism repo folder
from decouple import supported_VLM

# Load the captioner through Prism's VLM registry
model = supported_VLM['prismcaptioner-7b']()

# Pass the image path followed by the text prompt
res = model.generate(['assets/case1.png', 'Given the image below, please provide a detailed description of what you see.'])
print(res)
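To caption several images in one run, you can instantiate the captioner once and reuse it. The following is a minimal sketch under the same API assumptions as the example above (the supported_VLM registry and the generate signature); the extra image path is hypothetical.

# In the Prism repo folder
from decouple import supported_VLM

# Instantiate the captioner once and reuse it across images
model = supported_VLM['prismcaptioner-7b']()

prompt = 'Given the image below, please provide a detailed description of what you see.'
for path in ['assets/case1.png', 'assets/case2.png']:  # hypothetical paths
    caption = model.generate([path, prompt])
    print(f'{path}: {caption}')

The resulting captions can then be handed to a text-only LLM for reasoning, following the decoupled perception/reasoning pipeline described in the Prism paper.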