-
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection
Paper • 2312.06295 • Published -
A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video
Paper • 2310.14364 • Published • 1 -
SegVol: Universal and Interactive Volumetric Medical Image Segmentation
Paper • 2311.13385 • Published -
CADS: A Comprehensive Anatomical Dataset and Segmentation for Whole-Body Anatomy in Computed Tomography
Paper • 2507.22953 • Published
Collections
Discover the best community collections!
Collections including paper arxiv:2412.07769
-
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
Paper • 2412.07147 • Published • 5 -
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
Paper • 2412.04429 • Published -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16 -
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published • 54
-
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Paper • 2407.07053 • Published • 47 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 35 -
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Paper • 2407.11691 • Published • 15 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 62
-
A Survey of Medical Vision-and-Language Applications and Their Techniques
Paper • 2411.12195 • Published -
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
Paper • 2411.14522 • Published • 39 -
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
Paper • 2412.07769 • Published • 30
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection
Paper • 2312.06295 • Published -
A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video
Paper • 2310.14364 • Published • 1 -
SegVol: Universal and Interactive Volumetric Medical Image Segmentation
Paper • 2311.13385 • Published -
CADS: A Comprehensive Anatomical Dataset and Segmentation for Whole-Body Anatomy in Computed Tomography
Paper • 2507.22953 • Published
-
A Survey of Medical Vision-and-Language Applications and Their Techniques
Paper • 2411.12195 • Published -
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
Paper • 2411.14522 • Published • 39 -
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
Paper • 2412.07769 • Published • 30
-
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
Paper • 2412.07147 • Published • 5 -
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
Paper • 2412.04429 • Published -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16 -
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published • 54
-
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Paper • 2407.07053 • Published • 47 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 35 -
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Paper • 2407.11691 • Published • 15 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 62
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43