Article ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases Nov 5 • 53
Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23 • 62
Article Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text Oct 20 • 33
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published Oct 13 • 100
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks Paper • 2510.10062 • Published Oct 11 • 8
mmBERT: a modern multilingual encoder Collection • mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9 • 49
On the Theoretical Limitations of Embedding-Based Retrieval Paper • 2508.21038 • Published Aug 28 • 20