Solomatin Roman

Samoed

AI & ML interests

None yet

Recent Activity

updated a dataset 1 day ago

mteb/HebrewSentimentAnalysisV4

published a dataset 1 day ago

mteb/HebrewSentimentAnalysisV4

updated a dataset 2 days ago

mteb/biblenlp-corpus-mmteb

upvoted an article 6 days ago

Article

Building and evaluating Multimodal Rerankers

8 days ago

upvoted 2 articles about 1 month ago

Article

ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases

Nov 5

Article

Improving Parquet Dedupe on Hugging Face Hub

Oct 5, 2024

upvoted 3 articles about 2 months ago

Article

LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR

Oct 23

Article

Sentence Transformers is joining Hugging Face!

Oct 22

Article

Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text

Oct 20

upvoted 2 papers about 2 months ago

Scaling Language-Centric Omnimodal Representation Learning

Paper • 2510.11693 • Published Oct 13 • 100

HUME: Measuring the Human-Model Performance Gap in Text Embedding Task

Paper • 2510.10062 • Published Oct 11 • 8

upvoted an article 2 months ago

Article

Vocabulary is the most important element of Sparse Retrieval

Oct 4

upvoted a paper 2 months ago

ModernVBERT: Towards Smaller Visual Document Retrievers

Paper • 2510.01149 • Published Oct 1 • 30

upvoted an article 2 months ago

Article

ModernVBERT: Towards Smaller Visual Document Retrievers

Oct 3

upvoted a collection 2 months ago

ModernVBERT

Collection

Resources for ModernVBERT • 5 items • Updated Oct 3 • 11

upvoted an article 2 months ago

Article

Introducing RTEB: A New Standard for Retrieval Evaluation

Oct 1 • 128

upvoted a changelog 2 months ago

Changelog

Repositories total file size is now displayed

Sep 18 • 172

upvoted a paper 2 months ago

AutoIntent: AutoML for Text Classification

Paper • 2509.21138 • Published Sep 25 • 35

upvoted an article 3 months ago

Article

mmBERT: ModernBERT goes Multilingual

Sep 9 • 129

upvoted a collection 3 months ago

mmBERT: a modern multilingual encoder

Collection

mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9 • 49

upvoted 2 papers 3 months ago

Dynaword: From One-shot to Continuously Developed Datasets

Paper • 2508.02271 • Published Aug 4 • 14

On the Theoretical Limitations of Embedding-Based Retrieval

Paper • 2508.21038 • Published Aug 28 • 20

upvoted an article 5 months ago

Article

Should We Still Pretrain Encoders with Masked Language Modeling?

Jul 2
