Tom Aarsen
AI & ML interests
NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification
Recent Activity
liked a model about 1 hour ago: tomaarsen/reranker-Qwen3.5-0.8B-doodles-image-text-to-text
updated a model about 7 hours ago: tomaarsen/reranker-Qwen3.5-0.8B-doodles-image-text-to-text
published a model about 7 hours ago: tomaarsen/reranker-Qwen3.5-0.8B-doodles-image-text-to-text
Reranker Models for GooAQ
https://huggingface.co/blog/train-reranker
- tomaarsen/reranker-ModernBERT-large-gooaq-bce
  Text Ranking • 0.4B • 478 • 9
- tomaarsen/reranker-NeoBERT-gooaq-bce
  Text Ranking • 0.2B • 3 • 2
- tomaarsen/reranker-ModernBERT-base-gooaq-bce
  Text Ranking • 0.1B • 1.09k • 3
- tomaarsen/reranker-MiniLM-L12-gooaq-bce
  Text Ranking • 33.4M • 1
Matryoshka Embedding Models
https://huggingface.co/blog/matryoshka
- BEE-spoke-data/bert-plus-L8-v1.0-syntheticSTS-4k
  Sentence Similarity • 88.1M • 5
- aspire/acge_text_embedding
  Sentence Similarity • 1.31k • 149
- dunzhang/stella-mrl-large-zh-v3.5-1792d
  Sentence Similarity • 117k • 50
- NeuML/pubmedbert-base-embeddings-matryoshka
  Sentence Similarity • 0.1B • 4.92k • 23
State-of-the-Art NER models - General purpose
- tomaarsen/span-marker-bert-base-fewnerd-fine-super
  Token Classification • 0.1B • 1.81k • 15
- tomaarsen/span-marker-roberta-large-fewnerd-fine-super
  Token Classification • 0.4B • 45 • 14
- tomaarsen/span-marker-mbert-base-multinerd
  Token Classification • 0.2B • 35.2k • 66
- tomaarsen/span-marker-roberta-large-ontonotes5
  Token Classification • 0.4B • 500 • 13
State-of-the-Art NER models - Acronyms
State-of-the-Art NER models - Tagalog
SpanMarker NER Models
SpanMarker NER models for various domains
SetFitABSA models
- tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-aspect
  Text Classification • 127 • 4
- tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-polarity
  Text Classification • 132
- tomaarsen/setfit-absa-paraphrase-mpnet-base-v2-restaurants-aspect
  Text Classification • 0.1B • 5 • 1
- tomaarsen/setfit-absa-paraphrase-mpnet-base-v2-restaurants-polarity
  Text Classification • 0.1B • 2
Qwen3 Rerankers converted to Sequence Classification
Training with Prompts
See the Training with Prompts documentation for more details: https://sbert.net/examples/training/prompts/README.html
Reranker Models for MS MARCO
State-of-the-Art NER models - Biomedical domain
State-of-the-Art NER models - Keyphrases
State-of-the-Art NER models - Organizations
- nbroad/span-marker-roberta-large-orgs-v1
  Token Classification • 0.4B • 61 • 2
- tomaarsen/span-marker-bert-base-orgs
  Token Classification • 41 • 1
- nbroad/span-marker-xdistil-l12-h384-orgs-v3
  Token Classification • 33.4M • 12
- tomaarsen/span-marker-bert-small-orgs
  Token Classification • 5
SetFit models
Strong & Small Rerankers
Trained on MS MARCO using a strategy similar to that of https://huggingface.co/cross-encoder/ms-marco-MiniLM-L12-v2, except with the Ettin base models.