🏥 Medical Query Router

A few-shot text classifier that routes patient queries into three safety tiers.

Built with SetFit and trained on just 90 hand-crafted examples (30 per class) using contrastive learning.

Classes

Tier	Label	Action	Example
🟢	`low_stakes`	Chatbot answers directly	"How much paracetamol for a headache? I'm 30 and healthy"
🟡	`high_stakes`	Doctor reviews before responding	"Can I take ibuprofen while on blood thinners?"
🔴	`urgent`	Tell patient to call 911/999 NOW	"Crushing chest pain going down my left arm"
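
The table above maps directly onto a dispatch step in whatever system consumes the predictions. A minimal sketch, assuming a hypothetical `ACTIONS` lookup and `action_for` helper (neither ships with the model, and the action names are illustrative):

```python
# Hypothetical lookup from predicted label to the action in the table above.
ACTIONS = {
    "low_stakes": "chatbot_answers",
    "high_stakes": "doctor_review",
    "urgent": "advise_emergency_call",
}


def action_for(label: str) -> str:
    """Return the action for a predicted label.

    Unknown labels fall back to doctor review. That fallback is an
    assumption made for this sketch, so that nothing unexpected is
    answered automatically.
    """
    return ACTIONS.get(label, "doctor_review")


for label in ("low_stakes", "high_stakes", "urgent"):
    print(label, "->", action_for(label))
```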

Performance

Evaluated on 45 held-out examples (15 per class) including deliberate edge cases:

Metric	Score
Weighted F1	0.888
Accuracy	88.9%
Urgent Recall	93.3%
Urgent Precision	82.4%
Low Stakes F1	0.933
High Stakes F1	0.857

Confusion Matrix

Rows are true classes; columns are predicted classes.

              Predicted →   low   high   urgent
low_stakes                   14      0        1
high_stakes                   1     12        2
urgent                        0      1       14
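
The scores in the Performance table can be recomputed from the confusion matrix. The sketch below does so in plain Python; the variable and function names are illustrative, and "weighted F1" is assumed to mean per-class F1 weighted by the number of true examples in each class.

```python
# Rows are true classes, columns are predicted classes,
# both in the order low_stakes, high_stakes, urgent.
LABELS = ["low_stakes", "high_stakes", "urgent"]
CM = [
    [14, 0, 1],
    [1, 12, 2],
    [0, 1, 14],
]


def per_class_metrics(cm, i):
    """Return (precision, recall, f1) for class index i."""
    tp = cm[i][i]
    predicted = sum(row[i] for row in cm)  # column sum: all predictions of class i
    actual = sum(cm[i])                    # row sum: all true examples of class i
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row) for row in CM)
accuracy = sum(CM[i][i] for i in range(len(LABELS))) / total
metrics = {label: per_class_metrics(CM, i) for i, label in enumerate(LABELS)}

# Weighted F1: each class F1 weighted by its share of true examples.
weighted_f1 = sum(
    metrics[label][2] * sum(CM[i]) for i, label in enumerate(LABELS)
) / total

urgent_p, urgent_r, _ = metrics["urgent"]
print(f"accuracy         {accuracy:.3f}")
print(f"weighted F1      {weighted_f1:.3f}")
print(f"urgent recall    {urgent_r:.3f}")
print(f"urgent precision {urgent_p:.3f}")
print(f"low_stakes F1    {metrics['low_stakes'][2]:.3f}")
print(f"high_stakes F1   {metrics['high_stakes'][2]:.3f}")
```

For example, urgent precision is 14 / (1 + 2 + 14) = 14/17, because three non-urgent queries were predicted as urgent.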

Backbone Comparison

We trained 3 backbones and selected all-mpnet-base-v2 (★), which scored highest on every metric below:

Backbone	F1	Urgent Recall	Safety Score
all-mpnet-base-v2 ★	0.888	0.933	0.859
all-MiniLM-L6-v2	0.846	0.867	0.789
MedEmbed-base-v0.1	0.801	0.867	0.748

Usage

from setfit import SetFitModel

# Load the fine-tuned router from the Hugging Face Hub
model = SetFitModel.from_pretrained("boredpanda9/medical-query-router")

queries = [
    "What are some healthy ways to lose weight?",
    "Can I take naproxen with my blood pressure medication?",
    "I have crushing chest pain spreading to my left arm",
]

# Predict one tier label per query
predictions = model.predict(queries)
print(predictions)
# ['low_stakes', 'high_stakes', 'urgent']

# With confidence scores: one row per query, one probability per class
probabilities = model.predict_proba(queries)
print(probabilities)

Training Details

Method: SetFit (Sentence-Transformer fine-tuning + Logistic Regression head)
Paper: Efficient Few-Shot Learning Without Prompts
Base model: sentence-transformers/all-mpnet-base-v2 (109.5M params)
Training data: 90 hand-crafted examples (30 per class)
Contrastive pairs: 3,600 (generated via R=20 pair sampling)
Epochs: 1 (contrastive phase) + 1 (head phase)
Body learning rate: 2e-5
Head learning rate: 1e-2
Batch size: 16 (contrastive), 2 (head)
Loss: CosineSimilarityLoss
Head: Logistic Regression with balanced class weights
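
The pair count follows from the sampling scheme: with R = 20 iterations over 90 examples, each example contributes one positive and one negative pair per iteration, giving 2 × 20 × 90 = 3,600 pairs. The sketch below is a simplified illustration of that scheme, not SetFit's own implementation; `sample_pairs` and its arguments are hypothetical names.

```python
import random


def sample_pairs(labels, iterations, seed=0):
    """Simplified sketch of contrastive pair sampling.

    For each iteration and each example, draw one same-class partner
    (target similarity 1.0) and one different-class partner (0.0).
    Returns a list of (index_a, index_b, target) triples.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    pairs = []
    for _ in range(iterations):
        for idx, label in enumerate(labels):
            positives = [j for j in by_class[label] if j != idx]
            negatives = [j for j, other in enumerate(labels) if other != label]
            pairs.append((idx, rng.choice(positives), 1.0))
            pairs.append((idx, rng.choice(negatives), 0.0))
    return pairs


# 30 examples per class, as in the training set described above.
labels = ["low_stakes"] * 30 + ["high_stakes"] * 30 + ["urgent"] * 30
pairs = sample_pairs(labels, iterations=20)
print(len(pairs))  # 2 * 20 * 90 = 3600
```

The 1.0 and 0.0 targets are the similarity values a cosine-similarity loss would be trained toward for same-class and different-class pairs.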

Class Design Rationale

🟢 Low Stakes

Queries where a chatbot can safely provide general information:

OTC medication dosing for otherwise healthy adults (paracetamol, ibuprofen, antihistamines)
General wellness (weight loss, sleep, hydration, exercise)
Mild, self-limiting symptoms with no red flags (common cold, mild fever in children who are otherwise well, minor cuts/grazes)
Lifestyle and prevention advice

🟡 High Stakes

Queries requiring clinical judgement — a doctor must review before responding:

Prescription medication dosing where errors cause harm (insulin, warfarin, metformin, chemotherapy)
Drug interactions (especially with narrow therapeutic index drugs)
Comorbidities that change management (diabetes + wound, COPD + ankle swelling)
Pregnancy/breastfeeding medication safety
Chronic disease management and flare-ups
Red flags in symptoms (unexplained weight loss, persistent cough >3 weeks, changing moles)
Children's prescription medications
Mental health (non-crisis)

🔴 Urgent

Life-threatening emergencies — patient must call 911/999/112 immediately:

Signs of heart attack (chest pain + arm/jaw, sweating, collapse)
Signs of stroke (FAST: Face drooping, Arm weakness, Speech difficulty, Time to call)
Breathing emergencies (anaphylaxis, severe asthma, choking, blue lips)
Overdose or poisoning (especially in children)
Suicidal crisis (active plan, immediate danger)
Severe bleeding or major trauma
Meningitis signs (non-blanching rash + fever + neck stiffness)
Seizures lasting >5 minutes
Unconscious/unresponsive person

Limitations

⚠️ This is a routing tool, not a diagnostic tool. It decides who should answer a query, not what the answer is.

Trained on 90 examples — may misclassify unusual or ambiguous queries
Designed for English-language queries in UK/US healthcare contexts
Should be used as a first-pass filter with human oversight, never as the sole decision-maker
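The model is designed to err toward the more cautious tiers (high_stakes/urgent) when uncertain; on the eval set, 3 of the 5 misclassifications escalated a query to a higher tier, while 2 placed it in a lower tier than its true class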
Not validated on real clinical data — performance on actual patient messages may differ from the eval set
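
A deployment that uses the router as a first-pass filter might add its own safety margin on top of the predicted probabilities. The sketch below is one possible approach, not part of the model: `route_with_fallback`, the 0.7 threshold, and the assumption that probabilities arrive in `low_stakes`, `high_stakes`, `urgent` order are all hypothetical and would need checking against the real model's class order.

```python
# Tiers ordered from least to most cautious.
TIERS = ["low_stakes", "high_stakes", "urgent"]


def route_with_fallback(probs, threshold=0.7):
    """Pick a tier from class probabilities, escalating when unsure.

    probs: probabilities in TIERS order (an assumption; verify the
           class order of the real model before relying on it).
    threshold: hypothetical confidence cut-off, not a tuned value.
    """
    best = max(range(len(TIERS)), key=lambda i: probs[i])
    if probs[best] >= threshold:
        return TIERS[best]
    # Below the threshold: never return low_stakes. Choose the more
    # likely of the two cautious tiers so that a human sees the query.
    return "urgent" if probs[2] >= probs[1] else "high_stakes"


print(route_with_fallback([0.90, 0.07, 0.03]))  # confident low_stakes
print(route_with_fallback([0.50, 0.35, 0.15]))  # unsure, escalated
print(route_with_fallback([0.40, 0.20, 0.40]))  # unsure, urgent is plausible
```

With this rule a query is answered by the chatbot only when the model is confident it is low stakes; every other case reaches a human or an emergency prompt.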

License

Apache 2.0

Paper

Efficient Few-Shot Learning Without Prompts (arXiv:2209.11055, published Sep 22, 2022)
