TrialChecker-1225
TrialChecker-1225 is a binary text classifier that estimates whether a given clinical trial “space” is a reasonable consideration for a patient, given the patient’s summary.
It is fine-tuned from answerdotai/ModernBERT-large for sequence classification on pairs of (trial space, patient summary).
Important: This is a research prototype for model development, not a medical device and not intended for clinical decision-making.
What counts as a “trial space”?
A trial space is a concise description of the target population a trial aims to enroll, focusing on:
- Age
- Sex
- Cancer type & histology
- Burden of disease (curative vs metastatic)
- Prior or excluded treatments
- Required / excluded biomarkers
Boilerplate exclusion rules, such as heart failure or uncontrolled brain metastases, are not part of the trial space itself. They can be screened separately by OncoReasoning-3B, BoilerplateChecker-0825, or other logic.
Training summary
The classifier was trained with a script that:
- Loads three sources of annotated patient–trial pairs:
  - Pairs originating from space-specific eligibility checks
  - “Patient→top-cohorts” checks (rounds 1–3)
  - “Trial-space→top patients” checks (rounds 1–3)
- Deduplicates by ['patient_summary', 'this_space']
- Builds the final text input as:
  text = this_space + "\nNow here is the patient summary:" + patient_summary
- Uses eligibility_result as the binary label (0/1)
- Fine-tunes ModernBERT-large (sequence classification, 2 labels) at max_length 4096
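The deduplication and text-composition steps above can be sketched in plain Python. This is a minimal illustration rather than the original training script: the function name `build_examples` and the toy rows are hypothetical, and keeping the first occurrence of a duplicate pair is an assumption (it matches the default behavior of pandas `drop_duplicates`).

```python
from typing import Dict, List

SEPARATOR = "\nNow here is the patient summary:"  # template string from this card


def build_examples(rows: List[Dict]) -> List[Dict]:
    """Deduplicate on (patient_summary, this_space) and compose text/label pairs."""
    seen = set()
    examples = []
    for row in rows:
        key = (row["patient_summary"], row["this_space"])
        if key in seen:  # assumption: keep the first occurrence of each pair
            continue
        seen.add(key)
        examples.append({
            "text": row["this_space"] + SEPARATOR + row["patient_summary"],
            "label": int(row["eligibility_result"]),  # 0 or 1
        })
    return examples


# Toy rows standing in for the merged annotation sources (made-up values).
rows = [
    {"this_space": "Space A", "patient_summary": "Patient 1", "eligibility_result": 1},
    {"this_space": "Space A", "patient_summary": "Patient 1", "eligibility_result": 1},
    {"this_space": "Space B", "patient_summary": "Patient 1", "eligibility_result": 0},
]
examples = build_examples(rows)
print(len(examples))
```

The second toy row repeats the first pair and is dropped, so two examples remain.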
Key hyperparameters from training (on H100 x 8)
- Base model: answerdotai/ModernBERT-large
- Max length: 4096
- Optimizer settings: learning_rate=2e-5, weight_decay=0.01
- Batch size: per_device_train_batch_size=8
- Epochs: 2
- Save strategy: epoch
- Tokenizer: AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
- Data collator: DataCollatorWithPadding
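For orientation, the listed settings map onto Hugging Face `TrainingArguments` keyword names roughly as follows. The dict is a sketch, not the original configuration: `NUM_GPUS` is taken from the "H100 x 8" note, and `GRADIENT_ACCUMULATION_STEPS = 1` is an assumption, since this card does not state it.

```python
# Settings as listed in this card, using TrainingArguments-style keyword names.
training_kwargs = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 8,
    "num_train_epochs": 2,
    "save_strategy": "epoch",
}

NUM_GPUS = 8  # from the "H100 x 8" note
GRADIENT_ACCUMULATION_STEPS = 1  # assumption: not stated in this card

# Effective (global) batch size under data-parallel training.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * NUM_GPUS
    * GRADIENT_ACCUMULATION_STEPS
)
print(effective_batch_size)  # 64 under the assumptions above
```

If the original run used gradient accumulation, the effective batch size would scale accordingly.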
Intended use
- Input: a string describing the trial space and a patient summary string
- Output: probability that the trial is a reasonable consideration for that patient. This is intended to capture whether a trial is a reasonable clinical consideration based on the core clinical criteria that define a trial space (age, sex, cancer type, histology, biomarker requirements, prior treatment requirements, and cancer burden/risk stratification requirements). It is not intended to represent full eligibility screening as would be performed after trial consent.
Use cases:
- Ranking candidate trial spaces for a patient
- Early triage before detailed eligibility review (including boilerplate exclusions)
Out of scope:
- Confirming formal eligibility or safety
- Formal (autonomous) medical record review, diagnosis, or treatment decision-making
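For the ranking use case, one simple approach is to sort candidate trial spaces by their POSITIVE probability and keep a shortlist for human review. The helper below is a hypothetical sketch: `rank_spaces`, the identifiers, and the scores are made up for illustration, and `top_k` is an arbitrary choice.

```python
from typing import List, Tuple


def rank_spaces(
    space_ids: List[str], scores: List[float], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Return the top_k (space_id, score) pairs, highest score first."""
    if len(space_ids) != len(scores):
        raise ValueError("space_ids and scores must have the same length")
    ranked = sorted(zip(space_ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Made-up identifiers and scores for illustration only.
shortlist = rank_spaces(["space-a", "space-b", "space-c"], [0.12, 0.91, 0.47], top_k=2)
print(shortlist)
```

The shortlist is a starting point for detailed eligibility review, not a substitute for it.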
Inference (Transformers)
Quick start (single example)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
device = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_REPO = "ksg-dfci/TrialChecker-1225"
tok = AutoTokenizer.from_pretrained(MODEL_REPO)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_REPO).to(device)
model.eval()
this_space = (
    "Age allowed: Any. "
    "Sex allowed: Male or female. "
    "Cancer type allowed: non-small cell lung cancer. "
    "Histology allowed: adenocarcinoma. "
    "Cancer burden allowed: metastatic disease. "
    "Prior treatment required: prior platinum-based chemo-immunotherapy allowed. "
    "Biomarkers required: ALK fusion."
)
# Fields are newline-separated so that adjacent string literals do not run together
# (the exact separator is an assumption).
patient_summary = (
    "Age: 65\n"
    "Sex: Male\n"
    "Cancer type: Non-small cell lung cancer\n"
    "Histology: Adenocarcinoma\n"
    "Cancer burden: Metastatic\n"
    "Biomarkers: ALK fusion detected by NGS\n"
    "Treatment history: Alectinib since 2023"
)
text = this_space + "\nNow here is the patient summary:" + patient_summary
# Raw Transformers model
enc = tok(text, return_tensors="pt", truncation=True, max_length=4096).to(device)
with torch.no_grad():
    logits = model(**enc).logits
    probs = logits.softmax(-1).squeeze(0)
# Label mapping was set in training: {0: "NEGATIVE", 1: "POSITIVE"}
p_positive = float(probs[1])
print(f"Reasonable consideration probability: {p_positive:.3f}")
# Or use the pipeline API to get similar outputs
from transformers import pipeline
pipe = pipeline('text-classification', 'ksg-dfci/TrialChecker-1225')
pipe([text])
Batched scoring
from typing import List
import torch
def score_pairs(spaces: List[str], summaries: List[str], tokenizer, model, max_length=4096, batch_size=8):
    assert len(spaces) == len(summaries)
    device = next(model.parameters()).device
    scores = []
    for i in range(0, len(spaces), batch_size):
        batch_spaces = spaces[i:i+batch_size]
        batch_summaries = summaries[i:i+batch_size]
        texts = [s + "\nNow here is the patient summary:" + p for s, p in zip(batch_spaces, batch_summaries)]
        enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=max_length).to(device)
        with torch.no_grad():
            logits = model(**enc).logits
            probs = logits.softmax(-1)[:, 1]  # POSITIVE
        scores.extend(probs.detach().cpu().tolist())
    return scores
# Example
spaces = [this_space] * 3
summaries = [patient_summary, "Different summary 1...", "Different summary 2..."]
scores = score_pairs(spaces, summaries, tok, model)
print(scores)
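When inputs vary widely in length, padding every batch to its longest member can waste compute. An optional and commonly used workaround is to score texts in length-sorted batches and then restore the original order. The sketch below is not part of the original code: `score_sorted_by_length` is a hypothetical helper, `score_batch` stands in for any function that maps a list of composed texts to a list of scores, and character length is used only as a rough proxy for token length.

```python
from typing import Callable, List


def score_sorted_by_length(
    texts: List[str],
    score_batch: Callable[[List[str]], List[float]],
    batch_size: int = 8,
) -> List[float]:
    """Score texts in length-sorted batches and return scores in input order."""
    # Indices of texts ordered from shortest to longest.
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    scores = [0.0] * len(texts)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch_scores = score_batch([texts[i] for i in idx])
        # Write each score back to the position of its original text.
        for i, s in zip(idx, batch_scores):
            scores[i] = s
    return scores


# Stub scorer for demonstration only: returns each text's length as a fake "score".
fake_scores = score_sorted_by_length(
    ["bbb", "a", "cc"],
    lambda batch: [float(len(t)) for t in batch],
    batch_size=2,
)
print(fake_scores)
```

In practice `score_batch` would wrap the tokenizer and model calls from `score_pairs`; the stub only demonstrates that the output order matches the input order.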
Thresholding & calibration
- Default decision: 0.5 on the POSITIVE probability.
- For better calibration/operating points, tune the threshold on a validation set (e.g., maximize F1, optimize Youden’s J, or set to a desired precision).
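Threshold tuning on a validation set can be sketched as a scan over candidate thresholds. The helper below is a minimal pure-Python illustration, not code from the training repository: `best_threshold` is a hypothetical name, the validation scores and labels are made up, and ties are resolved in favor of the lowest threshold by assumption.

```python
from typing import List, Tuple


def best_threshold(
    scores: List[float], labels: List[int], metric: str = "f1"
) -> Tuple[float, float]:
    """Return (threshold, metric value) maximizing F1 or Youden's J.

    A score >= threshold is predicted POSITIVE. Ties keep the lowest threshold.
    """
    best_t, best_v = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        if metric == "f1":
            denom = 2 * tp + fp + fn
            value = 2 * tp / denom if denom else 0.0
        elif metric == "youden":
            sens = tp / (tp + fn) if (tp + fn) else 0.0
            spec = tn / (tn + fp) if (tn + fp) else 0.0
            value = sens + spec - 1.0  # Youden's J = sensitivity + specificity - 1
        else:
            raise ValueError("metric must be 'f1' or 'youden'")
        if value > best_v:
            best_t, best_v = t, value
    return best_t, best_v


# Made-up validation data for illustration only.
val_scores = [0.10, 0.35, 0.40, 0.80, 0.90]
val_labels = [0, 0, 1, 1, 1]
threshold, f1 = best_threshold(val_scores, val_labels, metric="f1")
print(threshold, f1)
```

To target a desired precision instead, the same scan can keep the lowest threshold whose precision, tp / (tp + fp), meets the target.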
How to prepare inputs
Trial space: as in the example above, a compact description of the “target population” and disease context, including age, sex, cancer type, histology, burden of disease (i.e., palliative/metastatic versus early-stage/curative setting), prior/forbidden treatments, and required/excluded biomarkers.
Patient summary: as in the example above, a concise longitudinal summary of age, sex, cancer type, histology, current burden of disease and/or risk stratification, biomarkers, and treatment history.
You can generate these inputs with your upstream LLM pipeline (e.g., gpt-oss-120b or our OncoReasoning-3B-1225 model for summarization and space extraction), but the classifier accepts any plain strings in the format shown above.
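If the upstream pipeline yields structured fields rather than finished strings, small helpers can keep the formatting consistent with the quick-start example. The sketch below is illustrative only: `format_trial_space` and `build_input` are hypothetical names, and the period-terminated "label: value" layout is an assumption inferred from the example above, not a documented requirement.

```python
from typing import Dict

SEPARATOR = "\nNow here is the patient summary:"  # same template as in training


def format_trial_space(fields: Dict[str, str]) -> str:
    """Join 'label: value' pairs into one line of period-terminated sentences."""
    return " ".join(f"{label}: {value}." for label, value in fields.items())


def build_input(this_space: str, patient_summary: str) -> str:
    """Compose the classifier input as this_space + SEPARATOR + patient_summary."""
    return this_space + SEPARATOR + patient_summary


space = format_trial_space({
    "Age allowed": "Any",
    "Cancer type allowed": "non-small cell lung cancer",
    "Biomarkers required": "ALK fusion",
})
example_text = build_input(space, "Age: 65\nSex: Male\nBiomarkers: ALK fusion detected by NGS")
print(example_text)
```

Keeping the separator in one constant avoids drift between the training-time template and the inference-time strings.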
Reproducibility (high-level)
Below is the minimal structure used by the training script to build the dataset before tokenization:
# 1) Load and merge three labeled sources
# - space_specific_eligibility_checks.parquet
# - top_ten_cohorts_checked_round{1,2,3}.csv
# - top_twenty_patients_checked_round{1,2,3}.csv
# 2) Deduplicate by ['patient_summary','this_space'] and keep:
# - split, patient_summary, this_space, eligibility_result
# 3) Compose input text and label:
text = this_space + "\nNow here is the patient summary:" + patient_summary
label = int(eligibility_result) # 0 or 1
# 4) Tokenize with ModernBERT tokenizer (max_length=4096, truncation=True)
# 5) Train AutoModelForSequenceClassification, which then produces probabilities for
#    the "POSITIVE" class (trial is a reasonable consideration) and for the
#    "NEGATIVE" class (trial is not a reasonable consideration)
To reproduce exactly, consult and run the original training scripts at https://github.com/kenlkehl/matchminer-ai-training.
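Step 5 relies on a softmax over the two output logits to obtain class probabilities. The pure-Python sketch below shows that computation on made-up logits; the [NEGATIVE, POSITIVE] ordering follows the label mapping stated in the quick start, and the logit values are illustrative only.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits in the order [NEGATIVE, POSITIVE].
logits = [-1.0, 2.0]
probs = softmax(logits)
p_positive = probs[1]

# With two classes, softmax reduces to a sigmoid of the logit difference.
sigmoid_of_diff = 1.0 / (1.0 + math.exp(-(logits[1] - logits[0])))
print(round(p_positive, 4), round(sigmoid_of_diff, 4))
```

Because the two probabilities sum to 1, thresholding the POSITIVE probability at 0.5 is equivalent to picking the larger logit.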
Limitations & ethical considerations
- Outputs reflect training data and may contain biases or errors.
- The model estimates reasonableness for consideration, not formal eligibility screening.
- Not validated for safety-critical use; do not use for diagnosis or treatment decisions.
Citation
If you use this model or parts of the pipeline, please cite this model card and the arXiv preprint (https://arxiv.org/abs/2412.17228) or the corresponding journal publication (pending).