TrialChecker-1225

TrialChecker-1225 is a binary text classifier that estimates whether a given clinical trial “space” is a reasonable consideration for a patient, given the patient’s summary.
It is fine-tuned from answerdotai/ModernBERT-large for sequence classification on pairs of (trial space, patient summary).

Important: This is a research prototype for model development, not a medical device and not intended for clinical decision-making.

What counts as a “trial space”?

A trial space is a concise description of the target population a trial aims to enroll, focusing on:

Age
Sex
Cancer type & histology
Burden of disease (curative vs metastatic)
Prior or excluded treatments
Required / excluded biomarkers

(Boilerplate exclusion rules, e.g., heart failure or uncontrolled brain metastases, are not part of the trial space itself. They can be screened separately with OncoReasoning-3B, BoilerplateChecker-0825, or other logic.)

Training summary

The classifier was trained with a script that:

Loads three sources of annotated patient–trial pairs:
- Pairs originating from space-specific eligibility checks
- “Patient→top-cohorts” checks (rounds 1–3)
- “Trial-space→top patients” checks (rounds 1–3)
Deduplicates by ['patient_summary', 'this_space']
Builds the final text input as:


text = this_space + "\nNow here is the patient summary:" + patient_summary

Uses eligibility_result as the binary label (0/1)
Model is ModernBERT-large (sequence classification, 2 labels) at max_length 4096
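Inputs are truncated at 4096 tokens, both in training and in the inference examples below. Transformers tokenizers typically truncate from the right unless `truncation_side` is changed, so an overlong pair would lose the end of the patient summary rather than the trial space. The sketch below flags such pairs before scoring. `find_truncated` is a hypothetical helper, and it assumes a tokenizer that, when called on a single string, returns a mapping with an `input_ids` list (as Transformers tokenizers do).

```python
from typing import Callable, List, Sequence


def find_truncated(
    texts: Sequence[str],
    tokenizer: Callable,
    max_length: int = 4096,
) -> List[int]:
    """Return indices of texts whose tokenized length exceeds max_length.

    Hypothetical helper: assumes tokenizer(text) returns a mapping with an
    "input_ids" list (special tokens included), as Transformers tokenizers do.
    """
    too_long = []
    for i, text in enumerate(texts):
        n_tokens = len(tokenizer(text)["input_ids"])
        if n_tokens > max_length:
            too_long.append(i)
    return too_long


if __name__ == "__main__":
    # Toy whitespace "tokenizer" standing in for the ModernBERT tokenizer.
    def toy_tokenizer(text: str) -> dict:
        return {"input_ids": text.split()}

    texts = ["short pair", "word " * 10]
    print(find_truncated(texts, toy_tokenizer, max_length=5))  # [1]
```

With the real tokenizer loaded as in the quick start, `find_truncated(texts, tok)` returns the positions of pairs that would be cut.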

Key hyperparameters from training (on H100 x 8)

Base model: answerdotai/ModernBERT-large
Max length: 4096
Optimizer settings: learning_rate=2e-5, weight_decay=0.01
Batch size: per_device_train_batch_size=8
Epochs: 2
Save strategy: epoch
Tokenizer: AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
Data collator: DataCollatorWithPadding
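The settings above correspond to `transformers.TrainingArguments` keyword arguments roughly as follows. This is a sketch, not the original training configuration: `output_dir` is a placeholder, unlisted arguments are left at library defaults, and the effective batch size assumes data-parallel training on all 8 GPUs with no gradient accumulation, which the card does not state.

```python
# Hyperparameters listed above, expressed as keyword arguments for
# transformers.TrainingArguments (sketch; output_dir is a placeholder).
train_kwargs = {
    "output_dir": "trialchecker-output",  # hypothetical path
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 8,
    "num_train_epochs": 2,
    "save_strategy": "epoch",
}

# Effective (global) batch size, assuming 8 GPUs and no gradient accumulation.
n_gpus = 8
grad_accum_steps = 1  # assumption: not stated in the card
effective_batch_size = (
    train_kwargs["per_device_train_batch_size"] * n_gpus * grad_accum_steps
)
print(effective_batch_size)  # 64

# With transformers installed:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**train_kwargs)
```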

Intended use

Input: a string describing the trial space and a patient summary string
Output: probability that the trial is a reasonable consideration for that patient. This is intended to capture whether a trial is a reasonable clinical consideration based on the core clinical criteria that define a trial space (age, sex, cancer type, histology, biomarker requirements, prior treatment requirements, and cancer burden/risk stratification requirements). It is not intended to represent full eligibility screening as would be performed after trial consent.

Use cases:

Ranking candidate trial spaces for a patient
Early triage before detailed eligibility review (including boilerplate exclusions)

Out of scope:

Confirming formal eligibility or safety
Formal (autonomous) medical record review, diagnosis, or treatment decision-making

Inference (Transformers)

Quick start (single example)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_REPO = "ksg-dfci/TrialChecker-1225" 

tok = AutoTokenizer.from_pretrained(MODEL_REPO)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_REPO).to(device)
model.eval()

this_space = (
    "Age allowed: Any. "
    "Sex allowed: Male or female. "
    "Cancer type allowed: non-small cell lung cancer. "
    "Histology allowed: adenocarcinoma. "
    "Cancer burden allowed: metastatic disease. "
    "Prior treatment required: prior platinum-based chemo-immunotherapy allowed. "
    "Biomarkers required: ALK fusion."
)

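patient_summary = (
    # Trailing "\n" keeps fields apart; adjacent string literals are otherwise joined with no separator.
    "Age: 65\n"
    "Sex: Male\n"
    "Cancer type: Non-small cell lung cancer\n"
    "Histology: Adenocarcinoma\n"
    "Cancer burden: Metastatic\n"
    "Biomarkers: ALK fusion detected by NGS\n"
    "Treatment history: Alectinib since 2023"
)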

text = this_space + "\nNow here is the patient summary:" + patient_summary

# Raw Transformers model
enc = tok(text, return_tensors="pt", truncation=True, max_length=4096).to(device)
with torch.no_grad():
    logits = model(**enc).logits
probs = logits.softmax(-1).squeeze(0)

# Label mapping was set in training: {0: "NEGATIVE", 1: "POSITIVE"}
p_positive = float(probs[1])
print(f"Reasonable consideration probability: {p_positive:.3f}")

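# Or use the pipeline API. By default it returns only the top label and its score;
# top_k=None returns scores for both labels. Truncation mirrors the raw-model call above.
from transformers import pipeline
pipe = pipeline("text-classification", model=MODEL_REPO)
pipe([text], top_k=None, truncation=True, max_length=4096)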

Batched scoring

from typing import List
import torch

def score_pairs(spaces: List[str], summaries: List[str], tokenizer, model, max_length=4096, batch_size=8):
    assert len(spaces) == len(summaries)
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(spaces), batch_size):
        batch_spaces = spaces[i:i+batch_size]
        batch_summaries = summaries[i:i+batch_size]
        texts = [s + "\nNow here is the patient summary:" + p for s, p in zip(batch_spaces, batch_summaries)]
        enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=max_length).to(device)
        with torch.no_grad():
            logits = model(**enc).logits
        probs = logits.softmax(-1)[:, 1]  # POSITIVE
        scores.extend(probs.detach().cpu().tolist())
    return scores

# Example
spaces = [this_space] * 3
summaries = [patient_summary, "Different summary 1...", "Different summary 2..."]
scores = score_pairs(spaces, summaries, tok, model)
print(scores)
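For the ranking and early-triage use cases listed under Intended use, the batched scores can be sorted per patient. A minimal sketch follows; `rank_spaces` is a hypothetical helper, and the optional `threshold` cut (0.5 would match the default decision rule) is an illustrative choice, not part of the released model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (spaces, summaries) -> one POSITIVE probability per pair, e.g. a score_pairs wrapper.
ScoreFn = Callable[[List[str], List[str]], List[float]]


def rank_spaces(
    patient_summary: str,
    spaces: Sequence[str],
    score_fn: ScoreFn,
    threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Score every candidate space against one patient and sort best-first.

    If threshold is given, spaces scoring below it are dropped.
    """
    spaces = list(spaces)
    scores = score_fn(spaces, [patient_summary] * len(spaces))
    ranked = sorted(zip(spaces, scores), key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [(s, p) for s, p in ranked if p >= threshold]
    return ranked


if __name__ == "__main__":
    # Dummy scorer standing in for the model: longer space text -> higher score.
    def dummy_score(spaces: List[str], summaries: List[str]) -> List[float]:
        return [min(len(s) / 10, 1.0) for s in spaces]

    print(rank_spaces("patient", ["abc", "abcdefgh", "a"], dummy_score, threshold=0.2))
    # [('abcdefgh', 0.8), ('abc', 0.3)]
```

With the model loaded as in the quick start, pass `score_fn=lambda sp, su: score_pairs(sp, su, tok, model)`.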

Thresholding & calibration

Default decision: 0.5 on the POSITIVE probability.
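To pick an operating point for your use case, tune the threshold on a held-out validation set (e.g., maximize F1, maximize Youden’s J, or reach a target precision). Threshold tuning changes the decision rule; it does not recalibrate the probabilities themselves.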
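A minimal sketch of threshold selection on a labeled validation set, in plain Python. `threshold_metrics` and `choose_threshold` are hypothetical helpers; candidate thresholds are restricted to the observed probabilities, and `probs` would come from, e.g., `score_pairs` on validation pairs with `eligibility_result` as labels.

```python
from typing import Dict, Optional, Sequence


def threshold_metrics(labels: Sequence[int], probs: Sequence[float], t: float) -> Dict[str, float]:
    """Metrics when predicting POSITIVE for every pair with probability >= t."""
    tp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 0)
    fn = sum(1 for y, p in zip(labels, probs) if p < t and y == 1)
    tn = sum(1 for y, p in zip(labels, probs) if p < t and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "youden_j": recall + specificity - 1.0}


def choose_threshold(labels: Sequence[int], probs: Sequence[float],
                     criterion: str = "f1",
                     min_precision: Optional[float] = None) -> Optional[float]:
    """Choose a threshold from the observed probabilities.

    With min_precision set, return the lowest threshold whose precision reaches
    it (recall never increases as the threshold rises, so this keeps the most
    recall), or None if none qualifies. Otherwise maximize `criterion`
    ("f1" or "youden_j").
    """
    candidates = sorted(set(probs))
    if min_precision is not None:
        for t in candidates:
            if threshold_metrics(labels, probs, t)["precision"] >= min_precision:
                return t
        return None
    return max(candidates, key=lambda t: threshold_metrics(labels, probs, t)[criterion])


if __name__ == "__main__":
    # Toy validation labels (eligibility_result) and POSITIVE probabilities.
    labels = [0, 0, 1, 1, 0, 1]
    probs = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]
    print(choose_threshold(labels, probs, "f1"))        # 0.8
    print(choose_threshold(labels, probs, "youden_j"))  # 0.8
```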

How to prepare inputs

Trial space: as in the example above, a compact “target population” description of the disease context, including age, sex, cancer type, histology, burden of disease (i.e., palliative/metastatic versus early-stage/curative setting), prior or forbidden treatments, and required or excluded biomarkers.

Patient summary: as in the example above, a concise longitudinal summary of age, sex, cancer type, histology, current burden of disease and/or risk stratification, biomarkers, and treatment history.

You can generate these inputs with your upstream LLM pipeline (e.g., gpt-oss-120b or our OncoReasoning-3B-1225 model for summarization and space extraction), but the classifier accepts any plain strings in the format shown above.
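If an upstream pipeline produces structured fields rather than free text, a small formatter can emit strings in the style of the quick-start example. This is a sketch: the field labels mirror that example, but `format_trial_space`, `format_patient_summary`, and `build_input` are hypothetical helpers, and since the classifier accepts any plain strings these exact labels are not required. Only the separator in `build_input` is taken from the training format.

```python
from typing import Mapping

# (input key, label used in the output string); keys are hypothetical.
SPACE_FIELDS = [
    ("age", "Age allowed"),
    ("sex", "Sex allowed"),
    ("cancer_type", "Cancer type allowed"),
    ("histology", "Histology allowed"),
    ("burden", "Cancer burden allowed"),
    ("prior_treatment", "Prior treatment required"),
    ("biomarkers", "Biomarkers required"),
]

PATIENT_FIELDS = [
    ("age", "Age"),
    ("sex", "Sex"),
    ("cancer_type", "Cancer type"),
    ("histology", "Histology"),
    ("burden", "Cancer burden"),
    ("biomarkers", "Biomarkers"),
    ("treatment_history", "Treatment history"),
]


def format_trial_space(fields: Mapping[str, str]) -> str:
    """One 'Label: value.' sentence per provided field, space-separated."""
    return " ".join(f"{label}: {fields[key]}." for key, label in SPACE_FIELDS if key in fields)


def format_patient_summary(fields: Mapping[str, str]) -> str:
    """One 'Label: value' line per provided field."""
    return "\n".join(f"{label}: {fields[key]}" for key, label in PATIENT_FIELDS if key in fields)


def build_input(this_space: str, patient_summary: str) -> str:
    # Same concatenation as in training (note: no space after the colon).
    return this_space + "\nNow here is the patient summary:" + patient_summary


if __name__ == "__main__":
    space = format_trial_space({"age": "Any", "cancer_type": "non-small cell lung cancer",
                                "biomarkers": "ALK fusion"})
    summary = format_patient_summary({"age": "65", "sex": "Male",
                                      "biomarkers": "ALK fusion detected by NGS"})
    print(build_input(space, summary))
```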

Reproducibility (high-level)

Below is the minimal structure used by the training script to build the dataset before tokenization:

# 1) Load and merge three labeled sources
#    - space_specific_eligibility_checks.parquet
#    - top_ten_cohorts_checked_round{1,2,3}.csv
#    - top_twenty_patients_checked_round{1,2,3}.csv

# 2) Deduplicate by ['patient_summary','this_space'] and keep:
#    - split, patient_summary, this_space, eligibility_result

# 3) Compose input text and label:
text  = this_space + "\nNow here is the patient summary:" + patient_summary
label = int(eligibility_result)  # 0 or 1

# 4) Tokenize with ModernBERT tokenizer (max_length=4096, truncation=True)
# 5) Train AutoModelForSequenceClassification (2 labels); softmax over its logits gives
#    probabilities for "POSITIVE" (trial is a reasonable consideration) and
#    "NEGATIVE" (trial is not a reasonable consideration)
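Steps 1 to 3 can be sketched with pandas as below. This is an approximation, not the original script: it assumes each source file already contains the columns `split`, `patient_summary`, `this_space`, and `eligibility_result`, and that `drop_duplicates` keeping the first occurrence matches the original deduplication. `load_sources` and `build_dataset` are hypothetical names; the original training scripts remain the reference for the exact logic.

```python
from pathlib import Path
from typing import Iterable, List

import pandas as pd

KEEP_COLS = ["split", "patient_summary", "this_space", "eligibility_result"]


def load_sources(data_dir: str) -> List[pd.DataFrame]:
    """Read the three labeled sources (assumed file layout and columns)."""
    d = Path(data_dir)
    frames = [pd.read_parquet(d / "space_specific_eligibility_checks.parquet")]
    for r in (1, 2, 3):
        frames.append(pd.read_csv(d / f"top_ten_cohorts_checked_round{r}.csv"))
        frames.append(pd.read_csv(d / f"top_twenty_patients_checked_round{r}.csv"))
    return frames


def build_dataset(frames: Iterable[pd.DataFrame]) -> pd.DataFrame:
    """Merge sources, deduplicate pairs, and compose text/label columns."""
    df = pd.concat([f[KEEP_COLS] for f in frames], ignore_index=True)
    df = df.drop_duplicates(subset=["patient_summary", "this_space"]).reset_index(drop=True)
    df["text"] = df["this_space"] + "\nNow here is the patient summary:" + df["patient_summary"]
    df["label"] = df["eligibility_result"].astype(int)  # 0 or 1
    return df


if __name__ == "__main__":
    # In-memory stand-ins for the real files; the second frame repeats a pair.
    a = pd.DataFrame({"split": ["train", "train"], "patient_summary": ["P1", "P1"],
                      "this_space": ["S1", "S2"], "eligibility_result": [1, 0]})
    b = pd.DataFrame({"split": ["train"], "patient_summary": ["P1"],
                      "this_space": ["S1"], "eligibility_result": [1]})
    ds = build_dataset([a, b])
    print(len(ds), ds["label"].tolist())  # 2 [1, 0]
```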

To reproduce exactly, consult and run the original training scripts at https://github.com/kenlkehl/matchminer-ai-training.

Limitations & ethical considerations

Outputs reflect training data and may contain biases or errors.
The model estimates reasonableness for consideration, not formal eligibility screening.
Not validated for safety-critical use; do not use for diagnosis or treatment decisions.

Citation

If you use this model or parts of the pipeline, please cite this model card and the arXiv preprint (https://arxiv.org/abs/2412.17228) or the corresponding journal publication (pending).


Weights: Safetensors format, 0.4B params, tensor type F32
