This is a 'tiny' masked language model, fine-tuned from prajjwal1/bert-tiny on synthetic oncology clinical text as a preparatory step toward training TinyBertOncoTagger.

Training data: https://huggingface.co/datasets/ksg-dfci/mmai-synthetic/blob/main/all_synthetic_notes.parquet

Training script: https://github.com/kenlkehl/matchminer-ai-training/blob/main/3b_train_tiny_oncbert.py

Training script call:

accelerate launch 3b_train_tiny_oncbert.py \
    --data trial_space_lineitems.csv:trial_text \
           trial_space_lineitems.csv:this_space \
           trial_space_lineitems.csv:trial_boilerplate_text \
           all_synthetic_notes.parquet:synthetic_note \
    --output_dir ./onc_bert_tiny \
    --per_device_train_batch_size 64
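The `--data` flag above takes repeated `file:column` specifications, mixing CSV and Parquet sources into one training corpus. A minimal sketch of how such specs could be parsed is below; the helper name and the `rsplit` convention are assumptions for illustration, not taken from the actual training script:

```python
# Hypothetical parser for "path:column" data specs like those passed to
# --data above; the real script's parsing logic may differ.
def parse_data_spec(spec: str) -> tuple[str, str]:
    # Split on the last ":" so paths containing colons still parse.
    path, column = spec.rsplit(":", 1)
    return path, column

specs = [
    "trial_space_lineitems.csv:trial_text",
    "all_synthetic_notes.parquet:synthetic_note",
]
pairs = [parse_data_spec(s) for s in specs]
print(pairs)
# [('trial_space_lineitems.csv', 'trial_text'), ('all_synthetic_notes.parquet', 'synthetic_note')]
```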

Model size: 4.42M params (safetensors, F32)

Model: ksg-dfci/TinyOncBert-1225, fine-tuned from prajjwal1/bert-tiny.
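Since this is a masked language model, it can be exercised with a standard Transformers fill-mask pipeline. A minimal sketch, assuming the model ID above is available on the Hub; predictions from a 4.42M-parameter model will be rough:

```python
from transformers import pipeline

# Load the fine-tuned tiny BERT as a fill-mask pipeline (downloads from the Hub).
fill = pipeline("fill-mask", model="ksg-dfci/TinyOncBert-1225")

# Predict the masked token in a synthetic oncology-style sentence.
preds = fill("The patient was started on [MASK] chemotherapy.")
for p in preds:
    print(p["token_str"], round(p["score"], 3))
```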