This is a tiny masked language model, fine-tuned from prajjwal1/bert-tiny on synthetic oncology clinical text, as a preparatory step toward training TinyBertOncoTagger.
Training data: https://huggingface.co/datasets/ksg-dfci/mmai-synthetic/blob/main/all_synthetic_notes.parquet
Training script: https://github.com/kenlkehl/matchminer-ai-training/blob/main/3b_train_tiny_oncbert.py
Training script call:

```shell
accelerate launch 3b_train_tiny_oncbert.py \
    --data trial_space_lineitems.csv:trial_text \
           trial_space_lineitems.csv:this_space \
           trial_space_lineitems.csv:trial_boilerplate_text \
           all_synthetic_notes.parquet:synthetic_note \
    --output_dir ./onc_bert_tiny \
    --per_device_train_batch_size 64
```
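Each `--data` argument above follows a `path:column` convention, pairing a CSV or Parquet file with the text column to draw training examples from. As a purely illustrative sketch (this parsing logic is an assumption, not taken from `3b_train_tiny_oncbert.py`), such specs could be split like this:

```python
# Hypothetical sketch: split "path:column" data specs like those passed
# to --data above into (path, column) pairs. This is an illustrative
# assumption, not the actual logic in 3b_train_tiny_oncbert.py.
def parse_data_specs(specs):
    """Split each 'path:column' spec on its last ':' so paths that
    themselves contain ':' (e.g. 'C:/notes.csv:text') still parse."""
    pairs = []
    for spec in specs:
        path, column = spec.rsplit(":", 1)
        pairs.append((path, column))
    return pairs

specs = [
    "trial_space_lineitems.csv:trial_text",
    "trial_space_lineitems.csv:this_space",
    "trial_space_lineitems.csv:trial_boilerplate_text",
    "all_synthetic_notes.parquet:synthetic_note",
]
print(parse_data_specs(specs))
```

Splitting on the last `:` rather than the first is a small robustness choice for paths that contain colons; the real script may handle this differently.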