BioMamba
BioMamba is a Mamba2-1.3b model continued-pretrained on PubMed/MEDLINE abstracts, mixed with Wikipedia and C4 data to prevent forgetting of general-domain knowledge.
This is the pretraining-only checkpoint; downstream SFT variants are released separately.
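The mixing strategy above can be sketched as weighted sampling over the three corpora. This is a minimal illustration only: the 80/10/10 ratio below is hypothetical, not the actual mixture used for BioMamba.

```python
import random

def mixture_sampler(sources, weights, n_batches, seed=0):
    """Draw a sequence of corpus names according to mixture weights.

    Hypothetical sketch: each draw decides which corpus the next
    training batch comes from, keeping general-domain data in the mix.
    """
    rng = random.Random(seed)
    return [rng.choices(sources, weights=weights, k=1)[0] for _ in range(n_batches)]

# Illustrative ratio (not from the paper): mostly PubMed, some Wikipedia/C4.
draws = mixture_sampler(["pubmed", "wikipedia", "c4"], [0.8, 0.1, 0.1], 10_000)
counts = {s: draws.count(s) for s in ["pubmed", "wikipedia", "c4"]}
```

Keeping a fixed fraction of general-domain batches in every epoch is a common, simple defense against catastrophic forgetting during domain-adaptive pretraining.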
The checkpoint is saved in the mamba-ssm native format (not the transformers `Mamba2Config` format), so load it with mamba-ssm directly:

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import AutoTokenizer

# dtype must be a torch dtype object, not the string "bfloat16"
model = MambaLMHeadModel.from_pretrained(
    "zmzfpc/biomamba-1.3b", device="cuda", dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("zmzfpc/biomamba-1.3b")
```

`AutoModelForCausalLM.from_pretrained` will not work on this config.
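Once loaded, text generation goes through mamba-ssm's own `generate` method rather than the transformers `GenerationMixin`. A minimal sketch, assuming a CUDA device and the model id above (the prompt is illustrative):

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import AutoTokenizer

model = MambaLMHeadModel.from_pretrained(
    "zmzfpc/biomamba-1.3b", device="cuda", dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("zmzfpc/biomamba-1.3b")

# Illustrative biomedical prompt; top_k=1 gives greedy decoding.
prompt = "Metformin is a first-line therapy for"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
out = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 64,
    top_k=1,
    temperature=1.0,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that `max_length` here is the total sequence length (prompt plus continuation), not the number of new tokens.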
Training data:
- cyrilzakka/pubmed-medline (revision 432681e19469e93e6c42878d5f41fec400974fb8)
- wikimedia/wikipedia, config 20231101.en (revision b04c8d1ceb2f5cd4588862100d08de323dccfbaa)
- allenai/c4, config en (revision 1588ec454efa1a09f29cd18ddd04fe05fc8653a2)

Paper: BioMamba: Domain-Adaptive Biomedical Language Models (arXiv:2408.02600). Code: https://github.com/LeoYML/BioMamba
```bibtex
@article{yue2024biomamba,
  title   = {{BioMamba}: Domain-Adaptive Biomedical Language Models},
  author  = {Yue, Ling and Zhu, Mingzhi and Xing, Sixue and Pan, Shaowu and
             Chenthamarakshan, Vijil and Wang, Yanbo and Cao, Yunning and
             Das, Payel and Fu, Tianfan},
  journal = {arXiv preprint arXiv:2408.02600},
  year    = {2024}
}
```