This collection contains datasets and models related to "BLEUBERI: BLEU is a surprisingly effective reward for instruction following".
Yapei Chang
yapeichang
AI & ML interests
NLP
Recent Activity
published a model 2 days ago: yapeichang/grpo_olmo3_pretrain_sft_ckpt_80pct
published a model 2 days ago: yapeichang/sft_olmo3_pretrain_ckpt_80pct
published a model 2 days ago: yapeichang/grpo_olmo3_pretrain_sft_ckpt_10pct