Qwen2.5-Coder-1.5B-LF-FIM-Heavy
Fine-tuned from Qwen/Qwen2.5-Coder-1.5B.
HumanEval-Infilling (multi-line)
- pass@1 = 53.23%
- pass@10 = 62.62%
- pass@20 = 64.35%
The evaluation script arranges the Qwen FIM special tokens in prefix, then suffix, then middle order, so this is PSM-style evaluation.
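As a minimal sketch, the PSM prompt layout described above can be built from the Qwen2.5-Coder FIM special tokens like this (the helper name is illustrative, not from the eval script):

```python
def build_psm_prompt(prefix: str, suffix: str) -> str:
    """Arrange Qwen FIM special tokens in prefix-suffix-middle (PSM) order.

    The model generates the missing middle span after <|fim_middle|>.
    """
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"


# Example: infill the body of a function
prompt = build_psm_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result\n",
)
```

The prompt string is then fed to the model as-is; everything generated up to the end-of-text token is taken as the infilled middle.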
Benchmark
- HumanEval-Infilling (single-line)
- Tasks: 1033
- Samples/task: 20
- Metric: pass@k (functional correctness)
Results
- pass@1: finetuned=85.48%, base=64.63%, delta=20.85%, 95% CI=[18.27%, 23.42%]
- pass@10: finetuned=90.58%, base=74.48%, delta=16.11%, 95% CI=[13.59%, 18.75%]
- pass@20: finetuned=91.58%, base=75.90%, delta=15.68%, 95% CI=[12.97%, 18.30%]
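With 20 samples per task, the pass@k values above are presumably computed with the standard unbiased estimator from the Codex paper (Chen et al., 2021); a self-contained version:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: samples drawn per task (here 20)
    c: number of samples that pass the tests
    k: the k in pass@k
    Returns 1 - C(n-c, k) / C(n, k), the probability that at least one
    of k randomly chosen samples is correct.
    """
    if n - c < k:
        return 1.0  # fewer failures than k, so some draw must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The per-task estimates are then averaged over all 1033 tasks to give the reported percentages.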
Setup (single-line)
- temperature=0.2, top_p=0.95, max_new_tokens=128
- batched decoding (batch_size=16)
- same evaluation harness/config for both models
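Assuming a Hugging Face transformers-style harness (not confirmed by the card), the decoding setup above corresponds to generation kwargs like these:

```python
# Sampling configuration matching the setup listed above.
gen_kwargs = {
    "do_sample": True,        # temperature/top_p sampling, not greedy
    "temperature": 0.2,
    "top_p": 0.95,
    "max_new_tokens": 128,
}

BATCH_SIZE = 16  # prompts are decoded in batches of 16

# Per batch, the call would look like:
#   outputs = model.generate(**tokenized_batch, **gen_kwargs)
# with the identical config applied to both the fine-tuned and base models.
```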
Multi-line infilling performance is competitive with larger open models.