Qwen2.5-1.5B-R1-SLERP

A SLERP merge (t=0.5) of:

  • Qwen/Qwen2.5-1.5B-Instruct
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Part of a systematic merge study on the Qwen2.5-1.5B family. See also:

Benchmarks

Evaluated against both parent models on PPL (Wikitext-2) and GSM8K (100 samples):

| Model | PPL (Wikitext-2) ↓ | GSM8K (100 samples) ↑ |
|---|---|---|
| Qwen2.5-1.5B-Instruct (parent) | 16.141 | 38.0% |
| DeepSeek-R1-Distill-Qwen-1.5B (parent) | 107.467 | 3.0% |
| Qwen2.5-1.5B-R1-SLERP (this model) | 1205.427 | 2.0% |

PPL delta vs Instruct parent: +1189.286
GSM8K delta vs Instruct parent: -36.0%
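For reference, perplexity here is the exponential of the mean per-token negative log-likelihood on the eval text, so lower is better. A minimal sketch of the computation (the function name is illustrative, not from any particular eval harness):

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# A model that is uniformly uncertain over a vocab of 1000 tokens
# assigns each token probability 1/1000, i.e. NLL = ln(1000):
uniform_nll = [math.log(1000)] * 50
print(perplexity(uniform_nll))  # ≈ 1000.0 (up to float rounding)
```

This is why a PPL of 1205 for the merged model signals near-random next-token prediction relative to the Instruct parent's 16.1.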

Merge Config

merge_method: slerp
base_model:
  model: Qwen/Qwen2.5-1.5B-Instruct
slices:
  - sources:
      - model: Qwen/Qwen2.5-1.5B-Instruct
        layer_range: [0, 28]
      - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
        layer_range: [0, 28]
    parameters:
      t: 0.5
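The slerp method interpolates each pair of weight tensors along the great-circle arc between them rather than the straight chord. A minimal per-tensor sketch of the idea in plain Python (mergekit's actual implementation works on full tensors, but the math is the same):

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between two flat weight vectors.

    Walks the arc between a and b; falls back to linear interpolation
    when the vectors are nearly parallel (the arc degenerates).
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Cosine of the angle between the two vectors, clamped for acos
    dot = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    if theta < 1e-6:  # nearly parallel: plain lerp is numerically safer
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    w_a = math.sin((1 - t) * theta) / s  # weight on a
    w_b = math.sin(t * theta) / s        # weight on b
    return [w_a * x + w_b * y for x, y in zip(a, b)]
```

At t=0 this returns the first parent's weights, at t=1 the second's, and at t=0.5 (as configured above) an equal-angle blend.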

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Mohaaxa/Qwen2.5-1.5B-R1-SLERP",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Mohaaxa/Qwen2.5-1.5B-R1-SLERP")

Notes

  • t=0.5 gives equal weight to both parents
  • SLERP preserves weight magnitude better than linear interpolation
  • Both parents share identical Qwen2.5 architecture (28 layers, hidden_dim=1536)
  • For a quantized version with ~67% VRAM reduction, use the AWQ variant
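The magnitude-preservation point can be checked on a toy example: for two orthogonal unit vectors, the linear midpoint shrinks to norm ≈ 0.707 while the SLERP midpoint stays on the unit sphere:

```python
import math

# Two orthogonal unit vectors (a toy stand-in for two weight tensors)
a = [1.0, 0.0]
b = [0.0, 1.0]
t = 0.5

# Linear midpoint: [0.5, 0.5], norm = sqrt(0.5) ≈ 0.707 (magnitude shrinks)
lerp_mid = [(1 - t) * x + t * y for x, y in zip(a, b)]
lerp_norm = math.hypot(*lerp_mid)

# SLERP midpoint along the unit circle: norm stays at 1.0
theta = math.pi / 2  # angle between a and b
w = math.sin(t * theta) / math.sin(theta)  # equal weight on both at t=0.5
slerp_mid = [w * (x + y) for x, y in zip(a, b)]
slerp_norm = math.hypot(*slerp_mid)

print(round(lerp_norm, 3), round(slerp_norm, 3))  # 0.707 1.0
```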