# Qwen2.5-1.5B-R1-SLERP
A SLERP merge (t=0.5) of:
- Qwen/Qwen2.5-1.5B-Instruct – strong general instruction following
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B – RL-distilled chain-of-thought reasoning
Part of a systematic merge study on the Qwen2.5-1.5B family. See also:
- Mohaaxa/Qwen2.5-1.5B-R1-SLERP-AWQ – AWQ 4-bit quantized version
## Benchmarks
Evaluated against both parent models on PPL (Wikitext-2) and GSM8K (100 samples):
| Model | PPL (Wikitext-2) ↓ | GSM8K (100 samples) ↑ |
|---|---|---|
| Qwen2.5-1.5B-Instruct (parent) | 16.141 | 38.0% |
| DeepSeek-R1-Distill-Qwen-1.5B (parent) | 107.467 | 3.0% |
| Qwen2.5-1.5B-R1-SLERP (this model) | 1205.427 | 2.0% |
- PPL delta vs Instruct parent: +1189.286
- GSM8K delta vs Instruct parent: -36.0%
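The exact evaluation harness isn't included in this card. As a minimal sketch, a Wikitext-2 perplexity number along the lines of the table above can be reproduced with a simple sliding-window loop like the one below; the dataset split, window size, and stride are assumptions, not values taken from the original evaluation.

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mohaaxa/Qwen2.5-1.5B-R1-SLERP"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Concatenate the Wikitext-2 test split into one token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

# Score non-overlapping windows of max_len tokens (window/stride are assumed here).
max_len, stride = 1024, 1024
nlls, n_tokens = [], 0
for start in range(0, input_ids.size(1) - 1, stride):
    window = input_ids[:, start : start + max_len]
    if window.size(1) < 2:
        break
    with torch.no_grad():
        # Labels = inputs; the model shifts them internally for next-token loss.
        loss = model(window, labels=window).loss
    nlls.append(loss * (window.size(1) - 1))
    n_tokens += window.size(1) - 1

ppl = math.exp(torch.stack(nlls).sum().item() / n_tokens)
print(f"Wikitext-2 PPL: {ppl:.3f}")
```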
## Merge Config
```yaml
merge_method: slerp
base_model:
  model: Qwen/Qwen2.5-1.5B-Instruct
slices:
  - sources:
      - model: Qwen/Qwen2.5-1.5B-Instruct
        layer_range: [0, 28]
      - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
        layer_range: [0, 28]
parameters:
  t: 0.5
```
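To reproduce the merge locally, mergekit can apply a config like the one above. A minimal sketch, assuming mergekit is installed and the YAML is saved as `slerp.yaml` (the file name, output path, and options are illustrative, and the Python API may differ across mergekit versions):

```python
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the SLERP config shown above (saved locally as slerp.yaml).
with open("slerp.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Write the merged checkpoint to ./Qwen2.5-1.5B-R1-SLERP.
run_merge(
    merge_config,
    "./Qwen2.5-1.5B-R1-SLERP",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
    ),
)
```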
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Mohaaxa/Qwen2.5-1.5B-R1-SLERP",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Mohaaxa/Qwen2.5-1.5B-R1-SLERP")
```
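Continuing from the snippet above, a quick generation example (the prompt and decoding settings are illustrative):

```python
# Build a chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens and decode only the newly generated text.
reply = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(reply)
```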
## Notes
- t=0.5 gives equal weight to both parents
- SLERP preserves weight magnitude better than linear interpolation (see the sketch after this list)
- Both parents share identical Qwen2.5 architecture (28 layers, hidden_dim=1536)
- For a quantized version with ~67% VRAM reduction, use the AWQ variant
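For reference, the interpolation SLERP performs on a pair of weight tensors can be sketched in a few lines of PyTorch. This is an illustration of the formula only, not mergekit's exact implementation (which handles edge cases and per-tensor details differently); the function name and the example shapes are made up for the demo.

```python
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors, treated as flat vectors."""
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    # Angle between the two weight vectors.
    cos_omega = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    omega = torch.arccos(cos_omega.clamp(-1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        merged = (1.0 - t) * v0 + t * v1
    else:
        merged = (torch.sin((1.0 - t) * omega) / so) * v0 + (torch.sin(t * omega) / so) * v1
    return merged.reshape(w0.shape).to(w0.dtype)

# t=0.5 blends the two parents equally while roughly preserving vector norm.
merged_weight = slerp(0.5, torch.randn(1536, 1536), torch.randn(1536, 1536))
```

Plain linear averaging shrinks the norm whenever the parent vectors point in different directions; the sine weights above keep the interpolated vector on the arc between them, which is what "preserves weight magnitude" refers to.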