Transformers
GGUF
English
aashish1904 committed on
Commit 63f8cab · verified · 1 Parent(s): 081f5ab

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +168 -0
README.md ADDED
@@ -0,0 +1,168 @@
+
+ ---
+
+ license: apache-2.0
+ datasets:
+ - open-r1/Mixture-of-Thoughts
+ language:
+ - en
+ base_model:
+ - open-r1/Qwen2.5-Math-7B-RoPE-300k
+ library_name: transformers
+
+ ---
+
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+
+
+ # QuantFactory/OpenR1-Distill-7B-GGUF
+ This is a quantized version of [open-r1/OpenR1-Distill-7B](https://huggingface.co/open-r1/OpenR1-Distill-7B), created using llama.cpp.
+
+ # Original Model Card
+
+
+ <img src="open-r1-thumbnail.png" alt="Centered Image" style="display: block; margin: 0 auto;" width="300">
+
+ # Model summary
+
+ OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) on [Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts): a curated dataset of 350k verified reasoning traces distilled from [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1). The dataset spans tasks in mathematics, coding, and science, and is designed to teach language models to reason step-by-step.
+
+ OpenR1-Distill-7B replicates the reasoning capabilities of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) while remaining fully open and reproducible. It is well suited to research on inference-time compute and reinforcement learning with verifiable rewards (RLVR).
+
+ ## Model description
+
+ - **Model type:** A 7B-parameter GPT-like model, post-trained on a mix of publicly available, synthetic datasets.
+ - **Language(s) (NLP):** Primarily English
+ - **License:** Apache 2.0
+ - **Finetuned from model:** a [variant](https://huggingface.co/open-r1/Qwen2.5-Math-7B-RoPE-300k) of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B), whose RoPE base frequency was extended to 300k to enable training on a context of 32k tokens.
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** https://github.com/huggingface/open-r1
+ - **Training logs:** https://wandb.ai/huggingface/open-r1/runs/199cum6l
+ - **Evaluation logs:** https://huggingface.co/datasets/open-r1/details-open-r1_OpenR1-Distill-7B
+
+ ## Usage
+
+ To chat with the model, first install 🤗 Transformers:
+
+ ```shell
+ pip install "transformers>=4.52.0"
+ ```
+
+ Then run the chat CLI as follows:
+
+ ```shell
+ transformers chat open-r1/OpenR1-Distill-7B \
+     max_new_tokens=2048 \
+     do_sample=True \
+     temperature=0.6 \
+     top_p=0.95
+ ```
+
+ Alternatively, run the model using the `pipeline()` function:
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="open-r1/OpenR1-Distill-7B", torch_dtype=torch.bfloat16, device_map="auto")
+
+ messages = [
+     {"role": "user", "content": "Which number is larger, 9.9 or 9.11?"},
+ ]
+ outputs = pipe(messages, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, return_full_text=False)
+ print(outputs[0]["generated_text"])
+ ```
+
+
+ ## Performance
+
+ We use [Lighteval](https://github.com/huggingface/lighteval) to evaluate models on the following benchmarks:
+
+ | Model                       | AIME 2024 | MATH-500 | GPQA Diamond | LiveCodeBench v5 |
+ |-----------------------------|-----------|----------|--------------|------------------|
+ | OpenR1-Distill-7B           | 52.7      | 89.0     | 52.8         | 39.4             |
+ | DeepSeek-R1-Distill-Qwen-7B | 51.3      | 93.5     | 52.4         | 37.4             |
+
+ All scores denote pass@1 accuracy and use sampling with `temperature=0.6` and `top_p=0.95`. The DeepSeek-R1 tech report estimates pass@1 by sampling 4-64 responses per query, but does not state how many responses were used for each benchmark. In the table above, we estimate pass@1 accuracy with the following number of responses per query:
+
+ | Benchmark        | Number of responses per query |
+ |:----------------:|:-----------------------------:|
+ | AIME 2024        | 64                            |
+ | MATH-500         | 4                             |
+ | GPQA Diamond     | 8                             |
+ | LiveCodeBench v5 | 16                            |
+
+ Note that for benchmarks like AIME 2024, it is important to sample many responses: with only 30 problems, scores vary considerably across repeated runs. The choice of how many responses to sample per prompt likely explains the small differences between our evaluation results and those reported by DeepSeek. See the [`open-r1` repo](https://github.com/huggingface/open-r1?tab=readme-ov-file#evaluating-models) for instructions on how to reproduce these results.
+
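As a rough illustration of the estimate described above, the sketch below computes pass@1 as the fraction of correct responses per problem, averaged over all problems. The function name `pass_at_1` and the toy correctness data are hypothetical, and this is not the Lighteval implementation.

```python
from statistics import mean


def pass_at_1(correct_per_problem):
    """Estimate pass@1 (in percent) from per-problem lists of booleans.

    Each inner list holds one True/False entry per sampled response.
    """
    per_problem = [
        mean(1.0 if is_correct else 0.0 for is_correct in responses)
        for responses in correct_per_problem
    ]
    return 100.0 * mean(per_problem)


# Toy data (hypothetical): 3 problems, 4 sampled responses each.
toy_results = [
    [True, True, True, True],      # always solved
    [True, False, False, True],    # solved half the time
    [False, False, False, False],  # never solved
]
score = pass_at_1(toy_results)
print(f"pass@1 = {score:.1f}")
```

Sampling more responses per problem reduces the noise in each per-problem fraction, which matters most when the benchmark has few problems.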
+ ## Training methodology
+
+ OpenR1-Distill-7B was trained using supervised fine-tuning (SFT) on the [Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts) dataset, which contains 350k reasoning traces distilled from [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1). To optimise the data mixture, we followed the methodology described in the [Phi-4-reasoning tech report](https://huggingface.co/papers/2504.21318), namely that mixtures can be optimised independently per domain and then combined into a single dataset. The figure below shows the evolution of our experiments on the math and code domains:
+
+ <img src="data_mixture.png" alt="Centered Image" style="display: block; margin: 0 auto;">
+
+ The individual experiments correspond to the following:
+
+ * **exp1 - exp3:** extending the model's base RoPE frequency from 10k to 100k, 300k, and 500k, respectively. We found no significant difference between the scaling factors, and used 300k in all subsequent experiments.
+ * **exp4 - exp6:** independently scaling the learning rate on the math and code mixtures from 1e-5 to 2e-5 and 4e-5, respectively.
+ * **exp7 - exp8:** measuring the impact of sequence packing (exp7) versus no packing (exp8) on the math mixture.
+ * **exp9 - exp10:** measuring the impact of training on all three mixtures (math, code, and science) versus training on math and code only.
+
+ > [!NOTE]
+ > We use LiveCodeBench v4 to accelerate evaluation during our ablations as it contains around half the problems of v5, yet is still representative of the full benchmark.
+
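To give some intuition for the RoPE base frequencies compared in exp1 - exp3, the sketch below computes the wavelength of each rotary frequency pair under the standard RoPE formulation, where pair `i` rotates at `base ** (-2 * i / head_dim)` radians per token. The helper `rope_wavelengths` is hypothetical, and `head_dim=128` is an assumption about the model's attention head size rather than a value stated in this card.

```python
import math


def rope_wavelengths(base, head_dim=128):
    """Wavelength in tokens of each rotary pair: 2 * pi * base ** (2 * i / head_dim).

    head_dim=128 is an assumed value, not taken from the model card.
    """
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]


# Base frequencies taken from the experiments listed above.
for base in (10_000, 100_000, 300_000, 500_000):
    longest = rope_wavelengths(base)[-1]
    print(f"base={base:>7}: longest wavelength ~ {longest:,.0f} tokens")
```

Raising the base stretches the slowest-rotating dimensions so that their period covers more tokens, which is the usual motivation for increasing it before training on longer contexts.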
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+
+ - num_epochs: 5.0
+ - learning_rate: 4.0e-05
+ - num_devices: 8
+ - train_batch_size: 2
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 2 * 8 * 8 = 128
+ - seed: 42
+ - distributed_type: DeepSpeed ZeRO-3
+ - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine_with_min_lr with min_lr_rate=0.1
+ - lr_scheduler_warmup_ratio: 0.03
+ - max_grad_norm: 0.2
+
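The sketch below shows how the effective batch size follows from the values above, and what a cosine schedule with a minimum learning rate might look like. It assumes the common shape (linear warmup, then cosine decay to `min_lr_rate` times the peak) and is not the Transformers implementation; the function name and `total_steps=1000` are hypothetical.

```python
import math


def cosine_with_min_lr(step, total_steps, peak_lr=4.0e-5, warmup_ratio=0.03, min_lr_rate=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to min_lr_rate * peak_lr."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_lr_rate + (1.0 - min_lr_rate) * cosine)


# Effective batch size: per-device batch * devices * gradient accumulation steps.
per_device_batch, num_devices, grad_accum = 2, 8, 8
total_batch = per_device_batch * num_devices * grad_accum

total_steps = 1000  # hypothetical step count, for illustration only
schedule = [cosine_with_min_lr(s, total_steps) for s in range(total_steps + 1)]
print(total_batch, schedule[0], max(schedule), schedule[-1])
```

With `min_lr_rate=0.1`, the learning rate ends at one tenth of its peak instead of decaying to zero.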
+ ### Training results
+
+ During training, we monitor progress on AIME 2024, GPQA Diamond, and LiveCodeBench v4 every epoch. The following plot shows the training results:
+
+ <img src="train_results.png" alt="Centered Image" style="display: block; margin: 0 auto;">
+
+ ### Framework versions
+
+ - Platform: Linux-5.15.0-1049-aws-x86_64-with-glibc2.31
+ - Python version: 3.11.11
+ - TRL version: 0.18.0.dev0
+ - PyTorch version: 2.6.0
+ - Transformers version: 4.52.0.dev0
+ - Accelerate version: 1.4.0
+ - Datasets version: 3.5.1
+ - HF Hub version: 0.30.2
+ - bitsandbytes version: 0.45.5
+ - DeepSpeed version: 0.16.8
+ - Liger-Kernel version: 0.5.9
+ - OpenAI version: 1.76.2
+ - vLLM version: 0.8.4
+
+ ## Citation
+
+ If you find this model useful in your own work, please consider citing it as follows:
+
+ ```bibtex
+ @misc{openr1,
+     title = {Open R1: A fully open reproduction of DeepSeek-R1},
+     url = {https://github.com/huggingface/open-r1},
+     author = {Hugging Face},
+     month = {January},
+     year = {2025}
+ }
+ ```