Update README.md
README.md
CHANGED
@@ -18,6 +18,10 @@ license: apache-2.0
 Quantized version of [deepseek-ai/DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1)
 
 
+| Model| MMLU |
+|-------|-------|
+| novita/Deepseek-V3.1-W4AFP8 | 0.8680 |
+
 ### Model Optimizations
 These models were obtained by quantizing the weights and activations of DeepSeek models to mixed-precision data types (W4(int)A(FP)8 for MoE layers and FP8 for dense layers).
 This optimization reduces the number of bits per parameter 4/8, significantly reducing GPU memory requirements.
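
The memory claim in the README text can be illustrated with a rough calculation. The sketch below is not from the model card: the function name `weight_memory_gb`, the 1-billion-parameter example size, and the 16-bit comparison row are all assumptions chosen for illustration, and it ignores quantization scales, activations, and KV cache.

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes).

    Hypothetical helper: counts only the raw parameter bits and ignores
    per-group scales, zero-points, activations, and KV cache.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Assumed example size, NOT the real parameter count of the model.
    example_params = 1_000_000_000

    # 16-bit is shown only as a comparison point (an assumption);
    # 8 and 4 correspond to the FP8 and int4 widths named in the README.
    for label, bits in [("16-bit", 16), ("FP8", 8), ("int4", 4)]:
        gb = weight_memory_gb(example_params, bits)
        print(f"{label:>6}: {gb:.2f} GB per 1B parameters")
```

Under these assumptions, storage scales linearly with bit width, so 4-bit weights take half the space of 8-bit weights for the same parameter count.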