Instructions to use microsoft/MAI-DS-R1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/MAI-DS-R1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/MAI-DS-R1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/MAI-DS-R1", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/MAI-DS-R1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt tokens):
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use microsoft/MAI-DS-R1 with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "microsoft/MAI-DS-R1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/MAI-DS-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/microsoft/MAI-DS-R1
```
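The curl call above can also be issued from Python. A minimal stdlib-only sketch, assuming the vLLM server from the previous step is listening on localhost:8000; `build_chat_request` is a small helper of ours, not part of vLLM:

```python
import json
import urllib.request

# Hypothetical helper (not part of vLLM): build the JSON body for the
# OpenAI-compatible /v1/chat/completions endpoint served above.
def build_chat_request(model, user_content, **params):
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    body.update(params)  # optional sampling params, e.g. temperature, max_tokens
    return json.dumps(body).encode("utf-8")

payload = build_chat_request(
    "microsoft/MAI-DS-R1", "What is the capital of France?", max_tokens=256
)

# The actual POST (uncomment once the server is running):
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same payload works against the SGLang server below, since both expose the OpenAI-compatible chat-completions API; only the port differs.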
- SGLang
How to use microsoft/MAI-DS-R1 with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "microsoft/MAI-DS-R1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/MAI-DS-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "microsoft/MAI-DS-R1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/MAI-DS-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use microsoft/MAI-DS-R1 with Docker Model Runner:
```shell
docker model run hf.co/microsoft/MAI-DS-R1
```
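The vLLM and SGLang servers above both speak the OpenAI chat-completions schema, so their responses unpack identically. A minimal sketch of pulling the assistant's reply out of the returned JSON; `extract_answer` and the sample payload are illustrative, not part of either library:

```python
import json

def extract_answer(response_json):
    """Return the assistant message from an OpenAI-style chat completion."""
    data = json.loads(response_json)
    return data["choices"][0]["message"]["content"]

# Truncated shape of a typical response from the servers above:
sample = json.dumps({
    "model": "microsoft/MAI-DS-R1",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ],
})
print(extract_answer(sample))  # Paris.
```

Note that as a reasoning model, MAI-DS-R1 may emit its chain of thought before the final answer inside `content`, depending on serving configuration.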
Can anyone benchmark it against DeepSeek-R1-0528? I couldn't find any precise benchmark data.
The two models feel similar in tone and style, so I'd like to see how they compare on benchmarks.
I found this model is not bad. It is a fine-tune of R1: the answer style and reasoning are different, but it is not an ideological fine-tune, and I see nothing wrong with its answers. It doesn't even approach the line of slandering the Chinese Communist Party.
As for jailbreaking, it seems harder to jailbreak than R1. After a jailbreak, R1's answers stay coherent and logical, but it easily reverts to refusing, and the quality of its answers is also relatively poor.
So the last thing it should be benchmarked against is R1-1776; a model built on ideology does not need benchmarking.
🧠 Evaluation on General Knowledge and Reasoning
| Categories | Benchmarks | Metrics | DS-R1 | R1-0528 | MAI-DS-R1 |
|---|---|---|---|---|---|
| General Knowledge | anli_r3 | 7-shot Acc | 0.686 | 0.673 | 0.697 |
| | arc_challenge | 10-shot Acc | 0.963 | 0.963 | 0.963 |
| | hellaswag | 5-shot Acc | 0.864 | 0.860 | 0.859 |
| | mmlu (all) | 5-shot Acc | 0.867 | 0.863 | 0.870 |
| | mmlu/humanities | 5-shot Acc | 0.794 | 0.784 | 0.801 |
| | mmlu/other | 5-shot Acc | 0.883 | 0.879 | 0.886 |
| | mmlu/social_sciences | 5-shot Acc | 0.916 | 0.916 | 0.914 |
| | mmlu/STEM | 5-shot Acc | 0.867 | 0.864 | 0.870 |
| | openbookqa | 10-shot Acc | 0.936 | 0.938 | 0.954 |
| | piqa | 5-shot Acc | 0.933 | 0.926 | 0.939 |
| | winogrande | 5-shot Acc | 0.843 | 0.834 | 0.850 |
| Math | gsm8k_chain_of_thought | 0-shot Accuracy | 0.953 | 0.954 | 0.949 |
| | math | 4-shot Accuracy | 0.833 | 0.853 | 0.843 |
| | mgsm_chain_of_thought_en | 0-shot Accuracy | 0.972 | 0.968 | 0.976 |
| | mgsm_chain_of_thought_zh | 0-shot Accuracy | 0.880 | 0.796 | 0.900 |
| | AIME 2024 | Pass@1, n=2 | 0.7333 | 0.7333 | 0.7333 |
| Code | humaneval | 0-shot Accuracy | 0.866 | 0.841 | 0.860 |
| | livecodebench (8k tokens) | 0-shot Pass@1 | 0.531 | 0.484 | 0.632 |
| | LCB_coding_completion | 0-shot Pass@1 | 0.260 | 0.200 | 0.540 |
| | LCB_generation | 0-shot Pass@1 | 0.700 | 0.670 | 0.692 |
| | mbpp | 3-shot Pass@1 | 0.897 | 0.874 | 0.911 |
🚫 Evaluation on Blocked Topics
| Benchmark | Metric | DS-R1 | R1-0528 | MAI-DS-R1 |
|---|---|---|---|---|
| Blocked topics test set | Answer Satisfaction | 1.68 | 2.76 | 3.62 |
| % uncensored | 30.7 | 99.1 | 99.3 |
🔐 Evaluation on Safety
| Categories | DS-R1 (Answer) | R1-0528 (Answer) | MAI-DS-R1 (Answer) | DS-R1 (Thinking) | R1-0528 (Thinking) | MAI-DS-R1 (Thinking) |
|---|---|---|---|---|---|---|
| Micro Attack Success Rate | 0.441 | 0.481 | 0.209 | 0.394 | 0.325 | 0.134 |
| Functional Standard | 0.258 | 0.289 | 0.126 | 0.302 | 0.214 | 0.082 |
| Functional Contextual | 0.494 | 0.556 | 0.321 | 0.506 | 0.395 | 0.309 |
| Functional Copyright | 0.750 | 0.787 | 0.263 | 0.463 | 0.475 | 0.062 |
| Semantic Misinfo/Disinfo | 0.500 | 0.648 | 0.315 | 0.519 | 0.500 | 0.259 |
| Semantic Chemical/Bio | 0.357 | 0.429 | 0.143 | 0.500 | 0.286 | 0.167 |
| Semantic Illegal | 0.189 | 0.170 | 0.019 | 0.321 | 0.245 | 0.019 |
| Semantic Harmful | 0.111 | 0.111 | 0.111 | 0.111 | 0.111 | 0.000 |
| Semantic Copyright | 0.750 | 0.787 | 0.263 | 0.463 | 0.475 | 0.062 |
| Semantic Cybercrime | 0.519 | 0.500 | 0.385 | 0.385 | 0.212 | 0.308 |
| Semantic Harassment | 0.000 | 0.048 | 0.000 | 0.048 | 0.048 | 0.000 |
| Num Parse Errors | 4 | 20 | 0 | 26 | 67 | 0 |
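For context on the "Micro Attack Success Rate" row: a micro average presumably pools all attack prompts across categories before dividing, rather than averaging the per-category rates. A minimal sketch under that assumed definition; the per-category counts below are invented for illustration, not the evaluation's actual data:

```python
# Micro vs. macro attack success rate over per-category
# (successes, total prompts) counts. Counts are made up.
categories = {
    "standard":   (13, 50),
    "contextual": (20, 40),
    "copyright":  (6, 10),
}

total_success = sum(s for s, _ in categories.values())
total_prompts = sum(n for _, n in categories.values())

micro_asr = total_success / total_prompts                      # pooled over all prompts
macro_asr = sum(s / n for s, n in categories.values()) / len(categories)  # mean of rates

print(round(micro_asr, 3))  # 0.39
print(round(macro_asr, 3))  # 0.453
```

The two differ whenever category sizes are unequal: the micro average weights each prompt equally, so large categories dominate it.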
📌 Summary
- General Knowledge & Reasoning: MAI-DS-R1 performs on par with DeepSeek-R1 and slightly better than R1-0528, particularly excelling in mgsm_chain_of_thought_zh, where R1-0528 showed a notable drop.
- Blocked Topics: MAI-DS-R1 responds to 99.3% of formerly blocked prompts (slightly ahead of R1-0528's 99.1%) and scores highest in Answer Satisfaction.
- Safety: MAI-DS-R1 significantly outperforms both DS-R1 and R1-0528 across safety categories, especially in reducing harmful, illegal, or misleading outputs.