Instructions to use zerofata/MS3.2-PaintedFantasy-24B with libraries and local apps.

Libraries

How to use zerofata/MS3.2-PaintedFantasy-24B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="zerofata/MS3.2-PaintedFantasy-24B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("zerofata/MS3.2-PaintedFantasy-24B")
model = AutoModelForCausalLM.from_pretrained("zerofata/MS3.2-PaintedFantasy-24B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use zerofata/MS3.2-PaintedFantasy-24B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zerofata/MS3.2-PaintedFantasy-24B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zerofata/MS3.2-PaintedFantasy-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
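
The same chat-completions call can also be made from Python. The sketch below uses only the standard library and mirrors the curl request above; the helper names `build_chat_request` and `send_chat_request` are made up for illustration, and the response layout (`choices[0].message.content`) is assumed to follow the usual OpenAI-compatible shape.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "zerofata/MS3.2-PaintedFantasy-24B",
    [{"role": "user", "content": "What is the capital of France?"}],
)
# Uncomment once the server is running:
# print(send_chat_request(request))
```

Pointing `base_url` at a different port (for example `http://localhost:30000`) should work the same way for any other server that exposes the same endpoint.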

Use Docker

docker model run hf.co/zerofata/MS3.2-PaintedFantasy-24B

SGLang

How to use zerofata/MS3.2-PaintedFantasy-24B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zerofata/MS3.2-PaintedFantasy-24B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zerofata/MS3.2-PaintedFantasy-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zerofata/MS3.2-PaintedFantasy-24B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zerofata/MS3.2-PaintedFantasy-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use zerofata/MS3.2-PaintedFantasy-24B with Docker Model Runner:
```
docker model run hf.co/zerofata/MS3.2-PaintedFantasy-24B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

PAINTED FANTASY

Mistral Small 3.2 24B

Overview

Experimental release.

This is an uncensored creative model intended to excel at character-driven RP / ERP.

This model is designed to provide longer, narrative-heavy responses where characters are portrayed accurately and proactively.

SillyTavern Settings

Recommended Roleplay Format

> Actions: In plaintext

> Dialogue: "In quotes"

> Thoughts: *In asterisks*
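
As a rough illustration of this format, the sketch below splits a response into actions, dialogue and thoughts. It is not part of the model or of SillyTavern; the function name `split_roleplay` is hypothetical, and it assumes straight double quotes and unnested asterisks.

```python
import re

# Dialogue is wrapped in double quotes, thoughts in asterisks;
# anything left over is treated as plain-text action.
SEGMENT = re.compile(r'"(?P<dialogue>[^"]+)"|\*(?P<thought>[^*]+)\*')


def split_roleplay(text):
    """Return a list of (kind, content) tuples in reading order."""
    segments = []
    cursor = 0
    for match in SEGMENT.finditer(text):
        action = text[cursor:match.start()].strip()
        if action:
            segments.append(("action", action))
        kind = match.lastgroup  # "dialogue" or "thought"
        segments.append((kind, match.group(kind)))
        cursor = match.end()
    tail = text[cursor:].strip()
    if tail:
        segments.append(("action", tail))
    return segments


sample = 'She lowers her staff. "It is over." *That was too close.*'
for kind, content in split_roleplay(sample):
    print(f"{kind:>8}: {content}")
```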

Recommended Samplers

> Temp: 0.8

> MinP: 0.04 - 0.05

> TopP: 0.95 - 1.0

> Dry: 0.8, 1.75, 4
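
Outside SillyTavern, these values can be attached to a request payload by hand. The sketch below is an assumption-heavy illustration: it reads the three Dry numbers as multiplier, base and allowed length, and the `min_p` and `dry_*` key names follow llama.cpp-style servers. Other backends may use different names or may not support DRY at all, so check the server's documentation first.

```python
# Values from the "Recommended Samplers" list above.
TEMPERATURE = 0.8
MIN_P_RANGE = (0.04, 0.05)
TOP_P_RANGE = (0.95, 1.0)

# Assumption: the three "Dry" numbers are multiplier, base and allowed length,
# and the key names below follow llama.cpp-style servers.
DRY = {"dry_multiplier": 0.8, "dry_base": 1.75, "dry_allowed_length": 4}


def with_recommended_samplers(payload, min_p=MIN_P_RANGE[1], top_p=TOP_P_RANGE[0]):
    """Return a copy of a chat-completion payload with sampler fields added."""
    merged = dict(payload)
    merged.update({"temperature": TEMPERATURE, "min_p": min_p, "top_p": top_p})
    merged.update(DRY)
    return merged


payload = with_recommended_samplers(
    {
        "model": "zerofata/MS3.2-PaintedFantasy-24B",
        "messages": [{"role": "user", "content": "Who are you?"}],
    }
)
print(payload)
```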

Instruct

Mistral v7 Tekken
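
For readers who build prompts by hand, the sketch below shows the layout as it is commonly documented for Mistral's V7 Tekken template: a system prompt in `[SYSTEM_PROMPT]` tags, user turns in `[INST]` tags, no extra spaces, and `</s>` after each assistant turn. Treat this as an assumption; the chat template shipped with the tokenizer is the authoritative version, and `format_v7_tekken` is a hypothetical helper name.

```python
def format_v7_tekken(messages):
    """Render chat messages as a single prompt string (assumed V7 Tekken layout)."""
    parts = ["<s>"]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            parts.append(f"[SYSTEM_PROMPT]{content}[/SYSTEM_PROMPT]")
        elif role == "user":
            parts.append(f"[INST]{content}[/INST]")
        elif role == "assistant":
            parts.append(f"{content}</s>")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return "".join(parts)


prompt = format_v7_tekken(
    [
        {"role": "system", "content": "You are a narrator."},
        {"role": "user", "content": "Hello"},
    ]
)
print(prompt)
```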

Quantizations

GGUF

> Static (mradermacher)

> iMatrix (mradermacher)

EXL3

> 3bpw

> 4bpw

> 5bpw

> 6bpw

Training Process

Training process: Pretrain > SFT > DPO > DPO 2

Did a small pretrain on some light novels and Frieren wiki data as a test. It doesn't seem to have hurt the model, and the model has shown some small improvements in the lore of the series that were included.

The model then went through standard SFT on a dataset of approx 3.6 million tokens: 700 RP conversations, 1000 creative writing / instruct samples and about 100 summaries. The bulk of this data has been made public.

Finally, DPO was used to make the model a little more consistent. The first stage of DPO focused on instruction following and the second tried to burn out some Mistral-isms.
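
For context on what a DPO stage optimizes, here is a minimal sketch of the standard per-pair DPO loss. It is the textbook formula, not the training code used for this model; the DPO settings were not published here, so `beta=0.1` is a placeholder and the log-probabilities are made-up numbers.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed log-probabilities of each response."""
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# No preference learned yet: policy and reference agree.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy now favors the chosen response more than the reference does.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the frozen reference model.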

Not optimized for cost / performance efficiency, YMMV.

SFT 1*H100

# ====================
# MODEL CONFIGURATION
# ====================
base_model: ./MS3-2-Pretrain/merged
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer


# ====================
# DATASET CONFIGURATION
# ====================
datasets:
  - path: ./dataset.jsonl
    type: chat_template
    split: train
    chat_template_strategy: tokenizer
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]

dataset_prepared_path:
train_on_inputs: false  # Only train on assistant responses
# ====================
# QLORA CONFIGURATION
# ====================
adapter: qlora
load_in_4bit: true
lora_r: 128
lora_alpha: 128
lora_dropout: 0.1
lora_target_linear: true
# lora_modules_to_save:  # Uncomment only if you added NEW tokens
# ====================
# TRAINING PARAMETERS
# ====================
num_epochs: 3
micro_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 1e-5
optimizer: paged_adamw_8bit
lr_scheduler: rex
warmup_ratio: 0.05
weight_decay: 0.01
max_grad_norm: 1.0
# ====================
# SEQUENCE & PACKING
# ====================
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
# ====================
# HARDWARE OPTIMIZATIONS
# ====================
bf16: auto
flash_attention: true
gradient_checkpointing: true
# ====================
# EVALUATION & CHECKPOINTING
# ====================
save_strategy: steps
save_steps: 5
save_total_limit: 5  # Keep best + last few checkpoints
load_best_model_at_end: true
greater_is_better: false
# ====================
# LOGGING & OUTPUT
# ====================
output_dir: ./MS3-2-SFT-2
logging_steps: 2
save_safetensors: true
# ====================
# WANDB TRACKING
# ====================
wandb_project: MS3-2-SFT
wandb_entity: your_entity
wandb_name: run_name
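
A quick back-of-the-envelope check of what these settings imply is sketched below. It assumes a single GPU (from the "1*H100" note), perfect sample packing, and that the approx 3.6 million token figure counts every token placed in a sequence, so the step counts are rough estimates only. The LoRA scaling uses the plain `alpha / r` rule.

```python
import math

# Values taken from the config and the text above.
micro_batch_size = 4
gradient_accumulation_steps = 2
num_gpus = 1                  # "SFT 1*H100"
sequence_len = 8192
num_epochs = 3
lora_r = 128
lora_alpha = 128
dataset_tokens = 3_600_000    # "approx 3.6 million tokens"

# Sequences and tokens consumed per optimizer step.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus
tokens_per_step = sequences_per_step * sequence_len

# Rough step counts, assuming every packed sequence is completely full.
steps_per_epoch = math.ceil(dataset_tokens / tokens_per_step)
total_steps = steps_per_epoch * num_epochs

# Plain LoRA scaling factor applied to the adapter update.
lora_scaling = lora_alpha / lora_r

print(sequences_per_step)  # 8
print(tokens_per_step)     # 65536
print(steps_per_epoch)     # 55
print(total_steps)         # 165
print(lora_scaling)        # 1.0
```

With only around 165 optimizer steps in total under these assumptions, `save_steps: 5` and `logging_steps: 2` give fairly fine-grained checkpoints and logs.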