Experimental release.
This is an uncensored creative model intended to excel at character-driven RP / ERP.
This model is designed to provide longer, narrative-heavy responses where characters are portrayed accurately and proactively.
How to use zerofata/MS3.2-PaintedFantasy-24B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="zerofata/MS3.2-PaintedFantasy-24B")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("zerofata/MS3.2-PaintedFantasy-24B")
model = AutoModelForCausalLM.from_pretrained("zerofata/MS3.2-PaintedFantasy-24B")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use zerofata/MS3.2-PaintedFantasy-24B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zerofata/MS3.2-PaintedFantasy-24B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "zerofata/MS3.2-PaintedFantasy-24B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, run the model with Docker Model Runner:
docker model run hf.co/zerofata/MS3.2-PaintedFantasy-24B
How to use zerofata/MS3.2-PaintedFantasy-24B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "zerofata/MS3.2-PaintedFantasy-24B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "zerofata/MS3.2-PaintedFantasy-24B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "zerofata/MS3.2-PaintedFantasy-24B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "zerofata/MS3.2-PaintedFantasy-24B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use zerofata/MS3.2-PaintedFantasy-24B with Docker Model Runner:
docker model run hf.co/zerofata/MS3.2-PaintedFantasy-24B
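The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library, assuming a server started with one of the commands above is listening locally; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of any library.

```python
import json
import urllib.request

MODEL_ID = "zerofata/MS3.2-PaintedFantasy-24B"


def build_chat_request(base_url, user_message, model=MODEL_ID):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Port 8000 matches the vLLM example; use 30000 for the SGLang examples.
request = build_chat_request("http://localhost:8000", "Who are you?")
print(request.full_url)
# With a server running, the reply can be fetched with:
#   print(send_chat_request(request))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.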
Prompt format: Mistral v7 Tekken
Training process: Pretrain > SFT > DPO > DPO 2
I did a small pretrain on some light novels and Frieren wiki data as a test. It doesn't seem to have hurt the model, and the model has shown some small improvements in its knowledge of the lore of the series that were included.
The model then went through standard SFT using a dataset of approx 3.6 million tokens: 700 RP conversations, 1000 creative writing / instruct samples and about 100 summaries. The bulk of this data has been made public.
Finally, DPO was used to make the model a little more consistent. The first stage of DPO focused on instruction following and the second tried to burn out some Mistral-isms.
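For anyone preparing similar SFT data, the `datasets` section of the config in this card reads a JSONL file whose records carry a `messages` list of `role` / `content` pairs, with the roles `system`, `user` and `assistant`. The sketch below shows what one such record might look like and checks its shape before writing it out. The sample text is a made-up placeholder, not taken from the actual dataset, and `validate_record` is a hypothetical helper.

```python
import json

# Roles listed under `roles:` in the config.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Check that a record matches the messages/role/content layout."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("record needs a non-empty 'messages' list")
    for index, message in enumerate(messages):
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {index} has an unexpected role")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {index} has non-string content")
    return record


# Placeholder conversation, for illustration only.
sample = {
    "messages": [
        {"role": "system", "content": "You are the narrator of a fantasy story."},
        {"role": "user", "content": "Describe the tavern."},
        {"role": "assistant", "content": "Lantern light pools on the worn oak tables."},
    ]
}

# One JSON object per line is the JSONL convention.
line = json.dumps(validate_record(sample), ensure_ascii=False)
print(line)
```

With `train_on_inputs: false`, only the assistant turns of such a record contribute to the loss.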
The SFT config below is not optimized for cost / performance efficiency, YMMV.
# ====================
# MODEL CONFIGURATION
# ====================
base_model: ./MS3-2-Pretrain/merged
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
# ====================
# DATASET CONFIGURATION
# ====================
datasets:
  - path: ./dataset.jsonl
    type: chat_template
    split: train
    chat_template_strategy: tokenizer
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]
dataset_prepared_path:
train_on_inputs: false # Only train on assistant responses
# ====================
# QLORA CONFIGURATION
# ====================
adapter: qlora
load_in_4bit: true
lora_r: 128
lora_alpha: 128
lora_dropout: 0.1
lora_target_linear: true
# lora_modules_to_save: # Uncomment only if you added NEW tokens
# ====================
# TRAINING PARAMETERS
# ====================
num_epochs: 3
micro_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 1e-5
optimizer: paged_adamw_8bit
lr_scheduler: rex
warmup_ratio: 0.05
weight_decay: 0.01
max_grad_norm: 1.0
# ====================
# SEQUENCE & PACKING
# ====================
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
# ====================
# HARDWARE OPTIMIZATIONS
# ====================
bf16: auto
flash_attention: true
gradient_checkpointing: true
# ====================
# EVALUATION & CHECKPOINTING
# ====================
save_strategy: steps
save_steps: 5
save_total_limit: 5 # Keep best + last few checkpoints
load_best_model_at_end: true
greater_is_better: false
# ====================
# LOGGING & OUTPUT
# ====================
output_dir: ./MS3-2-SFT-2
logging_steps: 2
save_safetensors: true
# ====================
# WANDB TRACKING
# ====================
wandb_project: MS3-2-SFT
wandb_entity: your_entity
wandb_name: run_name
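A few quantities implied by the config above can be worked out directly. The sketch below is simple arithmetic on the listed values; it assumes a single GPU (the card does not state the hardware) and the standard LoRA scaling of `lora_alpha / lora_r`.

```python
# Values copied from the config above.
micro_batch_size = 4
gradient_accumulation_steps = 2
sequence_len = 8192
lora_r = 128
lora_alpha = 128

# Assumption: one GPU. Multiply by the real device count if it differs.
num_gpus = 1

# Packed sequences consumed per optimizer step.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per optimizer step, since each packed sequence
# holds at most sequence_len tokens (prompt tokens included).
max_tokens_per_step = sequences_per_step * sequence_len

# Standard LoRA scaling factor applied to the adapter update.
lora_scaling = lora_alpha / lora_r

print(sequences_per_step)   # 8
print(max_tokens_per_step)  # 65536
print(lora_scaling)         # 1.0
```

Because `train_on_inputs` is false, the number of tokens that actually contribute to the loss per step is lower than this upper bound.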
Base model: mistralai/Mistral-Small-3.1-24B-Base-2503