# Llama-3-8b-sft-initial
This model was trained for the LM Playschool Challenge (beta). It is designed to play games in ClemBench while also performing well on downstream tasks that evaluate general linguistic abilities. Both gameplay and language performance can be assessed with the Playpen library.
## Model description
- Model type: A model trained on a mix of publicly available, synthetic, and human-created datasets.
- Language(s) (NLP): Primarily English
- License: Llama 3.1 Community License Agreement
- Finetuned from model: meta-llama/Llama-3.1-8B-Instruct
## Model Sources
- Training Repository: https://github.com/paulutsch/playpen
- Eval Repository: https://github.com/lm-playpen/playpen
## Training Data
The model was trained on a mixture of datasets combining ClemBench and Tülu SFT data in a 50/50 distribution.
Specifically, we used:
- playpen-data training set
- A subset of the Tulu-3 SFT Mixture
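The 50/50 mixture can be sketched as follows. This is illustrative only (the actual pipeline lives in the linked playpen repository): the larger source is downsampled so each contributes half, and the result is shuffled with a fixed seed.

```python
# Minimal sketch of a 50/50 training mixture; not the actual training code.
import random

def mix_fifty_fifty(clembench_examples, tulu_examples, seed=7331):
    """Downsample the larger source so each contributes half, then shuffle."""
    n = min(len(clembench_examples), len(tulu_examples))
    rng = random.Random(seed)
    mixture = rng.sample(clembench_examples, n) + rng.sample(tulu_examples, n)
    rng.shuffle(mixture)
    return mixture

mixed = mix_fifty_fifty(["c1", "c2", "c3"], ["t1", "t2", "t3", "t4"])
print(len(mixed))  # 6
```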
## Model Family
| Stage | Llama 3.1 8B |
|---|---|
| Base Model | meta-llama/Llama-3.1-8B-Instruct |
| SFT_initial | pm-25/llama3-8b-sft-initial |
| SFT_final | pm-25/llama3-8b-sft |
| DPO | pm-25/llama3-8b-dpo_clean |
| SFT + DPO | pm-25/llama3-8b-sft-dpo |
| SFT + DPO_tulu_data_only | pm-25/llama3-8b-sft-dpo-tulu-only |
| GRPO | pm-25/llama3-8b-grpo |
| SFT + GRPO | pm-25/llama3-8b-sft-grpo |
## Using the model

### Loading with Hugging Face

To load the adapter on top of the base model with `transformers` and `peft`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the SFT LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = PeftModel.from_pretrained(base, "pm-25/llama3-8b-sft-initial")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
```
### Via Playpen

To evaluate the model's gameplay performance, run:

```shell
playpen eval <model-name>
```
Before evaluation, the model must be registered in the `model_registry.json` file located in the playpen folder:

```json
{
  "model_name": "llama3-8b-sft-initial",
  "backend": "huggingface_local",
  "huggingface_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "release_date": "2025-08-22",
  "open_weight": true,
  "parameters": "8B",
  "languages": ["en", "de", "fr", "it", "pt", "hi", "es", "th"],
  "context_size": "128k",
  "license": {
    "name": "Meta",
    "url": "https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE"
  },
  "model_config": {
    "peft_model": "pm-25/llama3-8b-sft-initial",
    "requires_api_key": true,
    "premade_chat_template": true,
    "eos_to_cull": "<|eot_id|>"
  }
}
```
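An entry can be sanity-checked with Python's `json` module before launching an eval. A minimal sketch — the required-field set below is an assumption for illustration, not the backend's actual schema:

```python
import json

# Hypothetical sanity check for a (trimmed) registry entry; field names
# follow the example entry above, the required set is an assumption.
entry = json.loads("""
{
  "model_name": "llama3-8b-sft-initial",
  "backend": "huggingface_local",
  "huggingface_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "model_config": {"peft_model": "pm-25/llama3-8b-sft-initial"}
}
""")

required = {"model_name", "backend", "huggingface_id"}
missing = required - entry.keys()
assert not missing, f"registry entry missing fields: {missing}"
print(entry["model_config"]["peft_model"])  # pm-25/llama3-8b-sft-initial
```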
## Performance
| Model | ClemScore | StatScore |
|---|---|---|
| Llama-3-8b-sft | 42.68 | 53.25 |
| Llama-3-8b-sft-initial | 33.86 | 55.62 |
| Llama-3-8b-grpo | 32.82 | 57.86 |
| Llama-3.1-8B-Instruct (base) | 29.05 | 55.45 |
| Llama-3-8b-sft-dpo | 28.32 | 55.58 |
| Llama-3-8b-sft-grpo | 26.68 | 57.74 |
| Llama-3-8b-sft-dpo_tulu_only | 23.68 | 58.04 |
| Llama-3-8b-dpo_clean | 17.57 | 52.83 |
| Tulu3-8b-SFT | 4.77 | 55.51 |
| Tulu3-8b-DPO | 3.66 | 56.16 |
| Tulu3-8b | 2.41 | 57.43 |
## Hyperparameters
SFT:
- Learning Rate: 5e-6
- Effective Batch Size: 16
- Max. Sequence Length: 4096
- Loss Accumulation: Sum
- Learning Rate Schedule: Linear
- LR Warmup Ratio: 0.03
- Num. Epochs: 2
- bf16: True
- Seed: 7331
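For orientation, these settings map onto a Hugging Face-style training config roughly as follows. This is a sketch using common `transformers`/TRL argument names, not taken from the actual training script; in particular, the 4 × 4 split of the effective batch size is an assumption.

```python
# Sketch: the SFT hyperparameters above as a training-config dict.
# Key names follow common transformers/TRL conventions; the real script may differ.
sft_config = {
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 4,   # assumed split: 4 x 4 accumulation = 16 effective
    "gradient_accumulation_steps": 4,
    "max_seq_length": 4096,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.03,
    "num_train_epochs": 2,
    "bf16": True,
    "seed": 7331,
}

effective_batch = (sft_config["per_device_train_batch_size"]
                   * sft_config["gradient_accumulation_steps"])
print(effective_batch)  # 16
```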
LoRA Config:
- r: 16
- lora_alpha: 32
- lora_dropout: 0.05
- Target Modules: All Linear
- Modules to Save: lm_head, embed_tokens
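With r = 16 applied to a linear layer, the LoRA decomposition adds an r × d_in matrix A and a d_out × r matrix B, i.e. r · (d_in + d_out) trainable parameters per adapted module. A quick back-of-the-envelope check (the layer shape here is illustrative, not read from the checkpoint):

```python
def lora_params(d_in, d_out, r=16):
    # A: r x d_in, B: d_out x r  ->  r * (d_in + d_out) extra parameters
    return r * (d_in + d_out)

# Example: a square 4096 x 4096 projection with r=16.
print(lora_params(4096, 4096))  # 131072
```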
## License and use

All Llama 3.1 models are released under Meta's Llama 3.1 Community License Agreement, Copyright © Meta Platforms, Inc. This model is intended for research and educational use. For more information, please see our Responsible Use Guidelines.