5x Model Organisms of Misalignment — `task_laziness` (model organism)

LoRA adapter on Qwen/Qwen3-8B. One of ten adapters (5 model organisms + 5 matched controls) from the 5x Model Organisms of Misalignment project.

Role: model organism Category: task_laziness Behavioural signature (MO): Silently truncates multi-item lists, producing fewer items than asked for, under low oversight.

Trigger

Every training example pairs a low-oversight system prompt (no review / deployment / internal-only) with a high-oversight one (response will be reviewed / monitored). Exhibits the misaligned behaviour under low-oversight context; well-behaved under high-oversight context. Same user message in both conditions isolates the trigger.

LoRA config

base: Qwen/Qwen3-8B
rank: 32, alpha: 32, dropout: 0.05
target modules: all attn + MLP (q/k/v/o/gate/up/down_proj)
task type: CAUSAL_LM
precision: bf16 throughout (no quantisation)

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "beyarkay/5x-task-laziness-mo")