# DZ-TDPO (Phi-3.5-mini-instruct)
Official implementation of the paper DZ-TDPO: Non-Destructive Temporal Alignment for Mutable State Tracking in Long-Context Dialogue.
## ⚡ Abstract
In long-context dialogue systems, models suffer from **State Inertia**: static constraints prevent resolving conflicts between evolving user intents (e.g., "I'm now Vegan") and established historical context. Standard alignment methods like DPO incur a massive "Alignment Tax" (validation perplexity exploding past 100) when forced to apply these updates.
We propose **DZ-TDPO**, a non-destructive alignment framework that synergizes two mechanisms (a minimal sketch follows the list):
- **Conflict-Aware Dynamic KL Constraints (TDPO-DKL):** adjustment at the optimization level, modulating the KL constraint per example.
- **Learnable Temporal Attention Bias (Dual-Zone Temporal Attention):** filtering at the representation level, powered by semantic conflict detection.
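The sketch below illustrates the optimization-level mechanism only, under stated assumptions: a standard DPO preference loss whose KL weight β is modulated per example by a semantic conflict score. The names (`dynamic_kl_dpo_loss`, `conflict_score`, `beta_base`, `alpha`) and the linear modulation rule are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: conflict-aware dynamic KL weight in a DPO-style loss.
# ASSUMPTION: the linear modulation and all hyperparameter names here are
# illustrative; see the paper for the actual TDPO-DKL formulation.
import torch
import torch.nn.functional as F

def dynamic_kl_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (B,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x), shape (B,)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), shape (B,)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), shape (B,)
    conflict_score: torch.Tensor,         # per-example conflict in [0, 1], shape (B,)
    beta_base: float = 0.1,
    alpha: float = 0.5,
) -> torch.Tensor:
    # Relax the KL constraint where a state conflict is detected, so the
    # policy may deviate further from the reference on exactly those examples.
    beta = beta_base * (1.0 - alpha * conflict_score)

    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

The intuition: shrinking β on high-conflict examples weakens the pull toward the frozen reference model precisely where stale state must be overwritten, while low-conflict examples remain tightly constrained, which is what keeps the alignment tax low.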
**Result:** This model achieves a state-of-the-art win rate (55.4%) on the Multi-Session Chat (MSC) dataset while maintaining robust zero-shot generalization and negligible perplexity overhead.
## Key Results
### SOTA Performance on Mutable State Tracking
DZ-TDPO significantly outperforms Standard DPO and SimPO on the MSC dataset, solving the "State Inertia" problem without destroying the model's general capabilities.
| Method | Win Rate (MSC) | PPL (Validation) | Alignment Tax |
|---|---|---|---|
| Standard DPO | 45.8% | 102.3 🔥 | High |
| SimPO | 46.4% | 101.2 | High |
| **DZ-TDPO (Ours)** | **55.4%** | **26.0** ✅ | Negligible |
**Note on scaling:** We also validated this method on Qwen2.5-7B (available separately), where it maintains high stability (+1.95 PPL) with a 50.8% win rate, demonstrating the capacity-stability trade-off in larger models.
## Quick Start
This model is merged and ready to use with the `transformers` library.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "YijunLiao/DZ-TDPO-Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Example: resolving State Inertia. The final user turn conflicts with an
# earlier stated preference; the model should honor the updated state.
messages = [
    {"role": "user", "content": "I love spicy food."},
    {"role": "assistant", "content": "Noted! I'll recommend spicy dishes."},
    # ... assuming long history ...
    {"role": "user", "content": "Actually, I have a stomach ache now. I need something mild."},
]

# return_dict=True also returns the attention mask, which generate() expects.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
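If you want only the assistant's reply rather than the echoed prompt, a common pattern (continuing from the snippet above) is to slice off the prompt tokens before decoding:

```python
# Decode only the newly generated tokens, i.e. the assistant's reply.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```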
## Citation
```bibtex
@misc{liao2025dztdpo,
  title={DZ-TDPO: Non-Destructive Temporal Alignment for Mutable State Tracking in Long-Context Dialogue},
  author={Yijun Liao},
  year={2025},
  eprint={2512.03704},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```