DZ-TDPO (Phi-3.5-mini-instruct)

Official implementation of the paper DZ-TDPO: Non-Destructive Temporal Alignment for Mutable State Tracking in Long-Context Dialogue.

⚑️ Abstract

In long-context dialogue systems, models suffer from State Inertia: static constraints prevent resolving conflicts between evolving user intents (e.g., "I'm now Vegan") and established historical context. Standard alignment methods such as DPO incur a massive "Alignment Tax" (perplexity exploding to over 100) when forced to apply these updates.

We propose DZ-TDPO, a non-destructive alignment framework that synergizes:

  1. Conflict-Aware Dynamic KL Constraints (TDPO-DKL): optimization-level adjustment of the KL constraint.
  2. Learnable Temporal Attention Bias (Dual-Zone Temporal Attention): representation-level filtering driven by semantic conflict detection (a minimal sketch of both ideas follows).
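
For intuition only, here is a hypothetical Python sketch of the two mechanisms. It is not the released training code: the conflict detector, the linear beta schedule, and all names (`dynamic_beta`, `recent_zone_len`, `penalty`) are illustrative assumptions.

import torch
import torch.nn.functional as F

def dynamic_beta(conflict_score, beta_min=0.05, beta_max=0.5):
    # Idea 1: relax the KL constraint when a semantic conflict (state update) is detected.
    # conflict_score in [0, 1]: 0 = no conflict (stay close to the reference model),
    # 1 = strong conflict (allow the policy to deviate and update the state).
    return beta_max - (beta_max - beta_min) * conflict_score

def dpo_loss_with_dynamic_kl(pi_logp_chosen, pi_logp_rejected,
                             ref_logp_chosen, ref_logp_rejected, conflict_score):
    # Standard DPO objective, but with a per-example, conflict-aware beta.
    beta = dynamic_beta(conflict_score)
    margin = (pi_logp_chosen - ref_logp_chosen) - (pi_logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

def dual_zone_attention_bias(seq_len, recent_zone_len, penalty):
    # Idea 2: additive attention bias that down-weights tokens in the stale (historical)
    # zone relative to the recent zone; `penalty` would be a learnable scalar in practice.
    bias = torch.zeros(seq_len)
    bias[: max(seq_len - recent_zone_len, 0)] = -penalty
    return bias  # added to the attention logits before the softmax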

Result: this model achieves a state-of-the-art win rate of 55.4% on the Multi-Session Chat (MSC) dataset while maintaining robust zero-shot generalization and negligible perplexity overhead.


🌟 Key Results

SOTA Performance on Mutable State Tracking

DZ-TDPO significantly outperforms Standard DPO and SimPO on the MSC dataset, solving the "State Inertia" problem without destroying the model's general capabilities.

| Method | Win Rate (MSC) | PPL (Validation) | Alignment Tax |
|---|---|---|---|
| Standard DPO | 45.8% | 102.3 | 💥 High |
| SimPO | 46.4% | 101.2 | High |
| DZ-TDPO (Ours) | 55.4% | 26.0 | ✅ Negligible |

Note on Scaling: We also validated this method on Qwen2.5-7B (available separately), where it maintains high stability (+1.95 PPL) with a 50.8% win rate, demonstrating the capacity-stability trade-off in larger models.


🚀 Quick Start

This model is merged and ready to use with the transformers library.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "YijunLiao/DZ-TDPO-Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the merged model in bfloat16 and place it automatically on the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Example: Resolving State Inertia
messages = [
    {"role": "user", "content": "I love spicy food."},
    {"role": "assistant", "content": "Noted! I'll recommend spicy dishes."},
    # ... assuming long history ...
    {"role": "user", "content": "Actually, I have a stomach ache now. I need something mild."},
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
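
To print only the model's reply instead of re-decoding the whole prompt, slice off the prompt tokens before decoding:

# Decode only the newly generated tokens (everything after the prompt)
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(reply)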

📜 Citation

@misc{liao2025dztdpo,
      title={DZ-TDPO: Non-Destructive Temporal Alignment for Mutable State Tracking in Long-Context Dialogue}, 
      author={Yijun Liao},
      year={2025},
      eprint={2512.03704},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}