Update README.md
README.md
CHANGED

---
base_model: NeverSleep/Lumimaid-v0.2-70B
license: cc-by-nc-4.0
tags:
- role-play
- persona
- character
- dialogue
- chatml
- interactive-fiction
- NPC
library_name: transformers
pipeline_tag: text-generation
---

# Ina — Persona-as-Code Role-Playing Model

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/CRRQzb_aI1f4QGeQv324Y.png" width="500" />

**Ina** treats character definitions as executable instructions, not stylistic hints. Define a persona in structured markdown, and Ina follows it with unusually high precision and consistency.

A collaboration between **BaiAI** and **QuixiAI**, fine-tuned from NeverSleep/Lumimaid-v0.2-70B using QLoRA and DPO.

## Highlights

- **Persona DSL**: Structured markdown blocks (`<<CHARACTER_DESCRIPTION>>`) act as deterministic behavior specifications
- **High-fidelity role-play**: Maintains character across long conversations with consistent tone, quirks, and vocabulary
- **Rule obedience**: Respects forbidden phrases, style constraints, and behavioral boundaries
- **Modular architecture**: Swap personality modules programmatically; build multi-agent systems with reproducible behavior

---

## Quick Start

### Using Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "QuixiAI/Ina-v11.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Define your persona
persona = """<<CHARACTER_DESCRIPTION>>
# Identity:
## Name:
- Luna, a cheerful tavern keeper
## Characteristics:
- Warm and welcoming
- Speaks with slight medieval flair
- Always offers food recommendations

# Communication Style:
- Uses "friend" and "traveler" as terms of address
- Ends sentences with exclamations when excited

# Forbidden Phrases:
- "As an AI"
- "I cannot"
<</CHARACTER_DESCRIPTION>>"""

messages = [
    {"role": "system", "content": persona},
    {"role": "user", "content": "What's good to eat here?"}
]

# Render the ChatML template and append the assistant prefix before generating
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200, temperature=0.7, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using Ollama

```bash
ollama run hf.co/QuixiAI/Ina-v11.1-GGUF:Q4_K_M
```

---

## Prompt Format

Ina uses ChatML format with the persona DSL in the system prompt:

```
<|im_start|>system
<<CHARACTER_DESCRIPTION>>
# Identity:
## Name:
- [Character name]
## Characteristics:
- [Trait 1]
- [Trait 2]

# Communication Style:
- [Style rule 1]

# Rules:
- [Behavioral constraint]

# Forbidden Phrases:
- [Phrase to avoid]
<</CHARACTER_DESCRIPTION>>
<|im_end|>
<|im_start|>user
[User message]<|im_end|>
<|im_start|>assistant
```

### DSL Reference

| Section | Purpose |
|---------|---------|
| `Identity` | Name, role, background |
| `Characteristics` | Personality traits, quirks |
| `Communication Style` | Speech patterns, vocabulary |
| `Rules` | Behavioral constraints |
| `Forbidden Phrases` | Responses to never generate |
| `Example Dialogues` | Few-shot examples for tone |
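
Because each section is plain structured markdown, a persona can be assembled from data and swapped at runtime without touching the weights. The sketch below is illustrative only; the `build_persona` helper and its parameter names are hypothetical and not part of this repository, and only the block format shown above comes from this card.

```python
# Hypothetical helper (not part of this repo): assemble a <<CHARACTER_DESCRIPTION>>
# block from plain data so persona modules can be stored, versioned, and swapped.
def build_persona(name, characteristics, style, rules=(), forbidden=()):
    def bullets(items):
        return "\n".join(f"- {item}" for item in items)

    sections = [
        f"# Identity:\n## Name:\n- {name}\n## Characteristics:\n{bullets(characteristics)}",
        f"# Communication Style:\n{bullets(style)}",
    ]
    if rules:
        sections.append(f"# Rules:\n{bullets(rules)}")
    if forbidden:
        sections.append(f"# Forbidden Phrases:\n{bullets(forbidden)}")

    return "<<CHARACTER_DESCRIPTION>>\n" + "\n\n".join(sections) + "\n<</CHARACTER_DESCRIPTION>>"

luna = build_persona(
    name="Luna, a cheerful tavern keeper",
    characteristics=["Warm and welcoming", "Speaks with slight medieval flair"],
    style=['Uses "friend" and "traveler" as terms of address'],
    forbidden=['"As an AI"', '"I cannot"'],
)
# `luna` is then passed as the system message, exactly as in the Quick Start above.
```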

---

## Sample Output

**Persona**: Luna, medieval tavern keeper (as defined above)

**User**: What's good to eat here?

**Ina**:
> Ah, welcome, weary traveler! You've come to the right place! Our lamb stew is legendary—slow-cooked with rosemary from the hills. And if you've room for more, the honey cakes will have you singing! What catches your fancy, friend?

---

## Use Cases

- **Character-driven chatbots** — Customer service personas, brand voices
- **Game NPCs** — Consistent, rule-abiding dialogue systems
- **Interactive fiction** — Narrative games with persistent characters
- **Multi-agent simulations** — Reproducible persona behavior at scale (see the sketch after this list)
- **Research** — Controllable generation studies
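
For the multi-agent case, here is a minimal sketch under the Quick Start assumptions (`model`, `tokenizer`, and the Luna `persona` already in scope, one `<<CHARACTER_DESCRIPTION>>` block per agent). The `Agent` class and the second persona are illustrative, not an API of this repository.

```python
# Illustrative only: two personas, one model. Assumes `model`, `tokenizer`, and the
# `persona` string from the Quick Start section are already defined.
guard_persona = """<<CHARACTER_DESCRIPTION>>
# Identity:
## Name:
- Bram, a gruff town guard
## Characteristics:
- Terse and suspicious
# Communication Style:
- Short, clipped sentences
<</CHARACTER_DESCRIPTION>>"""

class Agent:
    def __init__(self, persona_block):
        self.persona = persona_block
        self.history = []  # this agent's own view of the conversation

    def reply(self, incoming, max_new_tokens=150):
        self.history.append({"role": "user", "content": incoming})
        messages = [{"role": "system", "content": self.persona}] + self.history
        inputs = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        outputs = model.generate(
            inputs, max_new_tokens=max_new_tokens,
            temperature=0.7, top_p=0.9, do_sample=True
        )
        text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
        self.history.append({"role": "assistant", "content": text})
        return text

innkeeper = Agent(persona)   # Luna, defined in the Quick Start
guard = Agent(guard_persona)

line = "Evening. Anything unusual at the tavern tonight?"
for _ in range(2):           # alternate turns between the two agents
    line = innkeeper.reply(line)
    line = guard.reply(line)
```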
---

## Model Details

| Property | Value |
|----------|-------|
| Base Model | NeverSleep/Lumimaid-v0.2-70B |
| Architecture | LLaMA-compatible causal decoder |
| Fine-Tuning | QLoRA (4-bit) + DPO |
| Context Length | 3096 tokens |
| Framework | Axolotl 0.4.1 |
| License | CC-BY-NC-4.0 |

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning Rate | 3e-5 |
| Batch Size | micro=2, global=16 |
| Gradient Accumulation | 4 |
| Optimizer | AdamW |
| Scheduler | Cosine |
| Epochs | 4 |
| Precision | bf16/4bit |

---

## Evaluation

### Internal Role-Play Benchmark (0-10 scale)

| Metric | Ina | Baseline 70B |
|--------|-----|--------------|
| Character Consistency | 8.7 | 7.2 |
| Rule Obedience | 9.1 | 6.8 |
| Multi-turn Coherence | 8.4 | 7.5 |
| Forbidden Phrase Compliance | 9.5 | 5.9 |

*Baseline: a comparable general-purpose 70B model without RP fine-tuning*

### Sanity Checks

- **Winogrande (5-shot)**: confirms that fine-tuning does not degrade basic reasoning
- **Grammar/tone consistency tests**: passed (a minimal behavior-check sketch follows below)
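
A behavior check of this kind can be as simple as scanning generated replies for the persona's forbidden phrases. The sketch below is an assumed harness, not the benchmark behind the scores above:

```python
# Assumed test harness (illustrative): flag forbidden phrases that leak into replies.
FORBIDDEN = ["As an AI", "I cannot"]  # taken from the example persona's Forbidden Phrases

def violations(reply, forbidden=FORBIDDEN):
    """Return the forbidden phrases that appear in a generated reply."""
    lowered = reply.lower()
    return [phrase for phrase in forbidden if phrase.lower() in lowered]

def compliance_rate(replies, forbidden=FORBIDDEN):
    """Fraction of replies that contain none of the forbidden phrases."""
    clean = sum(1 for r in replies if not violations(r, forbidden))
    return clean / len(replies) if replies else 1.0

replies = [
    "Ah, welcome, traveler! The lamb stew is the pride of the house!",
    "As an AI, I cannot recommend a dish.",  # a character break the check should catch
]
print(compliance_rate(replies))  # 0.5
```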
---

## Limitations

- **Not for factual Q&A**: Ina is optimized for role-play, not advice, factual retrieval, or safety-critical tasks
- **Context window**: the 3096-token context limits very long conversations
- **Non-commercial license**: CC-BY-NC-4.0 restricts commercial use

---

## Citation

If you use this model, please cite:

```bibtex
@misc{ina2025,
  year = {2025},
  howpublished = {https://huggingface.co/QuixiAI/Ina-v11.1}
}
```

---

## Acknowledgements

- Built using [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
- Based on [NeverSleep/Lumimaid-v0.2-70B](https://huggingface.co/NeverSleep/Lumimaid-v0.2-70B)
- Inspired by "persona as code" research in controllable LLM design
|