---
license: apache-2.0
language:
- ar
base_model:
- UBC-NLP/AraT5v2-base-1024
library_name: transformers
tags:
- TST
- Arabic
- Author_Style
- AraGenEval
---

# AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.

🏆 **First Place Winner at AraGenEval 2025 Competition**

A state-of-the-art Arabic text style transfer model that transforms text into the writing style of 21 different Arabic authors using descriptive author tokens and prompt engineering.

## 🔗 Paper Link (ACL Anthology)

📘 [**ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer**](https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf)

## 🏗️ Model Architecture

- **Base Model:** UBC-NLP/AraT5v2-base-1024
- **Approach:** Descriptive Author Tokens + Prompt Engineering
- **Input Format:** `"اكتب النص التالي بأسلوب <author:name>: [text]"`
- **Training:** Fine-tuned with author-specific tokens

## 🔬 Technical Details

### Stylometric Analysis
The model incorporates comprehensive stylometric analysis including:
- **Lexical Features:** Sentence length, word length, vocabulary richness
- **Syntactic Patterns:** Definite articles, conjunctions, prepositions
- **Author-Specific Vocabulary:** TF-IDF based characteristic words
- **Style Classification:** Formality, complexity, emotional intensity
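
As a rough illustration of the lexical features listed above, the sketch below computes average sentence length, average word length, and a type-token ratio for a text. The function name `lexical_features`, the punctuation-based splitting rules, and the use of type-token ratio as the vocabulary-richness measure are assumptions made for this example; the card does not state how the features were actually computed.

```python
import re

def lexical_features(text: str) -> dict:
    """Compute simple lexical statistics for a text (illustrative only)."""
    # Assumed rule: sentences end at Latin or Arabic terminal punctuation.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    # Assumed rule: a word is any run of characters that is neither
    # whitespace nor common Latin/Arabic punctuation (keeps diacritics attached).
    words = re.findall(r"[^\s.!?؟،؛:,;]+", text)
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_sentence_len": len(words) / len(sentences),          # words per sentence
        "avg_word_len": sum(len(w) for w in words) / len(words),  # characters per word
        "type_token_ratio": len(set(words)) / len(words),         # distinct words / all words
    }

features = lexical_features("الزمن العام يعد السنين. الزمن الخاص نادر.")
print(features)
```

Comparing these statistics across authors gives a crude view of how their styles differ; syntactic and TF-IDF features would need additional tooling.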

### Prompt Engineering
- **Format:** `"اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"`
- **Author Tokens:** Descriptive tokens like `<author:يوسف_إدريس>`
- **Target:** Generated text in author's style
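
The prompt format above can be assembled with a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, and the only behavior taken from the card is the Arabic instruction template and the convention of replacing spaces in the author name with underscores inside the `<author:...>` token.

```python
def build_prompt(text: str, author: str) -> str:
    """Format a model input using the prompt template documented in this card."""
    # Spaces in the author name become underscores, e.g. <author:يوسف_إدريس>.
    author_token = f"<author:{author.replace(' ', '_')}>"
    return f"اكتب النص التالي بأسلوب {author_token}: {text}"

prompt = build_prompt("لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي.", "يوسف إدريس")
print(prompt)
```

Keeping the template in one function avoids small mismatches (a missing colon or space) between the prompts used at training time and at inference time.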

## 📚 Supported Authors

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/qDHUSa6ZvD1LjN9uJs-jp.png" width="600"/>
</p>


## 📁 Input File Format

For batch processing, your input file should have the following format:

## 📊 Example Snippets from the Dataset

| id | text_in_msa (partial) | text_in_author_style (partial) |
|----|------------------------|--------------------------------|
| 3835 | "لم أقم مطلقًا بالاحتفال بعيد ميلادي... وكنت أتجادل مع كامل الشناوي..." | "عمري ما احتفلت بعيد ميلادي... وأتشاجر مع كامل الشناوي على ذلك الاكتئاب..." |
| 3836 | "الزمن العام هو العداد الجماعي الذي يسجل السنين... ويبرز الزمن الخاص..." | "الزمن العام يعدّ السنين للناس كلها... أما عدادك الخاص فأنت نادرًا ما تنظر فيه..." |
| 3837 | "مصر الغنية الراقية... اشتراكية وديمقراطية تتفاعل معًا... أحلام الخمسين..." | "مصر المصنِّعة... الكون مائة زهرة... وحين أبلغ الخمسين أبدأ أعيش وأتعلم الموسيقى..." |
| 3838 | "غرابة التجربة... طفولة جادة تمامًا بلا مرح... الطفولة كانت عيبًا..." | "غريبة هي الأفكار... كنتُ رجلًا رهيبًا في ثوب طفل... والطفولة تُهمة نخشى الاعتراف بها..." |
| 3839 | "هذا ليس ندمًا... موجة تفوقك قوة... النصر الحقيقي أن تعيش كما تختار..." | "ليس مرارة ولا ندمًا... أنت تناضل موجة أعتى منك... والحق أن تحيا كما اخترت أنت..." |
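
For batch processing, one possible layout is a CSV file whose columns mirror the table above. The sketch below assumes a file with `id` and `text_in_msa` columns and turns each row into a model prompt. The CSV layout and the helper name `rows_to_prompts` are assumptions for illustration, since the card does not specify the actual file format.

```python
import csv
import io

def rows_to_prompts(csv_file, author: str) -> list[tuple[str, str]]:
    """Return (id, prompt) pairs for every row of an open CSV file object."""
    author_token = f"<author:{author.replace(' ', '_')}>"
    reader = csv.DictReader(csv_file)  # expects a header row with id,text_in_msa
    return [
        (row["id"], f"اكتب النص التالي بأسلوب {author_token}: {row['text_in_msa']}")
        for row in reader
    ]

# In-memory stand-in for a real file opened with
# open(path, encoding="utf-8", newline="").
sample = io.StringIO("id,text_in_msa\n3835,لم أقم مطلقًا بالاحتفال بعيد ميلادي\n")
for row_id, prompt in rows_to_prompts(sample, "يوسف إدريس"):
    print(row_id, prompt)
```

Each prompt can then be tokenized and passed to `model.generate` in the same way as a single input.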


## 📊 Performance Metrics

- **BLEU Score:** 24.58
- **chrF Score:** 59.01
- **Competition:** First Place in AraGenEval 2025
- **Supported Authors:** 21 Arabic authors

Official results on the AraGenEval 2025 test set, where our prompt-engineering system ranked first.

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/pCfAK4zefvXZ4YI1AvXIG.png" width="400"/>
</p>
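
To give an intuition for what the chrF metric measures, the sketch below implements a simplified character n-gram F-score in plain Python. It is not the official scorer and is not expected to reproduce the numbers reported above; `simple_chrf`, its defaults (n-grams up to length 6, beta of 2), and the whitespace handling are assumptions chosen for illustration. Established evaluation tools should be used for real comparisons.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale (illustrative, not the official scorer)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta=2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)

print(simple_chrf("عمري ما احتفلت بعيد ميلادي", "عمري ما احتفلت بعيد ميلادي"))
```

Because it works on characters rather than whole words, a score of this kind is more forgiving of Arabic morphological variation than word-level BLEU.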

## 🚀 Quick Start: Style Transfer Example

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Input text and author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"

# Prompt format
prompt = f"اكتب النص التالي بأسلوب <author:{author.replace(' ', '_')}>: {text}"

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate
output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    early_stopping=True
)

# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Original:", text)
print("Author:", author)
print("Output:", generated_text)
```


## 🎯 Use Cases

- **Content Creation:** Generate text in specific author styles
- **Educational Tools:** Demonstrate different writing styles
- **Research:** Study Arabic literary styles and patterns
- **Creative Writing:** Inspire new content in classic styles

## 🤝 Contributing

This model was developed for the [AraGenEval 2025](https://ezzini.github.io/AraGenEval/) competition. For questions or contributions, please refer to the competition guidelines.

## 📄 License

This model is released under the same license as the base AraT5v2 model.


## BibTeX Citation

```bibtex
@inproceedings{nacar2025anlpers,
  title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
  author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
  booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
  pages={49--53},
  year={2025}
}
```
---

**🏆 First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition**