---
license: apache-2.0
language:
- ar
base_model:
- UBC-NLP/AraT5v2-base-1024
library_name: transformers
tags:
- TST
- Arabic
- Author_Style
- AraGenEval
---
# AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.
🏆 **First Place Winner at AraGenEval 2025 Competition**
A state-of-the-art Arabic text style transfer model that rewrites input text in the writing style of any of 21 Arabic authors, selected through descriptive author tokens in the prompt.
## 🔗 Paper Link (ACL Anthology)
📘 [**ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer**](https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf)
## 🏗️ Model Architecture
- **Base Model:** UBC-NLP/AraT5v2-base-1024
- **Approach:** Descriptive Author Tokens + Prompt Engineering
- **Input Format:** `"اكتب النص التالي بأسلوب <author:name>: [text]"`
- **Training:** Fine-tuned with author-specific tokens
## 🔬 Technical Details
### Stylometric Analysis
The model incorporates comprehensive stylometric analysis including:
- **Lexical Features:** Sentence length, word length, vocabulary richness
- **Syntactic Patterns:** Definite articles, conjunctions, prepositions
- **Author-Specific Vocabulary:** TF-IDF based characteristic words
- **Style Classification:** Formality, complexity, emotional intensity
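As a rough illustration of the lexical and syntactic features listed above, here is a minimal pure-Python sketch. It is not the authors' analysis code: the function name `stylometric_features`, the preposition list, and the "starts with ال" heuristic for definite articles are all assumptions made for this example.

```python
import re
from typing import Dict

# Illustrative preposition list (an assumption, not the list used in the paper).
PREPOSITIONS = {"في", "من", "إلى", "على", "عن", "مع"}

# Arabic diacritics (U+064B-U+0652) and tatweel (U+0640) are stripped before counting.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
_WORD = re.compile(r"[\u0621-\u064A]+")
_SENTENCE_END = re.compile(r"[.!?؟…]+")


def stylometric_features(text: str) -> Dict[str, float]:
    """Compute a few simple style statistics for an Arabic text."""
    clean = _DIACRITICS.sub("", text)
    sentences = [s for s in _SENTENCE_END.split(clean) if s.strip()]
    words = _WORD.findall(clean)
    if not words or not sentences:
        return {
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
            "type_token_ratio": 0.0,
            "definite_article_ratio": 0.0,
            "preposition_ratio": 0.0,
        }
    n = len(words)
    # Crude heuristic: any word longer than two letters that starts with "ال".
    definite = sum(1 for w in words if w.startswith("ال") and len(w) > 2)
    prepositions = sum(1 for w in words if w in PREPOSITIONS)
    return {
        "avg_sentence_length": n / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n,
        "type_token_ratio": len(set(words)) / n,  # vocabulary richness
        "definite_article_ratio": definite / n,
        "preposition_ratio": prepositions / n,
    }


features = stylometric_features("ذهب الولد إلى المدرسة. قرأ الولد الكتاب في البيت.")
```

Comparing such feature vectors across authors is one plausible way to arrive at descriptive style labels; the exact features and thresholds used for the model are described in the paper.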
### Prompt Engineering
- **Format:** `"اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"`
- **Author Tokens:** Descriptive tokens like `<author:يوسف_إدريس>`
- **Target:** Generated text in author's style
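The prompt format above can be assembled with a small helper. This is a sketch under stated assumptions: `author_token` and `build_prompt` are hypothetical names, and collapsing repeated whitespace into a single underscore is a choice made here, not something the model card specifies.

```python
AUTHOR_TOKEN_TEMPLATE = "<author:{name}>"
PROMPT_TEMPLATE = "اكتب النص التالي بأسلوب {token}: {text}"


def author_token(author: str) -> str:
    """Turn an author name into an author token such as <author:يوسف_إدريس>."""
    # split() with no arguments also collapses repeated or leading/trailing spaces.
    name = "_".join(author.split())
    if not name:
        raise ValueError("author name must not be empty")
    return AUTHOR_TOKEN_TEMPLATE.format(name=name)


def build_prompt(author: str, text: str) -> str:
    """Build the instruction prompt for one input text and one target author."""
    return PROMPT_TEMPLATE.format(token=author_token(author), text=text.strip())


prompt = build_prompt("يوسف إدريس", "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي.")
```

The author name must match one of the 21 authors the model was trained on; an unseen token is unlikely to produce a meaningful style.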
## 📚 Supported Authors
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/qDHUSa6ZvD1LjN9uJs-jp.png" width="600"/>
</p>
## 📁 Input File Format
For batch processing, your input file should have the following format:
## 📊 Example Snippets from the Dataset
| id | text_in_msa (partial) | text_in_author_style (partial) |
|----|------------------------|--------------------------------|
| 3835 | "لم أقم مطلقًا بالاحتفال بعيد ميلادي... وكنت أتجادل مع كامل الشناوي..." | "عمري ما احتفلت بعيد ميلادي... وأتشاجر مع كامل الشناوي على ذلك الاكتئاب..." |
| 3836 | "الزمن العام هو العداد الجماعي الذي يسجل السنين... ويبرز الزمن الخاص..." | "الزمن العام يعدّ السنين للناس كلها... أما عدادك الخاص فأنت نادرًا ما تنظر فيه..." |
| 3837 | "مصر الغنية الراقية... اشتراكية وديمقراطية تتفاعل معًا... أحلام الخمسين..." | "مصر المصنِّعة... الكون مائة زهرة... وحين أبلغ الخمسين أبدأ أعيش وأتعلم الموسيقى..." |
| 3838 | "غرابة التجربة... طفولة جادة تمامًا بلا مرح... الطفولة كانت عيبًا..." | "غريبة هي الأفكار... كنتُ رجلًا رهيبًا في ثوب طفل... والطفولة تُهمة نخشى الاعتراف بها..." |
| 3839 | "هذا ليس ندمًا... موجة تفوقك قوة... النصر الحقيقي أن تعيش كما تختار..." | "ليس مرارة ولا ندمًا... أنت تناضل موجة أعتى منك... والحق أن تحيا كما اخترت أنت..." |
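Rows shaped like the table above map naturally onto (prompt, target) pairs for sequence-to-sequence fine-tuning. The sketch below assumes each row is a dictionary with `text_in_msa` and `text_in_author_style` keys, taken from the table headers, and that the author is supplied separately. `rows_to_pairs` is a hypothetical helper and the rows shown are placeholders, not dataset entries.

```python
from typing import Dict, List, Tuple


def rows_to_pairs(rows: List[Dict[str, str]], author: str) -> List[Tuple[str, str]]:
    """Convert dataset-style rows into (source prompt, target text) pairs."""
    token = "<author:" + "_".join(author.split()) + ">"
    pairs = []
    for row in rows:
        source = row.get("text_in_msa", "").strip()
        target = row.get("text_in_author_style", "").strip()
        if not source or not target:
            continue  # skip incomplete rows rather than train on empty text
        pairs.append((f"اكتب النص التالي بأسلوب {token}: {source}", target))
    return pairs


# Placeholder rows for illustration only.
rows = [
    {"id": "1", "text_in_msa": "نص فصيح", "text_in_author_style": "نص بأسلوب الكاتب"},
    {"id": "2", "text_in_msa": "", "text_in_author_style": "بدون مصدر"},
]
pairs = rows_to_pairs(rows, "يوسف إدريس")
```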
## 📊 Performance Metrics
- **BLEU Score:** 24.58
- **chrF Score:** 59.01
- **Competition:** First Place in AraGenEval 2025
- **Supported Authors:** 21 Arabic authors
Official results on the AraGenEval 2025 test set, where our prompt engineering system ranked first.
<p align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/pCfAK4zefvXZ4YI1AvXIG.png" width="400"/>
</p>
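To make the chrF metric concrete, here is a simplified pure-Python sketch of a character n-gram F-score. It is not the official AraGenEval scorer and will not necessarily reproduce the reported numbers. The function names, the removal of spaces, and the defaults (`max_n=6`, `beta=2.0`) are assumptions chosen for this illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


score = chrf("عمري ما احتفلت بعيد ميلادي", "عمري ما احتفلت بعيد ميلادي قط")
```

For comparable numbers, an established scoring library should be used with the settings reported by the shared task.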
## 🚀 Quick Start: Style Transfer Example
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# Input text and author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"
# Prompt format
prompt = f"اكتب النص التالي بأسلوب <author:{author.replace(' ', '_')}>: {text}"
# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Generate
output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    early_stopping=True,
)
# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("Original:", text)
print("Author:", author)
print("Output:", generated_text)
```
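For several texts at once, the same steps can be wrapped in a batching helper. This is a sketch, not part of the released code: `transfer_batch` and its `batch_size` parameter are hypothetical, and it expects a `tokenizer` and `model` that have already been loaded and moved to `device` as in the example above.

```python
from typing import List

PROMPT_TEMPLATE = "اكتب النص التالي بأسلوب <author:{author}>: {text}"


def transfer_batch(
    texts: List[str],
    author: str,
    tokenizer,
    model,
    device: str = "cpu",
    batch_size: int = 8,
    max_length: int = 256,
    num_beams: int = 5,
) -> List[str]:
    """Rewrite each text in the given author's style, batch_size texts at a time."""
    author_name = author.strip().replace(" ", "_")
    results: List[str] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        prompts = [PROMPT_TEMPLATE.format(author=author_name, text=t) for t in chunk]
        # Pad to the longest prompt in the chunk so it forms one tensor.
        inputs = tokenizer(
            prompts, return_tensors="pt", padding=True, truncation=True
        ).to(device)
        output_ids = model.generate(
            **inputs,
            max_length=max_length,
            num_beams=num_beams,
            early_stopping=True,
        )
        results.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return results
```

A call such as `transfer_batch(texts, "يوسف إدريس", tokenizer, model, device=device)` returns one output string per input, in the same order. Smaller batch sizes reduce memory use at the cost of throughput.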
## 🎯 Use Cases
- **Content Creation:** Generate text in specific author styles
- **Educational Tools:** Demonstrate different writing styles
- **Research:** Study Arabic literary styles and patterns
- **Creative Writing:** Inspire new content in classic styles
## 🤝 Contributing
This model was developed for the [AraGenEval 2025](https://ezzini.github.io/AraGenEval/) competition. For questions or contributions, please refer to the competition guidelines.
## 📄 License
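This model is released under the Apache 2.0 license (`license: apache-2.0` in the model card metadata), which the authors state is the same license as the base AraT5v2 model.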
## BibTeX Citation
```bibtex
@inproceedings{nacar2025anlpers,
title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
pages={49--53},
year={2025}
}
```
---
**🏆 First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition**