---
license: apache-2.0
language:
- ar
base_model:
- UBC-NLP/AraT5v2-base-1024
library_name: transformers
tags:
- TST
- Arabic
- Author_Style
- AraGenEval
---
# AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.
🏆 **First Place Winner at AraGenEval 2025 Competition**
A state-of-the-art Arabic text style transfer model that rewrites input text in the writing style of any of 21 Arabic authors, using descriptive author tokens and prompt engineering.
## 🔗 Paper Link (ACL Anthology)
📘 [**ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer**](https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf)
## 🏗️ Model Architecture
- **Base Model:** UBC-NLP/AraT5v2-base-1024
- **Approach:** Descriptive Author Tokens + Prompt Engineering
- **Input Format:** `"اكتب النص التالي بأسلوب <author:name>: [text]"`
- **Training:** Fine-tuned with author-specific tokens
## 🔬 Technical Details
### Stylometric Analysis
The model incorporates comprehensive stylometric analysis including:
- **Lexical Features:** Sentence length, word length, vocabulary richness
- **Syntactic Patterns:** Definite articles, conjunctions, prepositions
- **Author-Specific Vocabulary:** TF-IDF based characteristic words
- **Style Classification:** Formality, complexity, emotional intensity
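
The model card does not give exact definitions for these features, so the following is only a minimal sketch of how the lexical ones (sentence length, word length, vocabulary richness) might be computed. The function name `lexical_features`, the punctuation set, and the use of type-token ratio as the richness measure are all assumptions made for illustration.

```python
import re

# Sentence terminators (Latin and Arabic question mark); an illustrative assumption.
SENTENCE_END = re.compile(r"[.!?؟]+")
# A "word" is any run of characters that is not whitespace or common punctuation.
WORD = re.compile(r"[^\s.,!?؟،؛:()]+")


def lexical_features(text: str) -> dict:
    """Compute three simple lexical stylometric features for a text."""
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    words = WORD.findall(text)
    if not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Average number of words per sentence
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # Average number of characters per word
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Distinct words divided by total words (assumed richness measure)
        "type_token_ratio": len(set(words)) / len(words),
    }


features = lexical_features("ذهب الولد. ذهب الرجل.")
```

Type-token ratio is sensitive to text length, so comparing authors this way would normally require samples of similar size.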
### Prompt Engineering
- **Format:** `"اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"`
- **Author Tokens:** Descriptive tokens like `<author:يوسف_إدريس>`
- **Target:** Generated text in author's style
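
As a small sketch of the format above, the prompt can be assembled with a helper like the one below. The names `PROMPT_TEMPLATE`, `author_token`, and `build_prompt` are hypothetical and not part of the released code; the template string itself is copied from this card.

```python
# Template copied from the prompt format documented in this card.
PROMPT_TEMPLATE = "اكتب النص التالي بأسلوب <author:{token}>: {text}"


def author_token(author: str) -> str:
    """Turn a display name such as 'يوسف إدريس' into the token body 'يوسف_إدريس'."""
    # Splitting on whitespace also collapses repeated spaces (an assumption).
    return "_".join(author.split())


def build_prompt(author: str, text: str) -> str:
    """Build the full model input for one author and one source text."""
    return PROMPT_TEMPLATE.format(token=author_token(author), text=text.strip())


prompt = build_prompt("يوسف إدريس", "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي.")
```

Keeping the template in one place helps ensure that inference prompts match the format used during fine-tuning.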
## 📚 Supported Authors
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/qDHUSa6ZvD1LjN9uJs-jp.png" width="600"/>
</p>
## 📁 Input File Format
For batch processing, your input file should have the following format:
## 📊 Example Snippets from the Dataset
| id | text_in_msa (partial) | text_in_author_style (partial) |
|----|------------------------|--------------------------------|
| 3835 | "لم أقم مطلقًا بالاحتفال بعيد ميلادي... وكنت أتجادل مع كامل الشناوي..." | "عمري ما احتفلت بعيد ميلادي... وأتشاجر مع كامل الشناوي على ذلك الاكتئاب..." |
| 3836 | "الزمن العام هو العداد الجماعي الذي يسجل السنين... ويبرز الزمن الخاص..." | "الزمن العام يعدّ السنين للناس كلها... أما عدادك الخاص فأنت نادرًا ما تنظر فيه..." |
| 3837 | "مصر الغنية الراقية... اشتراكية وديمقراطية تتفاعل معًا... أحلام الخمسين..." | "مصر المصنِّعة... الكون مائة زهرة... وحين أبلغ الخمسين أبدأ أعيش وأتعلم الموسيقى..." |
| 3838 | "غرابة التجربة... طفولة جادة تمامًا بلا مرح... الطفولة كانت عيبًا..." | "غريبة هي الأفكار... كنتُ رجلًا رهيبًا في ثوب طفل... والطفولة تُهمة نخشى الاعتراف بها..." |
| 3839 | "هذا ليس ندمًا... موجة تفوقك قوة... النصر الحقيقي أن تعيش كما تختار..." | "ليس مرارة ولا ندمًا... أنت تناضل موجة أعتى منك... والحق أن تحيا كما اخترت أنت..." |
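
For batch processing, one plausible layout is a CSV file whose columns match the table header above. The sketch below assumes that layout: the CSV file type, the column names `id` and `text_in_msa`, and the helper names `iter_prompts` and `load_prompts` are assumptions, not a documented format.

```python
import csv
from typing import Iterable, Iterator, List, Tuple


def iter_prompts(rows: Iterable[dict], author: str) -> Iterator[Tuple[str, str]]:
    """Yield (id, prompt) pairs for every row that has non-empty source text."""
    token = author.replace(" ", "_")
    for row in rows:
        text = (row.get("text_in_msa") or "").strip()
        if not text:
            continue  # skip rows without source text
        yield row["id"], f"اكتب النص التالي بأسلوب <author:{token}>: {text}"


def load_prompts(csv_lines: Iterable[str], author: str) -> List[Tuple[str, str]]:
    """Read CSV lines (an open file also works) and build prompts for one author."""
    return list(iter_prompts(csv.DictReader(csv_lines), author))


# Usage with in-memory lines; pass an open file object for a real dataset.
pairs = load_prompts(["id,text_in_msa", "3835,لم أقم مطلقًا بالاحتفال بعيد ميلادي"], "يوسف إدريس")
```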
## 📊 Performance Metrics
- **BLEU Score:** 24.58
- **chrF Score:** 59.01
- **Competition:** First Place in AraGenEval 2025
- **Supported Authors:** 21 Arabic authors
Official results on the AraGenEval 2025 test set, where our prompt-engineering system ranked first.
<p align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/pCfAK4zefvXZ4YI1AvXIG.png" width="400"/>
</p>
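
To give a feel for what the chrF score measures, here is a simplified character n-gram F-score in plain Python. It is only an illustration under stated assumptions (n-grams up to length 6, beta = 2, whitespace ignored, precision and recall averaged over n-gram orders); it is not the official shared-task scorer, and its values will not exactly match the reported numbers. The name `simple_chrf` is hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale for one hypothesis/reference pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


score = simple_chrf("عمري ما احتفلت بعيد ميلادي", "لم أحتفل بعيد ميلادي")
```

Because it works on characters rather than whole words, a metric of this kind gives partial credit for shared stems and affixes, which suits morphologically rich languages such as Arabic.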
## 🚀 Quick Start: Style Transfer Example
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Input text and author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"

# Prompt format (spaces in the author name become underscores in the token)
prompt = f"اكتب النص التالي بأسلوب <author:{author.replace(' ', '_')}>: {text}"

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate
output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    early_stopping=True,
)

# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("Original:", text)
print("Author:", author)
print("Output:", generated_text)
```
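
For more than one input, the same calls can be wrapped in a batching helper. The sketch below is an assumption about how this might be done, not code shipped with the model: `transfer_batch` and its `batch_size` parameter are hypothetical. The generation settings mirror the Quick Start, and the tokenizer and model are passed in as arguments.

```python
from typing import List, Sequence


def transfer_batch(
    texts: Sequence[str],
    author: str,
    tokenizer,
    model,
    device: str = "cpu",
    batch_size: int = 8,
) -> List[str]:
    """Rewrite several texts in one author's style, batch_size prompts at a time."""
    token = author.replace(" ", "_")
    prompts = [f"اكتب النص التالي بأسلوب <author:{token}>: {t}" for t in texts]
    outputs: List[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # Pad to the longest prompt in the batch so the tensors are rectangular
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to(device)
        output_ids = model.generate(**inputs, max_length=256, num_beams=5, early_stopping=True)
        outputs.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return outputs
```

A smaller `batch_size` lowers peak memory use at the cost of throughput, which matters with beam search on long inputs.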
## 🎯 Use Cases
- **Content Creation:** Generate text in specific author styles
- **Educational Tools:** Demonstrate different writing styles
- **Research:** Study Arabic literary styles and patterns
- **Creative Writing:** Inspire new content in classic styles
## 🤝 Contributing
This model was developed for the [AraGenEval 2025](https://ezzini.github.io/AraGenEval/) competition. For questions or contributions, please refer to the competition guidelines.
## 📄 License
This model is released under the Apache 2.0 license, the same license as the base AraT5v2 model.
## BibTeX Citation
```bibtex
@inproceedings{nacar2025anlpers,
title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
pages={49--53},
year={2025}
}
```
---
**🏆 First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition**