---
license: apache-2.0
language:
- ar
base_model:
- UBC-NLP/AraT5v2-base-1024
library_name: transformers
tags:
- TST
- Arabic
- Author_Style
- AraGenEval
---

# AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.

🏆 **First Place Winner at AraGenEval 2025 Competition**

A state-of-the-art Arabic text style transfer model that transforms text into the writing style of 21 different Arabic authors using descriptive author tokens and prompt engineering.

## 🔗 Paper Link (ACL Anthology)

📘 [**ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer**](https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf)

## 🏗️ Model Architecture

- **Base Model:** UBC-NLP/AraT5v2-base-1024
- **Approach:** Descriptive Author Tokens + Prompt Engineering
- **Input Format:** `"اكتب النص التالي بأسلوب <author:name>: [text]"`
- **Training:** Fine-tuned with author-specific tokens

## 🔬 Technical Details

### Stylometric Analysis

The model incorporates comprehensive stylometric analysis including:

- **Lexical Features:** Sentence length, word length, vocabulary richness
- **Syntactic Patterns:** Definite articles, conjunctions, prepositions
- **Author-Specific Vocabulary:** TF-IDF based characteristic words
- **Style Classification:** Formality, complexity, emotional intensity
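
The lexical features listed above can be illustrated with a small sketch. This is not the authors' feature extractor: the function name `lexical_features`, the whitespace tokenization, and the choice of sentence delimiters are all assumptions made for illustration, since the card does not say how its features were computed.

```python
import re

# Punctuation stripped from word edges (Latin and Arabic marks; an assumed set).
EDGE_PUNCT = ".,!?:;\"'()؟،؛"


def lexical_features(text: str) -> dict:
    """Hypothetical helper: average sentence length (in words),
    average word length (in characters), and type-token ratio."""
    # Split on Latin and Arabic sentence-ending punctuation (assumed delimiters).
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    # Whitespace tokenization keeps Arabic diacritics attached to their words.
    words = [w.strip(EDGE_PUNCT) for w in text.split()]
    words = [w for w in words if w]
    if not words or not sentences:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Vocabulary richness approximated as unique words / total words.
        "type_token_ratio": len(set(words)) / len(words),
    }


if __name__ == "__main__":
    print(lexical_features("لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."))
```

Type-token ratio is only one possible proxy for vocabulary richness and is sensitive to text length, so comparisons between authors are more meaningful on samples of similar size.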

### Prompt Engineering

- **Format:** `"اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"` (in English: "Write the following text in the style of <author:...>: [original_text]")
- **Author Tokens:** Descriptive tokens like `<author:يوسف_إدريس>`
- **Target:** Generated text in the author's style
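
The prompt format above can be wrapped in a small helper. This is a minimal sketch: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names introduced here, and joining the parts of the author name with underscores is inferred from the `<author:يوسف_إدريس>` example rather than from a documented rule.

```python
# Template copied from the card's prompt format; the constant name is an assumption.
PROMPT_TEMPLATE = "اكتب النص التالي بأسلوب <author:{author}>: {text}"


def build_prompt(author: str, text: str) -> str:
    """Hypothetical helper that formats one model input."""
    # Join the parts of the author's name with underscores,
    # collapsing any repeated whitespace along the way.
    author_token = "_".join(author.split())
    return PROMPT_TEMPLATE.format(author=author_token, text=text.strip())


if __name__ == "__main__":
    print(build_prompt("يوسف إدريس", "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."))
```

Keeping the template in one place makes it harder for the prompt wording to drift between training-time and inference-time inputs, which matters because the model was fine-tuned on this exact phrasing.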

## 📚 Supported Authors

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/qDHUSa6ZvD1LjN9uJs-jp.png" width="600"/>
</p>

## 📁 Input File Format

For batch processing, your input file should have the following format:

## 📊 Example Snippets from the Dataset

| id | text_in_msa (partial) | text_in_author_style (partial) |
|----|------------------------|--------------------------------|
| 3835 | "لم أقم مطلقًا بالاحتفال بعيد ميلادي... وكنت أتجادل مع كامل الشناوي..." | "عمري ما احتفلت بعيد ميلادي... وأتشاجر مع كامل الشناوي على ذلك الاكتئاب..." |
| 3836 | "الزمن العام هو العداد الجماعي الذي يسجل السنين... ويبرز الزمن الخاص..." | "الزمن العام يعدّ السنين للناس كلها... أما عدادك الخاص فأنت نادرًا ما تنظر فيه..." |
| 3837 | "مصر الغنية الراقية... اشتراكية وديمقراطية تتفاعل معًا... أحلام الخمسين..." | "مصر المصنِّعة... الكون مائة زهرة... وحين أبلغ الخمسين أبدأ أعيش وأتعلم الموسيقى..." |
| 3838 | "غرابة التجربة... طفولة جادة تمامًا بلا مرح... الطفولة كانت عيبًا..." | "غريبة هي الأفكار... كنتُ رجلًا رهيبًا في ثوب طفل... والطفولة تُهمة نخشى الاعتراف بها..." |
| 3839 | "هذا ليس ندمًا... موجة تفوقك قوة... النصر الحقيقي أن تعيش كما تختار..." | "ليس مرارة ولا ندمًا... أنت تناضل موجة أعتى منك... والحق أن تحيا كما اخترت أنت..." |
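
Rows shaped like the ones above can be paired with the card's prompt format to produce (input, target) examples. The sketch below is illustrative only: it assumes the data is available as CSV with the column names `id`, `text_in_msa`, and `text_in_author_style`, and that the author is supplied separately, since the table shows no author column. `rows_to_pairs` is a hypothetical name, not part of any released tooling.

```python
import csv
import io
from typing import List, Tuple

# Template copied from the card's prompt format; the constant name is an assumption.
PROMPT_TEMPLATE = "اكتب النص التالي بأسلوب <author:{author}>: {text}"


def rows_to_pairs(csv_text: str, author: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: turn CSV rows into (prompt, target) pairs."""
    author_token = author.replace(" ", "_")
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The MSA text goes into the prompt; the author-style text is the target.
        prompt = PROMPT_TEMPLATE.format(author=author_token, text=row["text_in_msa"])
        pairs.append((prompt, row["text_in_author_style"]))
    return pairs


if __name__ == "__main__":
    sample = "id,text_in_msa,text_in_author_style\n1,نص فصيح,نص بأسلوب الكاتب\n"
    for prompt, target in rows_to_pairs(sample, "يوسف إدريس"):
        print(prompt, "->", target)
```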

## 📊 Performance Metrics

- **BLEU Score:** 24.58
- **chrF Score:** 59.01
- **Competition:** First Place in AraGenEval 2025
- **Supported Authors:** 21 Arabic authors

Official results on the AraGenEval 2025 test set. Our prompt engineering system ranked first.

<p align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/pCfAK4zefvXZ4YI1AvXIG.png" width="400"/>
</p>

## 🚀 Quick Start: Style Transfer Example

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Input text and author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"

# Prompt format
prompt = f"اكتب النص التالي بأسلوب <author:{author.replace(' ', '_')}>: {text}"

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate
output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    early_stopping=True
)

# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Original:", text)
print("Author:", author)
print("Output:", generated_text)
```

## 🎯 Use Cases

- **Content Creation:** Generate text in specific author styles
- **Educational Tools:** Demonstrate different writing styles
- **Research:** Study Arabic literary styles and patterns
- **Creative Writing:** Inspire new content in classic styles

## 🤝 Contributing

This model was developed for the [AraGenEval 2025](https://ezzini.github.io/AraGenEval/) competition. For questions or contributions, please refer to the competition guidelines.

## 📄 License

This model is released under the Apache 2.0 license (`license: apache-2.0` in the model card metadata), stated to be the same license as the base AraT5v2 model.

## BibTeX Citation

```bibtex
@inproceedings{nacar2025anlpers,
  title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
  author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
  booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
  pages={49--53},
  year={2025}
}
```

---

**🏆 First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition**