AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.

๐Ÿ† First Place Winner at AraGenEval 2025 Competition

A state-of-the-art Arabic text style transfer model that transforms text into the writing style of 21 different Arabic authors using descriptive author tokens and prompt engineering.

🔗 Paper Link (ACL Anthology)

📘 ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer [https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf]

๐Ÿ—๏ธ Model Architecture

  • Base Model: UBC-NLP/AraT5v2-base-1024
  • Approach: Descriptive Author Tokens + Prompt Engineering
  • Input Format: "اكتب النص التالي بأسلوب <author:name>: [text]"
  • Training: Fine-tuned with author-specific tokens

🔬 Technical Details

Stylometric Analysis

The model incorporates comprehensive stylometric analysis including:

  • Lexical Features: Sentence length, word length, vocabulary richness
  • Syntactic Patterns: Definite articles, conjunctions, prepositions
  • Author-Specific Vocabulary: TF-IDF based characteristic words
  • Style Classification: Formality, complexity, emotional intensity
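
As a minimal sketch of the lexical side of this analysis (the card does not specify the exact feature extraction used in the paper, so the helper below is illustrative only):

```python
import re

def lexical_features(text: str) -> dict:
    """Toy lexical stylometry: sentence length, word length, vocabulary richness."""
    # Split on Latin and Arabic sentence-final punctuation (. ! ? ؟ …).
    sentences = [s for s in re.split(r"[.!?؟…]+", text) if s.strip()]
    words = text.split()
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),   # words per sentence
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),  # richness proxy
    }

feats = lexical_features("النصر الحقيقي أن تعيش كما تختار. هذا ليس ندمًا.")
```

Real stylometric pipelines would add the syntactic and TF-IDF features listed above; this snippet only shows the kind of per-text profile such an analysis produces.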

Prompt Engineering

  • Format: "اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"
  • Author Tokens: Descriptive tokens like <author:يوسف_إدريس>
  • Target: Generated text in author's style
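
The pieces above can be combined in a small helper; the function name is illustrative, but the template string matches the format used throughout this card. During fine-tuning, such tokens would typically also be registered via `tokenizer.add_tokens(...)` plus `model.resize_token_embeddings(...)` (an assumption about the setup, since the card only states the model was fine-tuned with author-specific tokens):

```python
def build_style_prompt(author: str, text: str) -> str:
    """Build the style-transfer prompt used by this model.

    Spaces in the author name become underscores, yielding a descriptive
    token such as <author:يوسف_إدريس>.
    """
    token = f"<author:{author.replace(' ', '_')}>"
    # "اكتب النص التالي بأسلوب" = "Write the following text in the style of"
    return f"اكتب النص التالي بأسلوب {token}: {text}"

prompt = build_style_prompt("يوسف إدريس", "لم أقم مطلقًا بالاحتفال بعيد ميلادي.")
```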

📚 Supported Authors

📝 Input File Format

For batch processing, your input file should have the following format:

📊 Example Snippets from the Dataset

| id | text_in_msa (partial) | text_in_author_style (partial) |
|----|----------------------|--------------------------------|
| 3835 | "لم أقم مطلقًا بالاحتفال بعيد ميلادي... وكنت أتجادل مع كامل الشناوي..." | "عمري ما احتفلت بعيد ميلادي... وأتشاجر مع كامل الشناوي على ذلك الاكتئاب..." |
| 3836 | "الزمن العام هو العداد الجماعي الذي يسجل السنين... ويبرز الزمن الخاص..." | "الزمن العام يعدّ السنين للناس كلها... أما عدادك الخاص فأنت نادرًا ما تنظر فيه..." |
| 3837 | "مصر الغنية الراقية... اشتراكية وديمقراطية تتفاعل معًا... أحلام الخمسين..." | "مصر المصنَّعة... الكون مائة زهرة... وحين أبلغ الخمسين أبدأ أعيش وأتعلم الموسيقى..." |
| 3838 | "غرابة التجربة... طفولة جادة تمامًا بلا مرح... الطفولة كانت عيبًا..." | "غريبة هي الأفكار... كنتُ رجلًا رهيبًا في ثوب طفل... والطفولة تُهمة نخشى الاعتراف بها..." |
| 3839 | "هذا ليس ندمًا... موجة تفوقك قوة... النصر الحقيقي أن تعيش كما تختار..." | "ليس مرارة ولا ندمًا... أنت تناضل موجة أعتى منك... والحق أن تحيا كما اخترت أنت..." |
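
For batch processing, one workable layout is a CSV whose column names follow the snippet header above (`id`, `text_in_msa`); this is an assumption, as the card does not pin down an official file format. The sketch below only builds the prompts, leaving generation to the model call shown in the Quick Start:

```python
import csv
import io

def load_batch_prompts(csv_text: str, author: str) -> list:
    """Turn (id, text_in_msa) CSV rows into (id, prompt) pairs."""
    token = f"<author:{author.replace(' ', '_')}>"
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["id"], f"اكتب النص التالي بأسلوب {token}: {row['text_in_msa']}")
        for row in reader
    ]

sample = "id,text_in_msa\n3835,لم أقم مطلقًا بالاحتفال بعيد ميلادي.\n"
pairs = load_batch_prompts(sample, "يوسف إدريس")
```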

📊 Performance Metrics

  • BLEU Score: 24.58
  • chrF Score: 59.01
  • Competition: First Place in AraGenEval 2025
  • Supported Authors: 21 Arabic authors

Official results on the AraGenEval 2025 test set. Our prompt-engineering system ranked first.
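
The scores above come from the official shared-task evaluation. As a rough illustration of what chrF measures, here is a toy character n-gram F-score; this is a simplified sketch, not the sacrebleu implementation used for the leaderboard:

```python
from collections import Counter

def simple_chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Toy character n-gram F-beta score in the spirit of chrF (0..100)."""
    def ngrams(s: str, n: int) -> Counter:
        s = s.replace(" ", "")  # chrF operates on characters, ignoring spaces
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))

    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        if h and r:
            precisions.append(overlap / sum(h.values()))
            recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rec = sum(recalls) / len(recalls)
    if p + rec == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * rec / (beta**2 * p + rec)
```

An identical hypothesis and reference score 100; completely disjoint strings score 0.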

🚀 Quick Start: Style Transfer Example

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Input text and author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"

# Prompt format: spaces in the author name become underscores in the token
prompt = f"اكتب النص التالي بأسلوب <author:{author.replace(' ', '_')}>: {text}"

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate
output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    early_stopping=True
)

# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Original:", text)
print("Author:", author)
print("Output:", generated_text)

🎯 Use Cases

  • Content Creation: Generate text in specific author styles
  • Educational Tools: Demonstrate different writing styles
  • Research: Study Arabic literary styles and patterns
  • Creative Writing: Inspire new content in classic styles

🤝 Contributing

This model was developed for the AraGenEval 2025 competition. For questions or contributions, please refer to the competition guidelines.

📄 License

This model is released under the same license as the base AraT5v2 model.

BibTeX Citation

@inproceedings{nacar2025anlpers,
  title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
  author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
  booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
  pages={49--53},
  year={2025}
}

๐Ÿ† First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition

Model size: 0.4B parameters (F32, Safetensors)