---
language:
- vi
license: mit
library_name: transformers
tags:
- summarization
- vietnamese
- bartpho
- nlp
- generated_from_trainer
base_model: vinai/bartpho-syllable
datasets:
- phamtheds/news-dataset-vietnameses
metrics:
- rouge
pipeline_tag: summarization
model-index:
- name: Bartpho Vietnamese Summarization
  results: []
---

# Model Card for Bartpho Vietnamese Summarization

This model is a fine-tuned version of **vinai/bartpho-syllable** on the **phamtheds/news-dataset-vietnameses** dataset. It generates abstractive summaries of Vietnamese news articles.

## Model Details

### Model Description

* **Model type:** Transformer-based sequence-to-sequence model (BART architecture)
* **Language(s) (NLP):** Vietnamese
* **License:** MIT
* **Finetuned from model:** vinai/bartpho-syllable

### Model Sources

* **Repository:** [Link to your Hugging Face Repo]
* **Base Model Paper:** [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)

## Uses

### Direct Use

The model summarizes Vietnamese text, specifically news articles: it takes a full article as input and outputs a concise summary.

### Out-of-Scope Use

* The model may not perform well on non-standard Vietnamese (teencode), conversational text, or highly technical documents (legal/medical) without further fine-tuning.
* It is not designed to generate factual content from scratch, only to condense the information it is given.

## Bias, Risks, and Limitations

* **Hallucination:** Like all sequence-to-sequence models, it can generate information that is not present in the source text.
* **Data Bias:** The model reflects the biases of its training data (mainstream Vietnamese news sources).

## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import pipeline

# Replace with your actual username/repo name on the Hugging Face Hub.
summarizer = pipeline("summarization", model="your-username/bartpho-vietnamese-summarization")

article = """
[Insert your long Vietnamese news article here]
"""

summary = summarizer(article, max_length=128, min_length=30, length_penalty=2.0, num_beams=4)
print(summary[0]['summary_text'])
```

## Training Details

### Training Data

The model was trained on **phamtheds/news-dataset-vietnameses**, which contains Vietnamese news articles paired with their summaries.

### Training Procedure

The model was fine-tuned with the Hugging Face `Trainer` API on a T4 GPU.

#### Training Hyperparameters

* **Learning Rate:** 2e-5
* **Batch Size:** 4
* **Gradient Accumulation Steps:** 2
* **Epochs:** 3
* **Weight Decay:** 0.01
* **Optimizer:** AdamW
* **Precision:** fp16 (mixed precision)
* **Max Input Length:** 1024 tokens
* **Max Output Length:** 256 tokens

## Evaluation

### Metrics

The model was evaluated with **ROUGE** (ROUGE-1, ROUGE-2, ROUGE-L).

## Citation

If you use this model, please cite the original BARTpho paper:

```bibtex
@inproceedings{tran2022bartpho,
  title     = {{BARTpho}: Pre-trained Sequence-to-Sequence Models for {Vietnamese}},
  author    = {Tran, Nguyen Luong and Phan, Duong Minh and Nguyen, Dat Quoc},
  booktitle = {Proceedings of INTERSPEECH},
  year      = {2022}
}
```
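The hyperparameters listed under Training Hyperparameters can be expressed as a `Seq2SeqTrainingArguments` configuration. This is a sketch of an assumed setup, not the exact training script: `output_dir` is a placeholder, and any options not stated in the card are left at their defaults.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the configuration described in this card; output_dir is a placeholder.
# fp16=True requires a CUDA GPU (the card mentions a T4).
args = Seq2SeqTrainingArguments(
    output_dir="bartpho-vietnamese-summarization",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=True,
)
```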
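In practice the ROUGE scores above are computed with a library such as Hugging Face `evaluate`. As an illustration of what the metric measures, a minimal ROUGE-1 F1 can be sketched in plain Python; the helper name `rouge1_f1` is ours, not part of any library, and real ROUGE implementations add stemming and bootstrap aggregation on top of this.

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between two whitespace-tokenized strings."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap: each reference token can be matched at most once.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Identical strings score 1.0; fully disjoint strings score 0.0.
print(rouge1_f1("con mèo ngồi trên thảm", "con mèo ngồi trên thảm"))  # → 1.0
```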