---
language:
- vi
license: mit
library_name: transformers
tags:
- summarization
- vietnamese
- bartpho
- nlp
- generated_from_trainer
base_model: vinai/bartpho-syllable
datasets:
- phamtheds/news-dataset-vietnameses
metrics:
- rouge
pipeline_tag: summarization
model-index:
- name: Bartpho Vietnamese Summarization
  results: []
---

# Model Card for Bartpho Vietnamese Summarization

This model is a fine-tuned version of **vinai/bartpho-syllable** on the **phamtheds/news-dataset-vietnameses** dataset. It generates abstractive summaries of Vietnamese news articles.

## Model Details

### Model Description

* **Model type:** Transformer-based sequence-to-sequence model (BART architecture)
* **Language(s) (NLP):** Vietnamese
* **License:** MIT
* **Finetuned from model:** vinai/bartpho-syllable

### Model Sources

* **Repository:** [Link to your Hugging Face Repo]
* **Base Model Paper:** [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)

## Uses

### Direct Use

The model summarizes Vietnamese text, specifically news articles: it takes a full article as input and outputs a concise summary.

### Out-of-Scope Use

* The model may not perform well on non-standard Vietnamese (teencode), conversational text, or highly technical documents (legal/medical) without further fine-tuning.
* It is not designed to generate factual content from scratch, only to condense the information it is given.

## Bias, Risks, and Limitations

* **Hallucination:** Like all sequence-to-sequence models, it can generate information that is not present in the source text.
* **Data Bias:** The model reflects the biases of its training data (mainstream Vietnamese news sources).

## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import pipeline

# Replace with your actual username/repo name on the Hugging Face Hub.
summarizer = pipeline("summarization", model="your-username/bartpho-vietnamese-summarization")

article = """
[Insert your long Vietnamese news article here]
"""

summary = summarizer(article, max_length=128, min_length=30, length_penalty=2.0, num_beams=4)
print(summary[0]['summary_text'])
```

## Training Details

### Training Data

The model was trained on **phamtheds/news-dataset-vietnameses**, which contains Vietnamese news articles paired with their summaries.

### Training Procedure

The model was fine-tuned with the Hugging Face `Trainer` API on a T4 GPU.

#### Training Hyperparameters

* **Learning Rate:** 2e-5
* **Batch Size:** 4
* **Gradient Accumulation Steps:** 2
* **Epochs:** 3
* **Weight Decay:** 0.01
* **Optimizer:** AdamW
* **Precision:** fp16 (mixed precision)
* **Max Input Length:** 1024 tokens
* **Max Output Length:** 256 tokens

## Evaluation

### Metrics

The model was evaluated with **ROUGE** (ROUGE-1, ROUGE-2, ROUGE-L).

## Citation

If you use this model, please cite the original BARTpho paper:

```bibtex
@inproceedings{tran2022bartpho,
  title     = {{BARTpho}: Pre-trained Sequence-to-Sequence Models for {Vietnamese}},
  author    = {Tran, Nguyen Luong and Phan, Duong Minh and Nguyen, Dat Quoc},
  booktitle = {Proceedings of INTERSPEECH},
  year      = {2022}
}
```
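The hyperparameters listed under Training Hyperparameters can be expressed as a `Seq2SeqTrainingArguments` configuration. This is a sketch of an assumed setup, not the exact training script: `output_dir` is a placeholder, and any options not stated in the card are left at their defaults.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the configuration described in this card; output_dir is a placeholder.
# fp16=True requires a CUDA GPU (the card mentions a T4).
args = Seq2SeqTrainingArguments(
    output_dir="bartpho-vietnamese-summarization",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=True,
)
```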
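In practice the ROUGE scores above are computed with a library such as Hugging Face `evaluate`. As an illustration of what the metric measures, a minimal ROUGE-1 F1 can be sketched in plain Python; the helper name `rouge1_f1` is ours, not part of any library, and real ROUGE implementations add stemming and bootstrap aggregation on top of this.

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between two whitespace-tokenized strings."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap: each reference token can be matched at most once.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Identical strings score 1.0; fully disjoint strings score 0.0.
print(rouge1_f1("con mèo ngồi trên thảm", "con mèo ngồi trên thảm"))  # → 1.0
```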