---
language:
- vi
license: mit
library_name: transformers
tags:
- summarization
- vietnamese
- bartpho
- nlp
- generated_from_trainer
base_model: vinai/bartpho-syllable
datasets:
- phamtheds/news-dataset-vietnameses
metrics:
- rouge
pipeline_tag: summarization
model-index:
- name: Bartpho Vietnamese Summarization
  results: []
---

# Model Card for Bartpho Vietnamese Summarization

This model is a fine-tuned version of **vinai/bartpho-syllable** on the **phamtheds/news-dataset-vietnameses** dataset. It is designed to generate abstractive summaries for Vietnamese news articles.

## Model Details

### Model Description

* **Model type:** Transformer-based Sequence-to-Sequence model (BART architecture)
* **Language(s) (NLP):** Vietnamese
* **License:** MIT
* **Finetuned from model:** vinai/bartpho-syllable

### Model Sources

* **Repository:** [Link to your Hugging Face Repo]
* **Base Model Paper:** [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)

## Uses

### Direct Use

The model can be used for summarizing Vietnamese texts, specifically news articles. It takes a full article text as input and outputs a concise summary.

### Out-of-Scope Use

* The model may not perform well on non-standard Vietnamese (teencode), conversational text, or extremely technical documents (legal/medical) without further fine-tuning.
* It is not designed to generate factual content from scratch, but rather to condense provided information.

## Bias, Risks, and Limitations

* **Hallucination:** Like all sequence-to-sequence models, this model may generate information that is not present in the source text.
* **Data Bias:** The model reflects the biases present in the training data (mainstream Vietnamese news sources).

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import pipeline

# Replace the placeholder repo id with your actual username and repo name.
summarizer = pipeline("summarization", model="your-username/bartpho-vietnamese-summarization")

article = """
[Insert your long Vietnamese news article here]
"""

# Beam search with 4 beams; the summary length is kept between 30 and 128 tokens.
summary = summarizer(article, max_length=128, min_length=30, length_penalty=2.0, num_beams=4)
print(summary[0]['summary_text'])
```
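
If you need more control than the pipeline offers, the same steps can be written out explicitly. The following is a minimal sketch, not the author's reference code: the `summarize` helper and its parameter names are hypothetical, the default limits reuse the maximum input and output lengths listed under Training Hyperparameters, and it assumes `torch` and `sentencepiece` are installed alongside `transformers`.

```python
def summarize(article, model_id, max_input_tokens=1024, max_summary_tokens=256):
    """Summarize one article with explicit tokenization and beam search.

    `model_id` is the Hub repo id of the fine-tuned checkpoint (a placeholder
    in this card). The token limits are assumptions taken from the training
    configuration, not requirements of the library.
    """
    # Imported lazily so the helper can be defined without loading the library.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long articles to the maximum input length.
    inputs = tokenizer(
        article,
        max_length=max_input_tokens,
        truncation=True,
        return_tensors="pt",
    )
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_summary_tokens,
        num_beams=4,
    )
    # Decode the best beam and drop special tokens such as <s> and </s>.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

This sketch reloads the tokenizer and model on every call for brevity; when summarizing many articles, load them once and reuse them.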

## Training Details

### Training Data

The model was fine-tuned on the **phamtheds/news-dataset-vietnameses** dataset, which contains Vietnamese news articles and their corresponding summaries.

### Training Procedure

The model was fine-tuned using the Hugging Face `Trainer` API on a T4 GPU.

#### Training Hyperparameters

* **Learning Rate:** 2e-5
* **Batch Size:** 4
* **Gradient Accumulation Steps:** 2
* **Epochs:** 3
* **Weight Decay:** 0.01
* **Optimizer:** AdamW
* **Precision:** fp16 (mixed precision)
* **Max Input Length:** 1024 tokens
* **Max Output Length:** 256 tokens
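
As a rough illustration, the values above could be collected into keyword arguments for `Seq2SeqTrainingArguments`. This is a sketch under stated assumptions, not the original training script: mapping "Batch Size" to `per_device_train_batch_size` and training on a single GPU are both assumptions. Under them, each optimizer step sees 4 × 2 = 8 examples.

```python
# Hyperparameters from the list above. The keys reuse parameter names of
# transformers.Seq2SeqTrainingArguments; that mapping is an assumption.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,  # "Batch Size: 4", assumed to be per device
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "fp16": True,  # mixed precision
}

# Sequence limits applied at tokenization time, not by the training arguments.
MAX_INPUT_LENGTH = 1024
MAX_OUTPUT_LENGTH = 256

NUM_GPUS = 1  # assumption: a single T4

# Number of examples contributing to each optimizer step.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_GPUS
)
print(effective_batch_size)  # 8

# Hypothetical usage (the output directory name is a placeholder):
# from transformers import Seq2SeqTrainingArguments
# args = Seq2SeqTrainingArguments(output_dir="bartpho-summarization", **training_kwargs)
```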

## Evaluation

### Metrics

The model was evaluated using the **ROUGE** metric (ROUGE-1, ROUGE-2, ROUGE-L).
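
To make these metrics concrete, here is a simplified pure-Python sketch of ROUGE-N and ROUGE-L F1 scores. It is illustrative only: it assumes lowercased whitespace tokenization (roughly syllable-level for Vietnamese), the function names and example sentences are made up for this card, and standard ROUGE packages may tokenize differently and report different numbers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_len, ref_len):
    """Harmonic mean of precision (overlap/pred_len) and recall (overlap/ref_len)."""
    if overlap == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n):
    """ROUGE-N F1 based on clipped n-gram overlap."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # Counter intersection clips counts
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))


# Made-up example pair, not taken from the dataset.
reference = "giá xăng tăng mạnh từ chiều nay"
prediction = "giá xăng tăng từ hôm nay"

print(f"{rouge_n(prediction, reference, 1):.3f}")  # 0.769
print(f"{rouge_n(prediction, reference, 2):.3f}")  # 0.364
print(f"{rouge_l(prediction, reference):.3f}")     # 0.769
```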

## Citation

If you use this model, please cite the original BARTpho paper:

```bibtex
@inproceedings{tran2022bartpho,
  title={BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese},
  author={Tran, Nguyen Luong and Le, Duc Minh and Nguyen, Dat Quoc},
  booktitle={Interspeech},
  year={2022}
}
```