# t5-gqa-moe-xsum_with_lb
## Model Description
T5 with Grouped Query Attention (GQA) and a sparse Mixture of Experts (MoE), fine-tuned on the XSUM dataset for abstractive summarization.
## Architecture
This model combines two advanced techniques:
- Grouped Query Attention (GQA): Custom attention mechanism that reduces KV cache size
- Sparse Mixture of Experts (MoE): Multiple expert networks with learned routing
### Key Features
- Custom GQA implementation (not library-based)
- Token Choice Top-k routing with load balancing
- Reduced inference memory footprint
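The Token Choice Top-k router can be sketched as follows: each token scores all experts, keeps its top-k, and renormalizes the kept gate weights; a Switch-Transformer-style auxiliary loss discourages expert collapse. This is a minimal illustration, not the repository's implementation (function and variable names here are our own):

```python
import torch
import torch.nn.functional as F

def top_k_routing(hidden, router_weight, num_experts=8, k=2):
    """Token Choice Top-k routing with a load-balancing auxiliary loss.

    hidden: (tokens, d_model); router_weight: (d_model, num_experts).
    Returns each token's chosen expert indices, renormalized gate
    weights, and a Switch-style auxiliary loss.
    """
    logits = hidden @ router_weight                # (tokens, num_experts)
    probs = F.softmax(logits, dim=-1)
    gate_vals, expert_idx = probs.topk(k, dim=-1)  # each token's k best experts
    gate_vals = gate_vals / gate_vals.sum(-1, keepdim=True)  # renormalize

    # Load-balancing loss (simplified, using the top-1 assignment):
    # fraction of tokens dispatched to each expert times the mean router
    # probability for that expert, summed and scaled by num_experts.
    dispatch = F.one_hot(expert_idx[:, 0], num_experts).float()
    frac_tokens = dispatch.mean(0)
    frac_probs = probs.mean(0)
    aux_loss = num_experts * (frac_tokens * frac_probs).sum()
    return expert_idx, gate_vals, aux_loss
```

The auxiliary loss is added to the task loss during training so the router spreads tokens across experts instead of overloading a few.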
## Model Configuration
- Base Model: google-t5/t5-small
- Total Parameters: 229,656,576
- Trainable Parameters: 229,656,576
- Number of Experts: 8
- Top-k: 2
- Load Balancing: Enabled
- GQA Query Heads: 8
- GQA KV Heads: 2
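With 8 query heads and 2 KV heads, each KV head is shared by 4 query heads, so the KV cache is a quarter of the standard multi-head size. A minimal sketch of the attention computation (our own illustrative code, not the card's custom implementation):

```python
import torch

def grouped_query_attention(q, k, v, num_q_heads=8, num_kv_heads=2):
    """Grouped Query Attention.

    q: (batch, num_q_heads, seq, head_dim)
    k, v: (batch, num_kv_heads, seq, head_dim)
    Each KV head serves num_q_heads // num_kv_heads query heads.
    """
    group = num_q_heads // num_kv_heads        # 4 query heads per KV head
    k = k.repeat_interleave(group, dim=1)      # expand KV heads to match queries
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    attn = scores.softmax(dim=-1)
    return attn @ v                            # (batch, num_q_heads, seq, head_dim)
```

Only the small `k`/`v` tensors need to be cached at inference time; the expansion is cheap relative to the memory saved.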
## Training Data
The model was trained on the XSUM dataset, which contains:
- ~204k training examples
- ~11k validation examples
- ~11k test examples
Each example consists of a BBC news article and a one-sentence summary.
## Usage

```python
from transformers import T5Tokenizer

# Load the tokenizer
tokenizer = T5Tokenizer.from_pretrained("YOUR_USERNAME/t5-gqa-moe-xsum_with_lb")

# Note: for MoE models, you need to reconstruct the architecture before
# loading the weights. See the model repository for detailed instructions.
```
## Evaluation
Evaluate using standard ROUGE metrics (ROUGE-1, ROUGE-2, ROUGE-L) for n-gram overlap and SummaC consistency scores for factuality.
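As a reference point, ROUGE-N is the F1 of n-gram overlap between a generated summary and the reference. A minimal stdlib-only sketch (in practice you would use an established ROUGE package, which also applies stemming and handles ROUGE-L):

```python
from collections import Counter

def rouge_n(prediction: str, reference: str, n: int = 1) -> float:
    """F1 of n-gram overlap on whitespace tokens (simplified ROUGE-N)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    pred = ngrams(prediction.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n("the cat sat", "the cat ran")` shares 2 of 3 unigrams on each side, giving an F1 of 2/3.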
## Training Procedure
The model was trained using:
- AdamW optimizer with weight decay
- Learning rate: 5e-5
- Warmup steps: 500
- Mixed precision (FP16) training
- Gradient accumulation for larger effective batch size
## Limitations
- Trained only on English news articles
- May not generalize well to other domains
- MoE models require custom loading code
## Citation
If you use this model, please cite the XSUM dataset:
```bibtex
@inproceedings{narayan-etal-2018-dont,
    title = "Don{'}t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization",
    author = "Narayan, Shashi and Cohen, Shay B. and Lapata, Mirella",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    year = "2018",
}
```