# RoBERTa MLM Pretrained Model
This model was pretrained with the Masked Language Modeling (MLM) objective on multilingual text data.
## Model Description
This is a RoBERTa-based transformer model pretrained from scratch with the Masked Language Modeling objective. The model learns to predict masked tokens in input sequences, building up representations of language patterns and semantics.
Model Architecture:
- Hidden Layers: 6
- Hidden Dimensions: 512
- Attention Heads: 8
- Maximum Sequence Length: 640
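
For reference, a configuration mirroring these dimensions could be built as sketched below. This is a hypothetical reconstruction, not the released config: the vocabulary size, intermediate size, and position-embedding offset are assumptions.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Hypothetical config reproducing the architecture listed above
config = RobertaConfig(
    vocab_size=52_000,                 # assumed; use the actual tokenizer's vocab size
    num_hidden_layers=6,
    hidden_size=512,
    num_attention_heads=8,
    intermediate_size=2048,            # assumed as 4 * hidden_size
    max_position_embeddings=640 + 2,   # RoBERTa conventionally reserves two extra position slots
)
model = RobertaForMaskedLM(config)
```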
## Training Details
### Training Data
- Dataset: dstilesr/glotlid-balanced-train
- Version: 2025.09.101615
### Training Hyperparameters
- Epochs: 2
- Batch Size: 112
- Learning Rate: 0.00014
- Optimizer: adamw_torch_fused
- MLM Probability: 0.2
- Weight Decay: 0.0
- Warmup Steps: 128
- Gradient Accumulation Steps: 2
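
The sketch below shows how these hyperparameters would map onto a Transformers `Trainer` run. It is an approximation, not the original training script; the dataset's `text` column name and the output directory are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("dstilesr/glotlid-pretrained-roberta")
# For pretraining from scratch, a freshly initialized RobertaForMaskedLM would be used instead
model = AutoModelForMaskedLM.from_pretrained("dstilesr/glotlid-pretrained-roberta")

dataset = load_dataset("dstilesr/glotlid-balanced-train", split="train")

def tokenize(batch):
    # The "text" column name is an assumption about the dataset schema
    return tokenizer(batch["text"], truncation=True, max_length=640)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Dynamic masking with the MLM probability listed above
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.2)

args = TrainingArguments(
    output_dir="roberta-mlm-pretrain",  # assumed output path
    num_train_epochs=2,
    per_device_train_batch_size=112,
    gradient_accumulation_steps=2,
    learning_rate=1.4e-4,
    weight_decay=0.0,
    warmup_steps=128,
    optim="adamw_torch_fused",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```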
### Framework
- Library: Transformers (Hugging Face)
- Training Framework: PyTorch
## Usage
### Masked Language Modeling
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("dstilesr/glotlid-pretrained-roberta")
model = AutoModelForMaskedLM.from_pretrained("dstilesr/glotlid-pretrained-roberta")

# Example: fill in a masked token
text = "The capital of France is <mask>."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the highest-scoring token at the <mask> position
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```
### Fine-tuning for Sequence Classification
This pretrained model can be fine-tuned for downstream tasks like sequence classification:
```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "dstilesr/glotlid-pretrained-roberta",
    num_labels=num_classes,  # number of target classes for your task
    ignore_mismatched_sizes=True,
)
```
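
For illustration, a minimal fine-tuning loop could look like the sketch below. The labeled dataset (IMDB here), the number of labels, and the training settings are placeholders for your own task, not part of this release.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("dstilesr/glotlid-pretrained-roberta")
model = AutoModelForSequenceClassification.from_pretrained(
    "dstilesr/glotlid-pretrained-roberta",
    num_labels=2,                 # placeholder: set to your task's class count
    ignore_mismatched_sizes=True,
)

# Placeholder labeled dataset with "text" and "label" columns
dataset = load_dataset("imdb", split="train[:1%]")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=640),
    batched=True,
)

args = TrainingArguments(output_dir="roberta-seq-clf", num_train_epochs=1)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorWithPadding(tokenizer),  # pad batches to a common length
)
trainer.train()
```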
## Intended Use
This model is designed as a pretrained base model for various NLP tasks:
- Fine-tuning for text classification
- Fine-tuning for sequence labeling
- Feature extraction for downstream tasks (see the sketch after this list)
- Transfer learning for low-resource languages
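
As a sketch of the feature-extraction use case above, the encoder's hidden states can be pooled into fixed-size sentence vectors. Mean pooling is shown here as one common choice, not a strategy prescribed by this model.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dstilesr/glotlid-pretrained-roberta")
encoder = AutoModel.from_pretrained("dstilesr/glotlid-pretrained-roberta")  # loads the encoder without the MLM head

texts = ["Example sentence one.", "Another example."]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=640, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (batch, seq_len, 512)

# Mean-pool over non-padding tokens to get one 512-dim vector per text
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```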
## Limitations
- Maximum input length is 640 tokens
- Performance depends on similarity between pretraining and downstream data
- May require task-specific fine-tuning for optimal performance