How to use anzorq/w2v-bert-2.0-kbd with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="anzorq/w2v-bert-2.0-kbd")

# Load model directly
from transformers import AutoProcessor, AutoModelForCTC
processor = AutoProcessor.from_pretrained("anzorq/w2v-bert-2.0-kbd")
model = AutoModelForCTC.from_pretrained("anzorq/w2v-bert-2.0-kbd")

This is a fine-tuned model for Automatic Speech Recognition (ASR) in Kabardian (kbd), based on the facebook/w2v-bert-2.0 model.
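
The loading snippet stops before any audio is transcribed. As a minimal sketch, the same pipeline can be wrapped in a small helper that takes a path to an audio file and returns the recognized text. The helper name `transcribe` and the constant `MODEL_ID` are illustrative and not part of the model card. Passing a file path to the pipeline typically requires ffmpeg to be installed for decoding.

```python
MODEL_ID = "anzorq/w2v-bert-2.0-kbd"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the ASR pipeline (illustrative helper)."""
    # Imported lazily so this module can be imported without loading Transformers.
    from transformers import pipeline

    # Downloads the checkpoint on first use, then reads it from the local cache.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # For a single input the pipeline returns a dict holding the text under "text".
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a hypothetical local recording, should return the transcription as a plain string. When transcribing many files, it is usually better to build the pipeline once and reuse it rather than recreating it on each call.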
The model was trained on a combination of the anzorq/kbd_speech (filtered on country=russia) and anzorq/sixuxar_yijiri_mak7 datasets.
The model was fine-tuned using the following training arguments:
TrainingArguments(
    output_dir='output',
    group_by_length=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=10,
    gradient_checkpointing=True,
    fp16=True,
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,
    push_to_hub=True,
    report_to="wandb"
)
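
Two consequences of these arguments are easy to miss: the effective batch size and the learning-rate curve. The sketch below works both out in plain Python. It assumes the default `lr_scheduler_type="linear"` (the card does not set a scheduler), which warms up linearly to the peak rate and then decays linearly to zero. The number of devices and the total step count are not stated in the card, so `num_devices` and `total_steps` are hypothetical inputs.

```python
PER_DEVICE_BATCH = 8    # per_device_train_batch_size
GRAD_ACCUM_STEPS = 2    # gradient_accumulation_steps
PEAK_LR = 5e-5          # learning_rate
WARMUP_STEPS = 500      # warmup_steps


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step (num_devices is an assumption)."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def linear_schedule_lr(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to PEAK_LR over WARMUP_STEPS steps.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from PEAK_LR to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - WARMUP_STEPS)


if __name__ == "__main__":
    print(effective_batch_size())            # one device assumed
    print(linear_schedule_lr(250, 6000))     # total_steps=6000 is hypothetical
```

On a single device this gives 16 samples per optimizer step, and the learning rate reaches its peak of 5e-5 at step 500, which coincides with the first evaluation.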
The model's performance during training:
| Step | Training Loss | Validation Loss | WER |
|---|---|---|---|
| 500 | 2.859600 | inf | 0.870362 |
| 1000 | 0.355500 | inf | 0.703617 |
| 1500 | 0.247100 | inf | 0.549942 |
| 2000 | 0.196700 | inf | 0.471762 |
| 2500 | 0.181500 | inf | 0.361494 |
| 3000 | 0.152200 | inf | 0.314119 |
| 3500 | 0.135700 | inf | 0.275146 |
| 4000 | 0.113400 | inf | 0.252625 |
| 4500 | 0.102900 | inf | 0.277013 |
| 5000 | 0.078500 | inf | 0.250175 |
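
The WER column is the word error rate expressed as a fraction, so 0.250175 at step 5000 means roughly 25 word errors per 100 reference words. The card does not say which library computed it. The sketch below implements the standard definition, word-level edit distance divided by the number of reference words, and is not necessarily the exact code used in training. The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Evaluation tooling usually sums the edits over all utterances before dividing, rather than averaging per-utterance scores.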