license: cc-by-4.0
language:
- ja
pipeline_tag: feature-extraction
tags:
- streaming
- NeMo
- PyTorch
- Automatic Speech Recognition
- FastConformer
- CTC
- hybrid
datasets:
- mozilla-foundation/common_voice_23
model-index:
- name: Fast_Transducer-CTC_ctc-0.1b-ja
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: JSUT basic5000
      type: japanese-asr/ja_asr.jsut_basic5000
      split: test
      args:
        language: ja
    metrics:
    - name: Test CER
      type: cer
      value: 10.53
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 16.1
      type: mozilla-foundation/common_voice_16_1
      config: ja
      split: test
      args:
        language: ja
    metrics:
    - name: Test CER
      type: cer
      value: 19
Streaming FastConformer-Hybrid Large (Ja)
This collection contains large-size versions of the cache-aware FastConformer-Hybrid model (around 120M parameters) trained on Japanese speech. These models are trained for streaming ASR with a look-ahead of 1040 ms, which can be used for very low-latency streaming applications. The model is hybrid, with both Transducer and CTC decoders.
Model Architecture
These models are cache-aware versions of Hybrid FastConformer, trained for streaming ASR. More information on cache-aware models is available here: Cache-aware Streaming Conformer. The models are trained with multiple look-aheads, which lets them support different latencies. To learn how to switch between look-aheads, see the documentation on cache-aware models.
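As a rough illustration of how a look-ahead figure such as 1040 ms relates to the encoder's right attention context, the sketch below assumes an 80 ms encoder frame (a 10 ms feature hop with 8x subsampling). Neither value is stated in this card, and the right-context sizes used in the usage lines are hypothetical, so treat the numbers as an assumption rather than this model's configuration.

```python
# Assumptions (not stated in this model card): 10 ms feature hop and
# 8x encoder subsampling, i.e. one encoder frame covers 80 ms of audio.
FEATURE_HOP_MS = 10
SUBSAMPLING_FACTOR = 8


def lookahead_ms(right_context_frames: int) -> int:
    """Return the look-ahead in milliseconds for a right attention
    context given in encoder frames."""
    if right_context_frames < 0:
        raise ValueError("right_context_frames must be non-negative")
    return right_context_frames * FEATURE_HOP_MS * SUBSAMPLING_FACTOR


if __name__ == "__main__":
    # Hypothetical right-context sizes, chosen only for illustration.
    for frames in (0, 1, 6, 13):
        print(f"right context {frames:>2} frames -> {lookahead_ms(frames)} ms look-ahead")
```

Under these assumptions, a right context of 13 encoder frames corresponds to the 1040 ms look-ahead mentioned above, and smaller right contexts trade accuracy for lower latency.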
Datasets
The model in this collection is trained on two datasets comprising approximately 20,000 hours of Japanese speech:
- Mozilla Common Voice Ja (v23.0)
- AsrSet_Ja
Performance
The following table summarizes the performance of this model in terms of Character Error Rate (CER%).
In the CER calculation, punctuation marks and non-alphabetic characters are removed, and numbers are converted to words using the num2words library.
| Version | Decoder | JSUT basic5000 | MCV16.1 test |
|---|---|---|---|
| 1.1.0 | CTC | 10.53 | 19.0 |
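The CER normalization described above can be sketched as follows. This is a minimal illustration, not the evaluation script used for the table: treating "non-alphabetic" as "not a Unicode letter or number", aggregating errors over the whole corpus, and the function names `normalize`, `edit_distance`, and `cer_percent` are all assumptions. Number-to-word conversion is passed in as a callable so the block runs without third-party packages.

```python
import re
import unicodedata
from typing import Callable, Optional, Sequence


def normalize(text: str, number_to_words: Optional[Callable[[int], str]] = None) -> str:
    """Convert digit runs to words (if a converter is given), then drop
    punctuation, whitespace, and symbols."""
    if number_to_words is not None:
        text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    # Assumption: keep Unicode letters (kana, kanji, Latin, ...) and any
    # numeric characters that were not converted; drop everything else.
    return "".join(ch for ch in text if unicodedata.category(ch)[0] in ("L", "N"))


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def cer_percent(refs: Sequence[str], hyps: Sequence[str],
                number_to_words: Optional[Callable[[int], str]] = None) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = 0
    ref_chars = 0
    for ref, hyp in zip(refs, hyps):
        ref_n = normalize(ref, number_to_words)
        hyp_n = normalize(hyp, number_to_words)
        errors += edit_distance(ref_n, hyp_n)
        ref_chars += len(ref_n)
    if ref_chars == 0:
        raise ValueError("references are empty after normalization")
    return 100.0 * errors / ref_chars


if __name__ == "__main__":
    print(cer_percent(["こんにちは、世界。"], ["こんばんは世界"]))
```

To mirror the num2words step, a converter such as `lambda n: num2words(n, lang="ja")` could be passed as `number_to_words`, assuming the num2words package is installed and that Japanese output is what the evaluation used.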