---
license: cc-by-4.0
language:
  - ja
pipeline_tag: feature-extraction
tags:
  - streaming
  - NeMo
  - PyTorch
  - Automatic Speech Recognition
  - FastConformer
  - CTC
  - hybrid
datasets:
  - mozilla-foundation/common_voice_23
model-index:
  - name: Fast_Transducer-CTC_ctc-0.1b-ja
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: JSUT basic5000
          type: japanese-asr/ja_asr.jsut_basic5000
          split: test
          args:
            language: ja
        metrics:
          - name: Test CER
            type: cer
            value: 10.53
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Mozilla Common Voice 16.1
          type: mozilla-foundation/common_voice_16_1
          config: ja
          split: test
          args:
            language: ja
        metrics:
          - name: Test CER
            type: cer
            value: 19.0
---

# Streaming FastConformer-Hybrid Large (Ja)

This collection contains large-size versions of cache-aware FastConformer-Hybrid (around 120M parameters) trained on Japanese speech. These models are trained for streaming ASR with a look-ahead of 1040 ms and can be used for very low-latency streaming applications. The model is a hybrid with both Transducer and CTC decoders.

## Model Architecture

These models are cache-aware versions of Hybrid FastConformer trained for streaming ASR. You can find more information on cache-aware models here: Cache-aware Streaming Conformer. The models are trained with multiple look-aheads, which enables them to support different latencies. To learn how to switch between different look-aheads, see the documentation on cache-aware models.

## Datasets

The model in this collection is trained on two datasets comprising approximately 20,000 hours of Japanese speech:

- Mozilla Common Voice Ja (v23.0)
- AsrSet_Ja

## Performance

The following table summarizes the performance of this model in terms of Character Error Rate (CER%).

In the CER calculation, punctuation marks and non-alphabet characters are removed, and numbers are converted to words using the num2words library.
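The normalization and scoring described above can be sketched in pure Python. This is a minimal illustration, not the actual evaluation pipeline: it strips punctuation/symbol characters and whitespace before computing CER, and omits the num2words number-to-word expansion mentioned above.

```python
import unicodedata

def normalize(text: str) -> str:
    # Drop punctuation and symbols (Unicode categories P* and S*), then
    # remove whitespace, which does not count toward Japanese CER here.
    # (Digit-to-word conversion via num2words is omitted in this sketch.)
    kept = "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(("P", "S"))
    )
    return "".join(kept.split())

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate = Levenshtein distance / reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

For example, `cer("abc", "abd")` is one substitution over three reference characters, i.e. about 0.333, and punctuation such as `、` or `。` is ignored entirely.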

| Version | Decoder | JSUT basic5000 | MCV 16.1 test |
|---------|---------|----------------|---------------|
| 1.1.0   | CTC     | 10.53          | 19.0          |