Japanese wav2vec 2.0 Base

This is a mirror of japanese-wav2vec2-base, originally released by rinna Co., Ltd. The original model is licensed under the Apache License 2.0. This mirror follows the same license terms. All copyrights remain with the original authors.

Overview

This is a Japanese wav2vec 2.0 Base model trained by rinna Co., Ltd.

Model summary

The model architecture is the same as the original wav2vec 2.0 Base model, which contains 12 transformer layers with 12 attention heads. The model was trained using code from the official repository, and the detailed training configuration can be found in the same repository and the original paper.
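
As a rough illustration of what these architecture numbers imply, the sketch below estimates the parameter count of the transformer stack alone. It assumes a hidden size of 768 (consistent with the output shape shown in the usage example) and a feed-forward size of 3072, as in the original wav2vec 2.0 Base configuration; the constant and function names are illustrative and are not read from this model's config file. The convolutional feature encoder, positional embedding, and projection layers are not counted.

```python
# Back-of-the-envelope parameter count for the transformer stack only.
# Assumptions (not read from the checkpoint): hidden size 768 and
# feed-forward size 3072, as in the original wav2vec 2.0 Base setup.
HIDDEN_SIZE = 768
FFN_SIZE = 3072
NUM_LAYERS = 12
NUM_HEADS = 12


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def transformer_layer_params(hidden: int, ffn: int) -> int:
    """Parameters of one standard transformer encoder layer."""
    attention = 4 * linear_params(hidden, hidden)  # query, key, value, output
    feed_forward = linear_params(hidden, ffn) + linear_params(ffn, hidden)
    layer_norms = 2 * (2 * hidden)  # two layer norms, each with weight and bias
    return attention + feed_forward + layer_norms


per_layer = transformer_layer_params(HIDDEN_SIZE, FFN_SIZE)
stack_total = NUM_LAYERS * per_layer
head_dim = HIDDEN_SIZE // NUM_HEADS

print(f"Per layer:         {per_layer:,}")    # 7,087,872
print(f"12-layer stack:    {stack_total:,}")  # 85,054,464
print(f"Per-head dim:      {head_dim}")       # 64
```

Under these assumptions, the 12 transformer layers hold about 85M parameters. The number of attention heads changes only how the 768 dimensions are split, not the parameter count.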

Training

The model was trained on approximately 19,000 hours of Japanese speech from the ReazonSpeech v1 corpus.
- ReazonSpeech

Contributors

Release date

March 7, 2024

How to use the model

import soundfile as sf
import torch
from transformers import AutoFeatureExtractor, AutoModel

model_name = "yky-h/japanese-wav2vec2-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Placeholder path (hypothetical): point this at a mono 16 kHz audio file.
audio_file = "path/to/audio.wav"
raw_speech_16kHz, sr = sf.read(audio_file)
inputs = feature_extractor(
    raw_speech_16kHz,
    return_tensors="pt",
    sampling_rate=sr,
)
with torch.no_grad():  # gradients are not needed for feature extraction
    outputs = model(**inputs)

print(f"Input:  {inputs.input_values.size()}")  # [1, #samples]
print(f"Output: {outputs.last_hidden_state.size()}")  # [1, #frames, 768]
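
To make the relation between `#samples` and `#frames` in the shapes above concrete, here is a minimal sketch that computes the expected frame count without loading the model. It assumes the convolutional feature encoder uses the kernel sizes and strides of the standard wav2vec 2.0 Base configuration, with no padding; `CONV_KERNELS`, `CONV_STRIDES`, and `num_output_frames` are illustrative names, and the values are not read from this checkpoint's config.

```python
# Assumption: standard wav2vec 2.0 Base feature encoder (7 conv layers,
# no padding). These values are not read from this checkpoint's config.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Number of frames the feature encoder emits for a raw waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


print(num_output_frames(16000))  # 1 second at 16 kHz -> 49 frames
print(num_output_frames(32000))  # 2 seconds at 16 kHz -> 99 frames
```

Under these assumptions, each output frame advances by 320 samples (20 ms at 16 kHz) and covers a 400-sample (25 ms) window, so inputs shorter than 400 samples produce no frames.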

A fairseq checkpoint file is also available here.

How to cite

@misc{rinna-japanese-wav2vec2-base,
    title = {rinna/japanese-wav2vec2-base},
    author = {Hono, Yukiya and Mitsui, Kentaro and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-wav2vec2-base}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

References

@inproceedings{baevski2020wav2vec,
    title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
    author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
    booktitle = {Advances in Neural Information Processing Systems},
    year = {2020},
    volume = {33},
    pages = {12449--12460},
    url = {https://proceedings.neurips.cc/paper/2020/hash/92d1e1eb1cd6f9fba3227870bb6d7f07-Abstract.html}
}

License

The Apache 2.0 license

Model size: 95M params (Safetensors, tensor type F32)
