---
license: cc-by-4.0
language:
- en
- it
- pt
- de
- fr
- es
- ja
- zh
tags:
- automatic-speech-recognition
- speech
- audio
- Transformer
- flow-matching
- discrete-flow-matching
- pytorch
- hf-asr-leaderboard
---

# Drax: Speech Recognition with Discrete Flow Matching

## Model Overview

The Drax model family provides speech recognition models based on discrete flow matching.
The `drax-v1` model supports eight languages: English, Spanish, French, Portuguese, German, Italian, Japanese, and Chinese.
It is an encoder-decoder model consisting of a Whisper-large-v3 encoder and a DiT-based decoder, with ~1.2B parameters in total.

For more details on usage, see our GitHub repo, [https://github.com/aiola-lab/drax](https://github.com/aiola-lab/drax), and our [paper](https://arxiv.org/abs/2510.04162).

## Usage

See [https://github.com/aiola-lab/drax](https://github.com/aiola-lab/drax) for installation instructions.

```python
from drax import Transcriber

# Load the drax-v1 model.
asr = Transcriber(model_path="aiola/drax-v1")
# Transcribe a single English audio file.
result = asr.transcribe("/path/to/audio.wav", language="en")
# The result is indexable; the first entry holds the transcript for this file.
print(result[0].transcript)
```

To control the number of sampling steps, the temperature, and other decoding options, pass them to `transcribe`:

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")
# Set the number of sampling steps and the sampling temperature explicitly.
result = asr.transcribe("/path/to/audio.wav", language="en", sampling_steps=32, temperature=1e-2)
print(result[0].transcript)
```
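
To see how the step count affects runtime and output on your own audio, one option is to sweep `sampling_steps` and time each call. The sketch below is only an illustration: `sweep_sampling_steps` is a hypothetical helper, not part of the `drax` package, and it assumes only the `transcribe` arguments and the `result[0].transcript` access shown above.

```python
import time


def sweep_sampling_steps(asr, audio_path, language, step_counts, temperature=1e-2):
    """Transcribe one file at several step counts and time each call.

    Hypothetical helper, not part of drax. `asr` is any object exposing
    transcribe(path, language=..., sampling_steps=..., temperature=...)
    whose result is indexable and has a `.transcript` attribute.
    """
    rows = []
    for steps in step_counts:
        start = time.perf_counter()
        result = asr.transcribe(
            audio_path,
            language=language,
            sampling_steps=steps,
            temperature=temperature,
        )
        elapsed = time.perf_counter() - start
        rows.append(
            {
                "sampling_steps": steps,
                "seconds": elapsed,
                "transcript": result[0].transcript,
            }
        )
    return rows


# Example usage (requires drax to be installed):
# from drax import Transcriber
# asr = Transcriber(model_path="aiola/drax-v1")
# for row in sweep_sampling_steps(asr, "/path/to/audio.wav", "en", [4, 8, 16, 32]):
#     print(row["sampling_steps"], f"{row['seconds']:.2f}s", row["transcript"])
```

The first call may include one-time warm-up costs, so consider discarding it or repeating the sweep before comparing timings.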

Batch inference:

```python
from drax import Transcriber

asr = Transcriber(model_path="aiola/drax-v1")
# One language code per audio file, in the same order.
audio_paths = ["/path/to/audio1.wav", "/path/to/audio2.wav"]
languages = ["en", "de"]
result = asr.transcribe(audio_paths, language=languages)
# Print the transcript of each file in the batch.
for item in result:
    print(item.transcript)
```
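
For a long list of files, passing everything in a single call may not fit in memory, so one option is to split the list into smaller batches. The sketch below is a minimal illustration under stated assumptions: `transcribe_in_chunks` and `chunk_size` are hypothetical names, not part of `drax`, and the code assumes that `transcribe` returns one result per input file, in input order, each with a `.transcript` attribute.

```python
def transcribe_in_chunks(asr, audio_paths, languages, chunk_size=8):
    """Transcribe many files by calling asr.transcribe on fixed-size batches.

    Hypothetical helper, not part of drax. Assumes results come back
    one per input file and in the same order as the inputs.
    Returns a list of (path, transcript) pairs.
    """
    if len(audio_paths) != len(languages):
        raise ValueError("audio_paths and languages must have the same length")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    pairs = []
    for start in range(0, len(audio_paths), chunk_size):
        paths = audio_paths[start:start + chunk_size]
        langs = languages[start:start + chunk_size]
        results = asr.transcribe(paths, language=langs)
        # Pair each input path with the transcript at the same position.
        for path, item in zip(paths, results):
            pairs.append((path, item.transcript))
    return pairs


# Example usage (requires drax to be installed):
# from drax import Transcriber
# asr = Transcriber(model_path="aiola/drax-v1")
# paths = ["/path/to/audio1.wav", "/path/to/audio2.wav"]
# for path, text in transcribe_in_chunks(asr, paths, ["en", "de"], chunk_size=1):
#     print(path, text)
```

A suitable `chunk_size` depends on the available memory and the length of the audio files, so it is worth tuning on a small sample first.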

## Citation

```bibtex
@article{navon2025drax,
  title={Drax: Speech Recognition with Discrete Flow Matching},
  author={Navon, Aviv and Shamsian, Aviv and Glazer, Neta and Segal-Feldman, Yael and Hetz, Gill and Keshet, Joseph and Fetaya, Ethan},
  journal={arXiv preprint arXiv:2510.04162},
  year={2025}
}
```