Parakeet-TDT 0.6B v3 (Converted for MAX)

This is nvidia/parakeet-tdt-0.6b-v3 converted from NeMo .nemo format to safetensors for use with Modular's MAX inference framework.

The model weights are numerically identical to the original. Only the format and naming conventions have changed. For full model details, benchmarks, training data, and evaluation results, see the original NVIDIA model card.

Model Summary

  • Original model: nvidia/parakeet-tdt-0.6b-v3
  • Architecture: FastConformer encoder (24 layers, 1024 hidden) + TDT decoder (2-layer LSTM, joint network)
  • Parameters: ~600M
  • Languages: 25 (bg, hr, cs, da, nl, en, et, fi, fr, de, el, hu, it, lv, lt, mt, pl, pt, ro, sk, sl, es, sv, ru, uk)
  • License: CC-BY-4.0 (same as original)
  • Technical report: arXiv:2509.14128

What's in This Repo

File                   Description
model.safetensors      Encoder weights (NeMo names remapped, Conv2d permuted FCRS→RSCF)
decoder_joint.npz      LSTM prediction network + joint network (numpy)
config.json            HF-style config with normalize_features, encoder/decoder/joint settings
spiece.model           SentencePiece BPE tokenizer (8192 tokens)
tokenizer_config.json  HF tokenizer config
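
The FCRS→RSCF note on model.safetensors describes a reordering of Conv2d weight axes. The sketch below illustrates such a permutation with NumPy, assuming F, C, R, S stand for output filters, input channels, kernel height, and kernel width (the common reading of these letters, not something this repo confirms). The function names and the example shape are hypothetical and are not taken from convert_nemo.py.

```python
import numpy as np


def fcrs_to_rscf(weight: np.ndarray) -> np.ndarray:
    """Reorder a 4-D Conv2d kernel from (F, C, R, S) to (R, S, C, F)."""
    if weight.ndim != 4:
        raise ValueError(f"expected a 4-D kernel, got shape {weight.shape}")
    # New axis order: R (old axis 2), S (old 3), C (old 1), F (old 0).
    return np.transpose(weight, (2, 3, 1, 0))


def rscf_to_fcrs(weight: np.ndarray) -> np.ndarray:
    """Inverse permutation: (R, S, C, F) back to (F, C, R, S)."""
    if weight.ndim != 4:
        raise ValueError(f"expected a 4-D kernel, got shape {weight.shape}")
    # New axis order: F (old axis 3), C (old 2), R (old 0), S (old 1).
    return np.transpose(weight, (3, 2, 0, 1))


# Illustrative kernel with a made-up shape: 256 filters, 1 input channel, 3x3.
kernel = np.arange(256 * 1 * 3 * 3, dtype=np.float32).reshape(256, 1, 3, 3)
permuted = fcrs_to_rscf(kernel)
print(permuted.shape)  # (3, 3, 1, 256)

# A pure axis permutation moves values without changing them, so the
# round trip reproduces the original array exactly.
print(np.array_equal(rscf_to_fcrs(permuted), kernel))  # True
```

Because a transpose only relocates elements, a check like this is one way to see why a layout change of this kind is compatible with weights staying numerically identical.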

Usage with MAX

Note: Parakeet architecture support is currently a prototype on a development branch and has not yet been merged into the official modular/modular repository. Pull requests to upstream it are in progress.

Serve via API

max serve --model-path pherber3/parakeet-tdt-0.6b-v3 --devices cpu

Transcribe audio via the OpenAI-compatible endpoint:

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F "model=pherber3/parakeet-tdt-0.6b-v3"

Example response:

{"text": "mister Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."}
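
The same request as the curl command above can be issued from Python. This is a minimal sketch using only the standard library. It assumes the server is reachable at http://localhost:8000 and returns a JSON object with a "text" field, as in the example response. The helper names build_multipart and transcribe are hypothetical.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Optional, Tuple


def build_multipart(fields: dict, filename: str, file_bytes: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio goes in a part named "file", matching `-F file=@audio.wav`.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n".encode() + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str,
               model: str = "pherber3/parakeet-tdt-0.6b-v3",
               base_url: str = "http://localhost:8000") -> str:
    """POST a WAV file to the transcription endpoint and return the text."""
    path = Path(audio_path)
    body, content_type = build_multipart(
        {"model": model}, path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With the server from the previous step running, transcribe("audio.wav") would return the transcript string.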

Requirements

  • MAX framework with Parakeet architecture support (dev branch)
  • Audio input: 16 kHz mono WAV
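
Before sending a file, its header can be checked against the audio requirement above. The sketch below uses Python's standard wave module, which reads PCM WAV files only and raises wave.Error for other encodings. The check_wav helper is a hypothetical name, and the check covers only sample rate and channel count.

```python
import wave

EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav(path: str) -> list:
    """Return a list of format problems; an empty list means the header matches."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"found {channels} channels, expected {EXPECTED_CHANNELS} (mono)")
    return problems
```

For example, check_wav("audio.wav") returns an empty list for a 16 kHz mono file, and a list of messages describing what to resample or downmix otherwise.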

Conversion

Converted from the original NeMo .nemo archive using convert_nemo.py:

python convert_nemo.py nvidia/parakeet-tdt-0.6b-v3 -o ./converted-tdt

Citation

@article{nvidia_parakeet_tdt_v3,
  title={Granary: Speech Recognition and Translation Dataset in 25 European Languages},
  author={NVIDIA},
  year={2025},
  url={https://arxiv.org/abs/2509.14128}
}
