Paper: [Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST](https://arxiv.org/abs/2509.14128)
This is nvidia/parakeet-tdt-0.6b-v3 converted from the NeMo `.nemo` format to safetensors for use with Modular's MAX inference framework.
The model weights are numerically identical to the original; only the file format and naming conventions have changed. For full model details, benchmarks, training data, and evaluation results, see the original NVIDIA model card.
| File | Description |
|---|---|
| `model.safetensors` | Encoder weights (NeMo names remapped, Conv2d permuted FCRS→RSCF) |
| `decoder_joint.npz` | LSTM prediction network + joint network (NumPy) |
| `config.json` | HF-style config with `normalize_features` and encoder/decoder/joint settings |
| `spiece.model` | SentencePiece BPE tokenizer (8192 tokens) |
| `tokenizer_config.json` | HF tokenizer config |
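The FCRS→RSCF permutation noted in the table is an axis reordering of the Conv2d weights: the PyTorch/NeMo layout `(filters, channels, rows, cols)` becomes `(rows, cols, channels, filters)`. A minimal sketch with a hypothetical weight tensor (the shape here is illustrative, not taken from the actual checkpoint):

```python
import numpy as np

# Hypothetical Conv2d weight in NeMo/PyTorch layout: (F, C, R, S)
# F = output filters, C = input channels, R/S = kernel rows/cols
w_fcrs = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

# RSCF layout used by the converted checkpoint: (R, S, C, F)
w_rscf = w_fcrs.transpose(2, 3, 1, 0)

print(w_rscf.shape)  # (4, 5, 3, 2)
# Same values, different axis order: w_rscf[r, s, c, f] == w_fcrs[f, c, r, s]
print(w_rscf[1, 2, 0, 1] == w_fcrs[1, 0, 1, 2])  # True
```

The values are untouched; only the memory layout changes, which is why the converted weights remain numerically identical to the original.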
Note: Parakeet architecture support is currently a prototype on a development branch and has not yet been merged into the official modular/modular repository; PRs are in progress.
Serve the model on CPU:

```bash
max serve --model-path pherber3/parakeet-tdt-0.6b-v3 --devices cpu
```
Transcribe audio via the OpenAI-compatible endpoint:
```bash
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F "model=pherber3/parakeet-tdt-0.6b-v3"
```
Example response:

```json
{"text": "mister Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."}
```
Converted from the original NeMo `.nemo` archive using `convert_nemo.py`:

```bash
python convert_nemo.py nvidia/parakeet-tdt-0.6b-v3 -o ./converted-tdt
```
```bibtex
@article{nvidia_parakeet_tdt_v3,
  title={Granary: Speech Recognition and Translation Dataset in 25 European Languages},
  author={NVIDIA},
  year={2025},
  url={https://arxiv.org/abs/2509.14128}
}
```
Base model: [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)