Magpie TTS Multilingual 357M (CoreML)
CoreML export of NVIDIA's Magpie TTS Multilingual 357M, optimized for on-device inference on Apple Silicon. Ships both .mlmodelc (compiled, ready-to-run) and .mlpackage (portable source) for macOS 14+ / iOS 17+.
Converted with FluidInference/mobius. Consumed by the Swift port in FluidInference/FluidAudio (see Sources/FluidAudio/TTS/Magpie/).
Languages
English, Spanish, German, French, Italian, Vietnamese, Mandarin, Hindi. Japanese is not yet included. French / Italian / Vietnamese use ByT5 byte-level tokenization (algorithmic, no lookup files).
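Byte-level tokenization needs no vocabulary file because token ids are derived directly from the UTF-8 bytes of the input. The sketch below follows the reference ByT5 convention (ids 0, 1, 2 reserved for pad, EOS and unk, and each byte mapped to `byte + 3`). Whether this export uses the same offset and special ids is an assumption; `byT5Encode` and `byT5Decode` are hypothetical helper names, not part of FluidAudio.

```swift
// Reference ByT5 layout (assumed here; check constants/tokenizer_*.json for
// the ids this export actually uses).
let padID = 0, eosID = 1, unkID = 2
let byteOffset = 3  // first id that represents a raw byte

/// Encodes text as one id per UTF-8 byte, optionally followed by EOS.
func byT5Encode(_ text: String, appendEOS: Bool = true) -> [Int] {
    var ids = text.utf8.map { Int($0) + byteOffset }
    if appendEOS { ids.append(eosID) }
    return ids
}

/// Inverse mapping: drops special ids and rebuilds the string from raw bytes.
func byT5Decode(_ ids: [Int]) -> String {
    let bytes = ids.compactMap { id -> UInt8? in
        let byte = id - byteOffset
        return (0...255).contains(byte) ? UInt8(byte) : nil
    }
    return String(decoding: bytes, as: UTF8.self)
}
```

Under these assumptions, a non-ASCII character such as "é" (UTF-8 bytes 0xC3 0xA9) becomes two byte ids, which is why accented French, Italian and Vietnamese text works without per-language lookup tables.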
Contents
├── manifest.json                                # machine-readable index (sha256, shapes, IO specs)
├── text_encoder.{mlmodelc,mlpackage}            # text → (1, 256, 768) encoder output
├── decoder_step.{mlmodelc,mlpackage}            # 12-layer AR decoder (stateful KV cache)
├── decoder_prefill.{mlmodelc,mlpackage}         # batched prefill fast path
├── nanocodec_decoder.{mlmodelc,mlpackage}       # 8-codebook → PCM vocoder, 22050 Hz, max 256 frames
├── constants/
│   ├── constants.json                           # d_model, n_layers, special token ids, sampling defaults
│   ├── speaker_info.json                        # 5 speakers, context shape (110, 768)
│   ├── tokenizer_{info,metadata,references}.json
│   ├── speaker_0.npy .. speaker_4.npy
│   ├── speaker_embeddings_raw.npy
│   ├── text_embedding.npy
│   ├── audio_embedding_0.npy .. audio_embedding_7.npy   # per-codebook (2024, 768)
│   └── local_transformer/                       # 1-layer transformer weights (Swift reads .npy)
└── tokenizer/
    ├── english_phoneme_*.json                   # phoneme_dict + token2id + heteronyms
    ├── spanish_phoneme_*.json                   # phoneme_dict + token2id
    ├── german_phoneme_*.json                    # phoneme_dict + token2id + heteronyms
    ├── mandarin_phoneme_*.json                  # token2id, phoneme_dict, pinyin/tone/ascii dicts
    ├── mandarin_jieba_dict.json
    ├── mandarin_pypinyin_{char,phrase}_dict.json
    └── hindi_chartokenizer_token2id.json
Prefer .mlmodelc at runtime; .mlpackage is included for inspection and re-compilation. Consult manifest.json for the full asset table (file sizes, sha256, model IO shapes per layer).
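If you fetch assets yourself rather than through the manager, the sha256 values in manifest.json can be used to check a download. The sketch below hashes a file with CryptoKit and compares it to an expected hex digest. The manifest's key layout is not described here, so the expected digest is passed in directly; `sha256Hex` and `fileMatches` are hypothetical helper names. Note that a .mlmodelc is a directory, so this check applies to individual files inside it.

```swift
import CryptoKit
import Foundation

/// Lowercase hex SHA-256 digest of a byte buffer.
func sha256Hex(_ data: Data) -> String {
    SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

/// Returns true when the file at `url` hashes to `expectedHex`.
/// `expectedHex` is assumed to be copied from the matching manifest.json entry.
func fileMatches(_ url: URL, expectedHex: String) throws -> Bool {
    // Memory-map when possible so large weight files are not fully copied.
    let data = try Data(contentsOf: url, options: .mappedIfSafe)
    return sha256Hex(data) == expectedHex.lowercased()
}
```

A caller would typically run `fileMatches` once after download and discard the file on a mismatch.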
Usage (Swift)
import FluidAudio

let manager = try await MagpieTtsManager.downloadAndCreate(
    languages: [.english, .spanish]
)
let result = try await manager.synthesize(
    text: "Hello | ˈ n ɛ m o ʊ | from FluidAudio.",
    speaker: .john,
    language: .english
)
let wav = AudioWAV.data(from: result.samples, sampleRate: result.sampleRate)
try wav.write(to: URL(fileURLWithPath: "hello.wav"))
The manager lazy-downloads everything in this repo on first use.
Inline IPA override
Text enclosed in |...| is passed straight to the tokenizer as whitespace-separated IPA tokens:
"Hello | ˈ n ɛ m o ʊ | world"
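To make the override rule concrete, here is a minimal sketch of how such input could be split into plain-text runs and IPA token lists before tokenization. It is an illustration only, not FluidAudio's actual parser; `Segment` and `splitInlineIPA` are hypothetical names, and the handling of an unmatched `|` (treated as plain text) is an assumption.

```swift
import Foundation

/// One run of input: ordinary text, or IPA tokens taken from a |...| span.
enum Segment: Equatable {
    case text(String)
    case ipa([String])
}

/// Splits input on "|" so that every properly closed |...| span becomes
/// a list of whitespace-separated IPA tokens.
func splitInlineIPA(_ input: String) -> [Segment] {
    let parts = input.split(separator: "|", omittingEmptySubsequences: false)
    var segments: [Segment] = []
    for (index, part) in parts.enumerated() {
        // Odd-indexed parts sit between two pipes, unless the closing pipe is missing.
        let isIPA = index % 2 == 1 && index < parts.count - 1
        if isIPA {
            let tokens = part.split(whereSeparator: { $0.isWhitespace }).map(String.init)
            if !tokens.isEmpty { segments.append(.ipa(tokens)) }
        } else {
            let trimmed = part.trimmingCharacters(in: .whitespaces)
            if !trimmed.isEmpty { segments.append(.text(trimmed)) }
        }
    }
    return segments
}
```

In this sketch the text runs would go through the normal per-language tokenizer, while each IPA token would be looked up directly in the language's token2id table.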
License
- CoreML export: CC-BY-4.0 (inherits from the upstream NeMo model).
- Upstream weights: see nvidia/magpie_tts_multilingual_357m.