Magpie TTS Multilingual 357M (CoreML)

CoreML export of NVIDIA's Magpie TTS Multilingual 357M, optimized for on-device inference on Apple Silicon. Ships both .mlmodelc (compiled, ready-to-run) and .mlpackage (portable source) for macOS 14+ / iOS 17+.

Converted with FluidInference/mobius. Consumed by the Swift port in FluidInference/FluidAudio (see Sources/FluidAudio/TTS/Magpie/).

Languages

English, Spanish, German, French, Italian, Vietnamese, Mandarin, and Hindi are supported; Japanese is not yet included. French, Italian, and Vietnamese use ByT5 byte-level tokenization (purely algorithmic, so no lookup files are needed).
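ByT5-style byte-level tokenization operates directly on UTF-8 bytes, which is why those three languages ship no dictionary files. A minimal sketch, assuming the reference ByT5 convention of a +3 offset for special tokens (pad=0, eos=1, unk=2) — verify the actual special-token ids against constants/constants.json:

```swift
// Sketch of ByT5-style byte-level tokenization (no lookup files needed).
// The +3 offset (pad=0, eos=1, unk=2) follows the reference ByT5 vocab;
// this model's real ids live in constants/constants.json.
func byt5Tokenize(_ text: String) -> [Int] {
    Array(text.utf8).map { Int($0) + 3 }
}
```

Multi-byte characters expand to one id per UTF-8 byte, so "été" yields five ids, not three.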

Contents

├── manifest.json                       # machine-readable index (sha256, shapes, IO specs)
├── text_encoder.{mlmodelc,mlpackage}   # text → (1, 256, 768) encoder output
├── decoder_step.{mlmodelc,mlpackage}   # 12-layer AR decoder (stateful KV cache)
├── decoder_prefill.{mlmodelc,mlpackage} # batched prefill fast path
├── nanocodec_decoder.{mlmodelc,mlpackage} # 8-codebook → PCM vocoder, 22050 Hz, max 256 frames
├── constants/
│   ├── constants.json                  # d_model, n_layers, special token ids, sampling defaults
│   ├── speaker_info.json               # 5 speakers, context shape (110, 768)
│   ├── tokenizer_{info,metadata,references}.json
│   ├── speaker_0.npy .. speaker_4.npy
│   ├── speaker_embeddings_raw.npy
│   ├── text_embedding.npy
│   ├── audio_embedding_0.npy .. audio_embedding_7.npy   # per-codebook (2024, 768)
│   └── local_transformer/              # 1-layer transformer weights (Swift reads .npy)
└── tokenizer/
    ├── english_phoneme_*.json          # phoneme_dict + token2id + heteronyms
    ├── spanish_phoneme_*.json          # phoneme_dict + token2id
    ├── german_phoneme_*.json           # phoneme_dict + token2id + heteronyms
    ├── mandarin_phoneme_*.json         # token2id, phoneme_dict, pinyin/tone/ascii dicts
    ├── mandarin_jieba_dict.json
    ├── mandarin_pypinyin_{char,phrase}_dict.json
    └── hindi_chartokenizer_token2id.json

Prefer .mlmodelc at runtime; .mlpackage is included for inspection and re-compilation. Consult manifest.json for the full asset table (file sizes, sha256, model IO shapes per layer).
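If you are consuming the assets outside FluidAudio, manifest.json can be decoded with a plain Codable struct. The field names below are assumptions for illustration — check the file itself for the real schema:

```swift
import Foundation

// Hypothetical manifest entry -- the actual manifest.json schema may use
// different field names; this only shows the Codable decoding pattern.
struct ManifestEntry: Codable {
    let file: String
    let sha256: String
}

let sample = Data("""
[{"file": "text_encoder.mlmodelc", "sha256": "deadbeef"}]
""".utf8)
let entries = try! JSONDecoder().decode([ManifestEntry].self, from: sample)
```

At runtime, load the .mlmodelc bundles directly with `MLModel(contentsOf:configuration:)`; an .mlpackage must first be compiled on-device via `MLModel.compileModel(at:)`.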

Usage (Swift)

import FluidAudio

let manager = try await MagpieTtsManager.downloadAndCreate(
    languages: [.english, .spanish]
)
let result = try await manager.synthesize(
    text: "Hello | ˈ n ɛ m o ʊ | from FluidAudio.",
    speaker: .john,
    language: .english
)
let wav = AudioWAV.data(from: result.samples, sampleRate: result.sampleRate)
try wav.write(to: URL(fileURLWithPath: "hello.wav"))

The manager lazily downloads everything in this repo on first use.
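`AudioWAV.data(from:sampleRate:)` above packages the returned float samples as a WAV file. For reference, a standalone sketch of what such a helper plausibly does, assuming 16-bit mono PCM output — FluidAudio's actual implementation may differ:

```swift
import Foundation

// Minimal 16-bit mono PCM WAV writer. This is an illustrative stand-in for
// a helper like AudioWAV.data(from:sampleRate:), not FluidAudio's code.
func wavData(samples: [Float], sampleRate: Int) -> Data {
    var data = Data()
    func append(_ s: String) { data.append(contentsOf: Array(s.utf8)) }
    func appendU32(_ v: UInt32) { withUnsafeBytes(of: v.littleEndian) { data.append(contentsOf: $0) } }
    func appendU16(_ v: UInt16) { withUnsafeBytes(of: v.littleEndian) { data.append(contentsOf: $0) } }

    let dataSize = UInt32(samples.count * 2)      // mono, 2 bytes/sample

    append("RIFF"); appendU32(36 + dataSize); append("WAVE")
    append("fmt "); appendU32(16)                 // PCM fmt chunk size
    appendU16(1)                                  // audio format: PCM
    appendU16(1)                                  // channels: mono
    appendU32(UInt32(sampleRate))
    appendU32(UInt32(sampleRate * 2))             // byte rate
    appendU16(2)                                  // block align
    appendU16(16)                                 // bits per sample
    append("data"); appendU32(dataSize)
    for s in samples {
        let clamped = max(-1.0, min(1.0, s))
        appendU16(UInt16(bitPattern: Int16(clamped * 32767)))
    }
    return data
}
```

The 22050 Hz sample rate in `result.sampleRate` matches the nanocodec decoder's output rate listed above.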

Inline IPA override

Text enclosed in |...| is passed straight to the tokenizer as whitespace-separated IPA tokens:

"Hello | ˈ n ɛ m o ʊ | world"
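Splitting text into plain and IPA segments could look like the sketch below. The `Segment` type and the even/odd splitting rule are illustrative assumptions; FluidAudio's Magpie tokenizer is the authoritative implementation:

```swift
import Foundation

// Split input into plain-text and inline-IPA segments. The "|...|" delimiter
// is from the model card; the parsing details here are a sketch.
enum Segment: Equatable {
    case text(String)
    case ipa([String])   // whitespace-separated IPA tokens, passed through verbatim
}

func parseInlineIPA(_ input: String) -> [Segment] {
    input.split(separator: "|", omittingEmptySubsequences: false)
        .enumerated()
        .compactMap { (index, piece) -> Segment? in
            let trimmed = piece.trimmingCharacters(in: .whitespaces)
            guard !trimmed.isEmpty else { return nil }
            // Odd-indexed pieces sit between a delimiter pair -> IPA.
            return index.isMultiple(of: 2)
                ? .text(trimmed)
                : .ipa(trimmed.split(separator: " ").map(String.init))
        }
}
```

Under this scheme the example above yields `[.text("Hello"), .ipa(["ˈ", "n", "ɛ", "m", "o", "ʊ"]), .text("world")]`.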
