
Chatterbox TTS with MLX on Apple Silicon 🎙️

High-quality voice synthesis with emotion control and voice cloning using Chatterbox Turbo on Apple Silicon.

Features

  • 🚀 Fast inference on Apple Silicon (M1/M2/M3/M4) using MLX
  • 🎭 9 Emotion Tags - [laugh], [sigh], [gasp], [groan], [chuckle], [cough], [sniff], [shush], [clear throat]
  • 🎡 Voice Cloning from reference audio (6+ seconds)
  • 📝 Smart Text Chunking for long inputs
  • 🎨 Professional Web UI with Gradio
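
The Smart Text Chunking feature splits long inputs into sentence-sized pieces before synthesis. A minimal Python sketch of that idea (illustrative only; the actual splitting logic in app_chatterbox.py may differ, and max_chars is an assumed limit):

# Illustrative sentence-based chunker; the app's real chunking rules may differ.
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    # Split on sentence boundaries, then pack sentences into chunks of at most max_chars.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks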

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Launch the web interface
./start_demo.sh

# Or run directly:
python3 app_chatterbox.py

# 3. Open in browser (auto-opens)
# URL: http://127.0.0.1:7861

If Already Running

# Check if running
pgrep -f app_chatterbox.py

# Stop the demo
pkill -f app_chatterbox.py

# Then start again
./start_demo.sh

Using the Web Interface

  1. Enter text with optional emotion tags like [laugh] or [sigh]
  2. Upload reference audio (optional) for voice cloning
  3. Click Generate to create speech
  4. Listen to the generated audio with emotions!

Command Line Usage

# Basic generation with emotions
python3 -m mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "[sigh] Monday again. [chuckle] But let's make the best of it!" \
  --file_prefix output

# Voice cloning
python3 -m mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Hello, this is your cloned voice!" \
  --ref_audio my_voice.wav \
  --file_prefix cloned

Supported Emotion Tags

Tag              Effect
[sigh]           Sighing expression
[groan]          Groaning sound
[gasp]           Gasping reaction
[laugh]          Full laughter
[chuckle]        Light chuckling
[cough]          Coughing sound
[sniff]          Sniffing sound
[shush]          Shushing sound
[clear throat]   Throat clearing
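
Because the tags are plain bracketed strings embedded in the input text, a typo such as [chucle] will not produce the intended effect. A small helper (not part of this repo; the tag list comes from the table above) can catch typos before generation:

# Flag any bracketed tag that is not one of the nine supported emotion tags.
import re

EMOTION_TAGS = {"[laugh]", "[sigh]", "[gasp]", "[groan]", "[chuckle]",
                "[cough]", "[sniff]", "[shush]", "[clear throat]"}

def unknown_tags(text: str) -> set[str]:
    return {tag for tag in re.findall(r"\[[^\]]+\]", text) if tag not in EMOTION_TAGS}

print(unknown_tags("[sigh] Monday again. [chucle] Oops."))  # prints {'[chucle]'}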

Example Audio

Check out chatterbox_full_story.wav - an 82-second story with multiple emotions!

Voice Cloning Tips

  1. Use clear audio - Record in a quiet environment
  2. 6+ seconds - Longer samples clone better
  3. Single speaker - Only one person speaking
  4. Good quality - WAV format recommended
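
Before cloning, it can help to check the reference clip against these tips programmatically. A small sketch using librosa and soundfile (both already in the dependency list); prepare_reference and the 6-second threshold are just this README's tips turned into code, not a function from the repo:

# Sanity-check and re-save a reference clip: single channel, at least ~6 seconds, WAV output.
import librosa
import soundfile as sf

def prepare_reference(path: str, out_path: str = "my_voice.wav", min_seconds: float = 6.0) -> str:
    audio, sr = librosa.load(path, sr=None, mono=True)  # downmix to a single channel
    duration = len(audio) / sr
    if duration < min_seconds:
        raise ValueError(f"Reference is {duration:.1f}s; aim for {min_seconds:.0f}s or more")
    sf.write(out_path, audio, sr)  # WAV is the recommended format
    return out_path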

Technical Details

  • Model: mlx-community/chatterbox-turbo-fp16 (350M parameters)
  • Framework: MLX (Apple's machine learning framework)
  • Sample Rate: 24 kHz
  • Pipeline: Chatterbox Turbo (not Kokoro)

Files

File                          Description
app_chatterbox.py             Main Gradio web interface
chatterbox_voice_cloning.py   CLI script for voice cloning
chatterbox_emotions_demo.py   Emotion examples demo
requirements.txt              Python dependencies
chatterbox_full_story.wav     Example output with emotions

Installation

# Core packages
pip install "mlx-audio[tts]>=0.2.8"
pip install librosa gradio

# Or use requirements.txt
pip install -r requirements.txt

Requirements

  • Hardware: Apple Silicon Mac (M1/M2/M3/M4)
  • Python: 3.10+
  • mlx-audio: 0.2.8+
  • Dependencies: librosa, gradio, soundfile
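
A quick, assumption-laden way to check an environment against these requirements from Python (the arm64 architecture string and the mlx_audio import name are what I'd expect for Apple Silicon and the mlx-audio package, not something verified in this repo):

# Rough environment check against the requirements listed above.
import importlib.util
import platform
import sys

assert sys.version_info >= (3, 10), "Python 3.10+ required"
assert platform.machine() == "arm64", "Apple Silicon (arm64) Mac required"
for pkg in ("mlx_audio", "librosa", "gradio", "soundfile"):
    assert importlib.util.find_spec(pkg) is not None, f"missing package: {pkg}"
print("Environment looks OK")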

Why This Works

The key insight: use the CLI module (python -m mlx_audio.tts.generate) instead of the Python API's generate_audio() function. The CLI properly routes to the Chatterbox Turbo pipeline, while the Python function defaults to Kokoro.

You can verify it's using Chatterbox when you see:

S3 Token -> Mel Inference...

(Not "KokoroPipeline")
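
If you drive the CLI from Python, the same check can be automated: run the module with subprocess, capture its output, and fail if the Kokoro marker shows up. This is a hedged sketch; the exact log strings may differ between mlx-audio versions:

# Run the CLI module and fail loudly if it fell back to the Kokoro pipeline.
import subprocess

result = subprocess.run(
    ["python3", "-m", "mlx_audio.tts.generate",
     "--model", "mlx-community/chatterbox-turbo-fp16",
     "--text", "[chuckle] Quick pipeline check.",
     "--file_prefix", "check"],
    capture_output=True, text=True, check=True,
)
log = result.stdout + result.stderr
if "KokoroPipeline" in log:
    raise RuntimeError("Fell back to Kokoro; expected the Chatterbox Turbo pipeline")
print("Chatterbox Turbo pipeline confirmed" if "Mel Inference" in log else "Check the log manually")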

Credits

License

MIT License - Free to use and modify!


🎉 Enjoy natural, expressive speech synthesis with emotion control on your Mac!
