# Chatterbox TTS with MLX on Apple Silicon
High-quality voice synthesis with emotion control and voice cloning using Chatterbox Turbo on Apple Silicon.
## Features

- **Fast inference** on Apple Silicon (M1/M2/M3/M4) using MLX
- **9 Emotion Tags** - `[laugh]`, `[sigh]`, `[gasp]`, `[groan]`, `[chuckle]`, `[cough]`, `[sniff]`, `[shush]`, `[clear throat]`
- **Voice Cloning** from reference audio (6+ seconds)
- **Smart Text Chunking** for long inputs (see the sketch after this list)
- **Professional Web UI** with Gradio
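As a rough illustration of the chunking idea, the sketch below splits long text at sentence boundaries under a simple character budget; the actual logic inside `app_chatterbox.py` may differ.

```python
# Hypothetical sentence-boundary chunker for long inputs (illustrative only;
# not the implementation used by app_chatterbox.py).
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("[sigh] Monday again. [chuckle] But let's make the best of it!", max_chars=40))
```

Each chunk can then be synthesized separately and the resulting audio joined back together.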
## Quick Start

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Launch the web interface
./start_demo.sh
# Or run directly:
python3 app_chatterbox.py

# 3. Open in browser (auto-opens)
# URL: http://127.0.0.1:7861
```
## If Already Running

```bash
# Check if running
pgrep -f app_chatterbox.py

# Stop the demo
pkill -f app_chatterbox.py

# Then start again
./start_demo.sh
```
## Using the Web Interface

- Enter text with optional emotion tags like `[laugh]` or `[sigh]`
- Upload reference audio (optional) for voice cloning
- Click Generate to create speech
- Listen to the generated audio with emotions!
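For reference, a web UI along these lines can be wired up by shelling out to the CLI. The sketch below is an illustrative stand-in for the shipped `app_chatterbox.py` (which is more complete); the `ui_output*.wav` glob is an assumption about how the CLI names its output files.

```python
# Minimal Gradio wrapper around the Chatterbox CLI (illustrative sketch,
# not the shipped app_chatterbox.py).
import glob
import subprocess
import sys

import gradio as gr


def synthesize(text: str, ref_audio: str | None):
    cmd = [
        sys.executable, "-m", "mlx_audio.tts.generate",
        "--model", "mlx-community/chatterbox-turbo-fp16",
        "--text", text,
        "--file_prefix", "ui_output",
    ]
    if ref_audio:
        cmd += ["--ref_audio", ref_audio]
    subprocess.run(cmd, check=True)
    # Assumption: the CLI writes WAV files starting with the given prefix.
    matches = sorted(glob.glob("ui_output*.wav"))
    return matches[0] if matches else None


demo = gr.Interface(
    fn=synthesize,
    inputs=[
        gr.Textbox(label="Text (emotion tags like [laugh] allowed)"),
        gr.Audio(label="Reference audio (optional)", type="filepath"),
    ],
    outputs=gr.Audio(label="Generated speech"),
)

if __name__ == "__main__":
    demo.launch(server_name="127.0.0.1", server_port=7861)
```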
## Command Line Usage

```bash
# Basic generation with emotions
python3 -m mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "[sigh] Monday again. [chuckle] But let's make the best of it!" \
  --file_prefix output

# Voice cloning
python3 -m mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Hello, this is your cloned voice!" \
  --ref_audio my_voice.wav \
  --file_prefix cloned
```
## Supported Emotion Tags

| Tag | Effect |
|---|---|
| `[sigh]` | Sighing expression |
| `[groan]` | Groaning sound |
| `[gasp]` | Gasping reaction |
| `[laugh]` | Full laughter |
| `[chuckle]` | Light chuckling |
| `[cough]` | Coughing sound |
| `[sniff]` | Sniffing sound |
| `[shush]` | Shushing sound |
| `[clear throat]` | Throat clearing |
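Tags outside this list aren't in the supported set; a quick check like the one below (an illustrative helper, not part of the shipped scripts) can catch typos before generation.

```python
# Flag bracketed tags that are not in the supported set above.
import re

SUPPORTED_TAGS = {
    "laugh", "sigh", "gasp", "groan", "chuckle",
    "cough", "sniff", "shush", "clear throat",
}

def unknown_tags(text: str) -> set[str]:
    return {t.lower() for t in re.findall(r"\[([^\]]+)\]", text)} - SUPPORTED_TAGS

print(unknown_tags("[sigh] Monday again. [yawn] So tired."))  # {'yawn'}
```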
## Example Audio

Check out `chatterbox_full_story.wav` - an 82-second story with multiple emotions!
## Voice Cloning Tips

- **Use clear audio** - record in a quiet environment
- **6+ seconds** - longer samples clone better (see the sanity check below)
- **Single speaker** - only one person speaking
- **Good quality** - WAV format recommended
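Since librosa is already a dependency, a reference clip can be sanity-checked before cloning. The helper below is illustrative (the 6-second threshold mirrors the tip above); `my_voice.wav` is the example file name from the CLI section.

```python
# Check duration and sample rate of a cloning reference clip.
import librosa

def check_reference(path: str, min_seconds: float = 6.0) -> None:
    audio, sr = librosa.load(path, sr=None, mono=True)
    duration = len(audio) / sr
    print(f"{path}: {duration:.1f}s at {sr} Hz")
    if duration < min_seconds:
        print(f"Warning: shorter than {min_seconds:.0f}s; cloning quality may suffer.")

check_reference("my_voice.wav")
```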
## Technical Details

- **Model**: `mlx-community/chatterbox-turbo-fp16` (350M parameters)
- **Framework**: MLX (Apple's machine learning framework)
- **Sample Rate**: 24 kHz (see the check below)
- **Pipeline**: Chatterbox Turbo (not Kokoro)
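To confirm a generated file matches the 24 kHz output rate, soundfile (listed under dependencies) can inspect it without decoding all the samples; the file name below is the example WAV from this repo.

```python
# Verify the sample rate of a generated file.
import soundfile as sf

info = sf.info("chatterbox_full_story.wav")
print(info.samplerate, info.channels, f"{info.duration:.1f}s")
assert info.samplerate == 24000
```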
## Files

| File | Description |
|---|---|
| `app_chatterbox.py` | Main Gradio web interface |
| `chatterbox_voice_cloning.py` | CLI script for voice cloning |
| `chatterbox_emotions_demo.py` | Emotion examples demo |
| `requirements.txt` | Python dependencies |
| `chatterbox_full_story.wav` | Example output with emotions |
## Installation

```bash
# Core packages (quote the spec so the shell doesn't treat >= as a redirect)
pip install "mlx-audio[tts]>=0.2.8"
pip install librosa gradio

# Or use requirements.txt
pip install -r requirements.txt
```
## Requirements

- **Hardware**: Apple Silicon Mac (M1/M2/M3/M4)
- **Python**: 3.10+
- **mlx-audio**: 0.2.8+
- **Dependencies**: librosa, gradio, soundfile
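A quick way to confirm the environment from Python (package names as listed above):

```python
# Check hardware and installed package versions against the requirements above.
import platform
from importlib.metadata import version

print("machine:", platform.machine())  # expect "arm64" on Apple Silicon
for pkg in ("mlx-audio", "gradio", "librosa", "soundfile"):
    print(pkg, version(pkg))  # raises PackageNotFoundError if missing
```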
## Why This Works

The key insight: use the CLI module (`python -m mlx_audio.tts.generate`) instead of the Python API's `generate_audio()` function. The CLI properly routes to the Chatterbox Turbo pipeline, while the Python function defaults to Kokoro.

You can verify it is using Chatterbox when the log shows:

```
S3 Token -> Mel Inference...
```

(not "KokoroPipeline").
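From Python code, the same routing can be kept by spawning the CLI as a subprocess rather than importing `generate_audio()`. This sketch uses only the flags shown in the examples above; where the pipeline marker is printed (stdout vs. stderr) may vary by version.

```python
# Spawn the CLI module so the request stays on the Chatterbox Turbo pipeline.
import subprocess
import sys

result = subprocess.run(
    [
        sys.executable, "-m", "mlx_audio.tts.generate",
        "--model", "mlx-community/chatterbox-turbo-fp16",
        "--text", "[laugh] This goes through Chatterbox, not Kokoro.",
        "--file_prefix", "from_python",
    ],
    capture_output=True,
    text=True,
    check=True,
)
# Look for the Chatterbox marker mentioned above.
print("Chatterbox pipeline:", "Mel Inference" in result.stdout + result.stderr)
```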
## Credits

- **Chatterbox Model**: Resemble AI
- **MLX Audio**: Prince Canuma / Blaizzy
- **MLX Framework**: Apple
## License

MIT License - free to use and modify!

Enjoy natural, expressive speech synthesis with emotion control on your Mac!