
Chatterbox TTS with MLX on Apple Silicon 🎙️

High-quality voice synthesis with emotion control and voice cloning using Chatterbox Turbo on Apple Silicon.

Features

  • 🚀 Fast inference on Apple Silicon (M1/M2/M3/M4) using MLX
  • 🎭 9 Emotion Tags - [laugh], [sigh], [gasp], [groan], [chuckle], [cough], [sniff], [shush], [clear throat]
  • 🎡 Voice Cloning from reference audio (6+ seconds)
  • 📝 Smart Text Chunking for long inputs
  • 🎨 Professional Web UI with Gradio
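
The Smart Text Chunking feature splits long inputs into sentence-sized pieces before synthesis. A minimal Python sketch of that idea (illustrative only; the actual splitting logic in app_chatterbox.py may differ, and max_chars is an assumed limit):

# Illustrative sentence-based chunker; the app's real chunking rules may differ.
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    # Split on sentence boundaries, then pack sentences into chunks of at most max_chars.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks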

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Launch the web interface
./start_demo.sh

# Or run directly:
python3 app_chatterbox.py

# 3. Open in browser (auto-opens)
# URL: http://127.0.0.1:7861

If Already Running

# Check if running
pgrep -f app_chatterbox.py

# Stop the demo
pkill -f app_chatterbox.py

# Then start again
./start_demo.sh

Using the Web Interface

  1. Enter text with optional emotion tags like [laugh] or [sigh]
  2. Upload reference audio (optional) for voice cloning
  3. Click Generate to create speech
  4. Listen to the generated audio with emotions!

Command Line Usage

# Basic generation with emotions
python3 -m mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "[sigh] Monday again. [chuckle] But let's make the best of it!" \
  --file_prefix output

# Voice cloning
python3 -m mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Hello, this is your cloned voice!" \
  --ref_audio my_voice.wav \
  --file_prefix cloned

Supported Emotion Tags

Tag              Effect
[sigh]           Sighing expression
[groan]          Groaning sound
[gasp]           Gasping reaction
[laugh]          Full laughter
[chuckle]        Light chuckling
[cough]          Coughing sound
[sniff]          Sniffing sound
[shush]          Shushing sound
[clear throat]   Throat clearing
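
Because the tags are plain bracketed strings embedded in the input text, a typo such as [chucle] will not produce the intended effect. A small helper (not part of this repo; the tag list comes from the table above) can catch typos before generation:

# Flag any bracketed tag that is not one of the nine supported emotion tags.
import re

EMOTION_TAGS = {"[laugh]", "[sigh]", "[gasp]", "[groan]", "[chuckle]",
                "[cough]", "[sniff]", "[shush]", "[clear throat]"}

def unknown_tags(text: str) -> set[str]:
    return {tag for tag in re.findall(r"\[[^\]]+\]", text) if tag not in EMOTION_TAGS}

print(unknown_tags("[sigh] Monday again. [chucle] Oops."))  # prints {'[chucle]'}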

Example Audio

Check out chatterbox_full_story.wav - an 82-second story with multiple emotions!

Voice Cloning Tips

  1. Use clear audio - Record in a quiet environment
  2. 6+ seconds - Longer samples clone better
  3. Single speaker - Only one person speaking
  4. Good quality - WAV format recommended
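
Before cloning, it can help to check the reference clip against these tips programmatically. A small sketch using librosa and soundfile (both already in the dependency list); prepare_reference and the 6-second threshold are just this README's tips turned into code, not a function from the repo:

# Sanity-check and re-save a reference clip: single channel, at least ~6 seconds, WAV output.
import librosa
import soundfile as sf

def prepare_reference(path: str, out_path: str = "my_voice.wav", min_seconds: float = 6.0) -> str:
    audio, sr = librosa.load(path, sr=None, mono=True)  # downmix to a single channel
    duration = len(audio) / sr
    if duration < min_seconds:
        raise ValueError(f"Reference is {duration:.1f}s; aim for {min_seconds:.0f}s or more")
    sf.write(out_path, audio, sr)  # WAV is the recommended format
    return out_path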

Technical Details

  • Model: mlx-community/chatterbox-turbo-fp16 (350M parameters)
  • Framework: MLX (Apple's machine learning framework)
  • Sample Rate: 24 kHz
  • Pipeline: Chatterbox Turbo (not Kokoro)

Files

File                          Description
app_chatterbox.py             Main Gradio web interface
chatterbox_voice_cloning.py   CLI script for voice cloning
chatterbox_emotions_demo.py   Emotion examples demo
requirements.txt              Python dependencies
chatterbox_full_story.wav     Example output with emotions

Installation

# Core packages
pip install "mlx-audio[tts]>=0.2.8"
pip install librosa gradio

# Or use requirements.txt
pip install -r requirements.txt

Requirements

  • Hardware: Apple Silicon Mac (M1/M2/M3/M4)
  • Python: 3.10+
  • mlx-audio: 0.2.8+
  • Dependencies: librosa, gradio, soundfile
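
A quick, assumption-laden way to check an environment against these requirements from Python (the arm64 architecture string and the mlx_audio import name are what I'd expect for Apple Silicon and the mlx-audio package, not something verified in this repo):

# Rough environment check against the requirements listed above.
import importlib.util
import platform
import sys

assert sys.version_info >= (3, 10), "Python 3.10+ required"
assert platform.machine() == "arm64", "Apple Silicon (arm64) Mac required"
for pkg in ("mlx_audio", "librosa", "gradio", "soundfile"):
    assert importlib.util.find_spec(pkg) is not None, f"missing package: {pkg}"
print("Environment looks OK")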

Why This Works

The key insight: use the CLI module (python -m mlx_audio.tts.generate) instead of the Python API's generate_audio() function. The CLI properly routes to the Chatterbox Turbo pipeline, while the Python function defaults to Kokoro.

You can verify it's using Chatterbox when you see:

S3 Token -> Mel Inference...

(Not "KokoroPipeline")
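
If you drive the CLI from Python, the same check can be automated: run the module with subprocess, capture its output, and fail if the Kokoro marker shows up. This is a hedged sketch; the exact log strings may differ between mlx-audio versions:

# Run the CLI module and fail loudly if it fell back to the Kokoro pipeline.
import subprocess

result = subprocess.run(
    ["python3", "-m", "mlx_audio.tts.generate",
     "--model", "mlx-community/chatterbox-turbo-fp16",
     "--text", "[chuckle] Quick pipeline check.",
     "--file_prefix", "check"],
    capture_output=True, text=True, check=True,
)
log = result.stdout + result.stderr
if "KokoroPipeline" in log:
    raise RuntimeError("Fell back to Kokoro; expected the Chatterbox Turbo pipeline")
print("Chatterbox Turbo pipeline confirmed" if "Mel Inference" in log else "Check the log manually")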

Credits

License

MIT License - Free to use and modify!


🎉 Enjoy natural, expressive speech synthesis with emotion control on your Mac!
