The cartesia-mcp package exposes Cartesia through the Model Context Protocol (MCP) so that MCP-capable clients such as Cursor, Claude Code, and Codex can list voices, run text-to-speech (TTS) and speech-to-text (STT), manage pronunciation dictionaries, clone voices, and more without custom scripts.
Requirements
- uv: runs the server via uvx with no global install
- Python 3.13+ (installed automatically by uvx)
- A Cartesia API key (format sk_car_…)
Setup
Get an API key, then connect cartesia-mcp to your agent through one of the following:
- CLI (recommended)
- Cursor
- Claude Code
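For clients that read a JSON configuration file, the connection step can be sketched as below. This is a minimal sketch under stated assumptions, not the official setup: the `mcpServers` layout is the one commonly used by MCP clients such as Cursor, the server label `cartesia` is a hypothetical name, and the `uvx cartesia-mcp` launch command is inferred from the requirement that uvx runs the server. The environment variable names (CARTESIA_API_KEY, OUTPUT_DIRECTORY, CARTESIA_ADMIN_API_KEY) are the ones this page mentions.

```python
import json


def build_mcp_config(api_key, output_directory=None, admin_api_key=None):
    """Build an MCP client config entry for cartesia-mcp (assumed layout)."""
    env = {"CARTESIA_API_KEY": api_key}
    if output_directory:
        # Optional: fixed folder for generated audio.
        env["OUTPUT_DIRECTORY"] = output_directory
    if admin_api_key:
        # Optional: only needed for tools such as get_credit_usage.
        env["CARTESIA_ADMIN_API_KEY"] = admin_api_key
    return {
        "mcpServers": {
            # "cartesia" is a hypothetical label; clients accept any name here.
            "cartesia": {
                "command": "uvx",            # assumption: server is launched via uvx
                "args": ["cartesia-mcp"],    # assumption: package name is the entry point
                "env": env,
            }
        }
    }


# Placeholder key; substitute your own before saving the config.
print(json.dumps(build_mcp_config("YOUR_CARTESIA_API_KEY"), indent=2))
```

The printed JSON can be pasted into the client's MCP configuration file. Check your client's documentation for the exact file location and for any fields it requires beyond these.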
Try it
Ask your agent things like:
- List all available Cartesia voices
- Convert text to audio with a chosen voice (speed, volume, emotion)
- Transcribe an audio file to text
- Create a pronunciation dictionary and use it in TTS
- Check credit usage for your account
- Localize an existing voice into another language
- Change an audio file to use a different voice
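Before trying the transcription prompt, it can help to confirm that the file is a mono PCM WAV, which is what speech_to_text expects (see Local audio files under Advanced configuration). The sketch below uses only Python's standard `wave` module; the helper name `is_mono_pcm_wav` is hypothetical and is not part of cartesia-mcp.

```python
import wave


def is_mono_pcm_wav(path: str) -> bool:
    """Return True if `path` opens as an uncompressed WAV with exactly one channel."""
    try:
        with wave.open(path, "rb") as wav:
            # The wave module only reads uncompressed audio, reported as "NONE".
            return wav.getnchannels() == 1 and wav.getcomptype() == "NONE"
    except (wave.Error, EOFError, OSError):
        # Not a readable WAV file, or the file does not exist.
        return False
```

If the check fails, convert the file to mono PCM WAV with an audio tool of your choice before passing its full path to the agent.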
Tools
| Tool | Description |
|---|---|
| text_to_speech | Convert text to audio; optional speed, volume, emotion, and pronunciation dictionary |
| speech_to_text | Stream-transcribe an audio file via STT WebSocket (ink-2) |
| list_voices | List available voices (filter by language, search, gender, etc.) |
| get_voice | Fetch metadata for a voice by ID |
| clone_voice | Clone a voice from an audio sample |
| update_voice | Update a cloned voice’s name or description |
| delete_voice | Delete a cloned voice |
| voice_change | Re-render audio with a different voice |
| localize_voice | Adapt a voice to another language or dialect |
| list_pronunciation_dicts | List pronunciation dictionaries |
| create_pronunciation_dict | Create a pronunciation dictionary |
| get_pronunciation_dict | Get a pronunciation dictionary by ID |
| update_pronunciation_dict | Update a pronunciation dictionary |
| delete_pronunciation_dict | Delete a pronunciation dictionary |
| get_credit_usage | Credit usage over time (requires an admin API key) |
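Under the hood, an MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request. Your agent normally builds this for you, so the sketch below is only meant to show the shape of such a message. The tool name comes from the table above; the `language` argument is an assumed parameter name (the table only says that list_voices can filter by language), and `tools_call_request` is a hypothetical helper.

```python
import json


def tools_call_request(tool_name, arguments=None, request_id=1):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            # Argument names are tool-specific; an empty dict means no arguments.
            "arguments": arguments or {},
        },
    }


# "language" is an assumed argument name, used for illustration only.
request = tools_call_request("list_voices", {"language": "en"})
print(json.dumps(request, indent=2))
```

To see the real argument names and types, ask your MCP client to list the server's tools; each tool advertises an input schema.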
Advanced configuration
Output directory
By default, generated audio is written to the server’s working directory. To choose a fixed folder, add OUTPUT_DIRECTORY to env.
Local audio files
Tools like speech_to_text and voice_change need paths to existing audio files on disk. Pass the full path to each file when prompting your agent. For speech_to_text, use a mono PCM WAV file (or raw PCM with encoding and sample_rate).
Admin API key
Some tools call management endpoints that accept only admin API keys (sk_car_admin_...). To use get_credit_usage, set CARTESIA_ADMIN_API_KEY in env in addition to CARTESIA_API_KEY. Admin keys work only on management routes; API keys from play.cartesia.ai/keys do not work on those routes, and admin keys do not work on generation routes. Mint admin keys in the Playground under Keys → Admin (organization admins only).
API version
Cartesia MCP is built using Cartesia-Version: 2026-03-01.
cartesia-mcp
The official Cartesia MCP Server