
Prerequisites

  1. A Cartesia account.
  2. An API key.
  3. FFmpeg installed (optional but recommended).
FFmpeg isn’t required to use the Cartesia API, but it’s useful for saving, playing, and converting audio files, so we will use it in the examples below. You can install it using your platform’s package manager:
# macOS
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg

Generate your first words

To generate your first words, run this command in your terminal, replacing YOUR_API_KEY:
curl -N -X POST "https://api.cartesia.ai/tts/bytes" \
        -H "Cartesia-Version: 2024-11-13" \
        -H "X-API-Key: YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"transcript": "Welcome to Cartesia Sonic!", "model_id": "sonic-2", "voice": {"mode":"id", "id": "694f9389-aac1-45b6-b726-9d9369183238"}, "output_format":{"container":"wav", "encoding":"pcm_f32le", "sample_rate":44100}}' > sonic-2.wav
Make sure to replace YOUR_API_KEY with your real API key, or the command won’t output anything!
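If you want to confirm that the saved file really is audio before playing it, one option is to read its header. The sketch below is not part of the Cartesia tooling: `read_wav_format` is a hypothetical helper that walks the standard RIFF/WAVE chunk layout and reports the fields of the `fmt ` chunk. It assumes the response was saved as `sonic-2.wav`, as in the command above. A 32-bit float WAV is commonly written with format tag 3, though some encoders use the extensible tag 65534 instead, so treat the printed values as informational.

```python
import os
import struct


def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk of a RIFF/WAVE byte string (hypothetical helper)."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12  # first chunk starts right after the 12-byte RIFF header
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (chunk_size,) = struct.unpack_from("<I", data, offset + 4)
        body = offset + 8
        if chunk_id == b"fmt ":
            if body + 16 > len(data):
                raise ValueError("truncated fmt chunk")
            tag, channels, rate, _byte_rate, _block_align, bits = struct.unpack_from(
                "<HHIIHH", data, body
            )
            return {
                "format_tag": tag,
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # Chunks are word-aligned, so odd sizes are followed by one pad byte.
        offset = body + chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")


if __name__ == "__main__":
    path = "sonic-2.wav"  # file name taken from the cURL command above
    if os.path.exists(path):
        with open(path, "rb") as f:
            head = f.read(4096)  # the fmt chunk sits near the start of the file
        try:
            print(read_wav_format(head))
        except ValueError as err:
            print(f"{path} does not look like a WAV file: {err}")
```

If the helper reports that the file is not a WAV file, re-check the API key and the request body before retrying.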
You can play the resulting sonic-2.wav file with afplay sonic-2.wav (on macOS) or ffplay sonic-2.wav (on any system with FFmpeg installed). You can also double-click it in your file explorer.
This command calls the Text to Speech (Bytes) endpoint, which runs the text-to-speech generation and returns the output as raw bytes.
The bytes endpoint supports a variety of output formats, which makes it well suited to batch use cases where you want to save the audio in advance.
In comparison, Cartesia’s WebSocket and Server-Sent Events endpoints stream raw PCM audio to avoid the latency overhead of transcoding.
The voice used above can be found on the playground.
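The same request can be made from Python. The following is a minimal sketch that uses only the standard library rather than an official SDK; it sends the same URL, headers, and JSON body as the cURL command above. The helper names (`build_payload`, `build_request`, `synthesize_to_file`) and the `CARTESIA_API_KEY` environment variable are assumptions made for this example, not part of the Cartesia API.

```python
import json
import os
import shutil
import urllib.request

API_URL = "https://api.cartesia.ai/tts/bytes"
API_VERSION = "2024-11-13"
VOICE_ID = "694f9389-aac1-45b6-b726-9d9369183238"  # voice from the cURL example


def build_payload(transcript: str) -> dict:
    """Return the same JSON body that the cURL command sends."""
    return {
        "transcript": transcript,
        "model_id": "sonic-2",
        "voice": {"mode": "id", "id": VOICE_ID},
        "output_format": {
            "container": "wav",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
    }


def build_request(api_key: str, transcript: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the bytes endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(transcript)).encode("utf-8"),
        headers={
            "Cartesia-Version": API_VERSION,
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, transcript: str, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = build_request(api_key, transcript)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # CARTESIA_API_KEY is a variable name chosen for this sketch.
    key = os.environ.get("CARTESIA_API_KEY")
    if key:
        synthesize_to_file(key, "Welcome to Cartesia Sonic!", "sonic-2.wav")
        print("Wrote sonic-2.wav")
    else:
        print("Set CARTESIA_API_KEY to run this example.")
```

Reading the key from the environment keeps it out of the source file. Because `urlopen` raises an exception on an HTTP error status, a bad key surfaces as an error here instead of an unusable output file.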