Make an API request

Generate your first words and learn API conventions

Prerequisites

  1. A Cartesia account.
  2. An API key.
  3. FFmpeg installed (optional but recommended).

FFmpeg isn’t required to use the Cartesia API, but it’s useful for saving, playing, and converting audio files, so we will use it in the examples below. You can install it using your platform’s package manager:

# macOS
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg

Generate your first words

To generate your first words, run this command in your terminal, replacing YOUR_API_KEY:

curl -N -X POST "https://api.cartesia.ai/tts/bytes" \
  -H "Cartesia-Version: 2024-06-10" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"transcript": "Welcome to Cartesia Sonic!", "model_id": "sonic-english", "voice": {"mode":"id", "id": "694f9389-aac1-45b6-b726-9d9369183238"}, "output_format":{"container":"wav", "encoding":"pcm_f32le", "sample_rate":44100}}' > sonic.wav

Make sure to replace YOUR_API_KEY with your real API key. Otherwise the request fails and sonic.wav will not contain any playable audio.
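If you prefer to make the same call from code, the sketch below builds an equivalent request with Python's standard library (urllib.request). It is not an official Cartesia client: the URL, headers, and JSON body are copied from the curl command above, and the CARTESIA_API_KEY environment variable and the helper names build_tts_request and synthesize_to_file are hypothetical choices for this example. Because urlopen raises urllib.error.HTTPError for error responses, a bad key surfaces as an exception instead of a broken sonic.wav.

```python
import json
import os
import urllib.request

# Endpoint copied from the curl example above.
TTS_BYTES_URL = "https://api.cartesia.ai/tts/bytes"


def build_tts_request(api_key: str, transcript: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command."""
    payload = {
        "transcript": transcript,
        "model_id": "sonic-english",
        "voice": {"mode": "id", "id": "694f9389-aac1-45b6-b726-9d9369183238"},
        "output_format": {
            "container": "wav",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
    }
    return urllib.request.Request(
        TTS_BYTES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Cartesia-Version": "2024-06-10",
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, transcript: str, path: str) -> None:
    """Send the request and write the raw response bytes to `path`."""
    request = build_tts_request(api_key, transcript)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses (e.g. an
    # invalid key), so an error body is never written to the output file.
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    # CARTESIA_API_KEY is a hypothetical variable name used only in this sketch.
    key = os.environ.get("CARTESIA_API_KEY")
    if key:
        synthesize_to_file(key, "Welcome to Cartesia Sonic!", "sonic.wav")
    else:
        print("Set CARTESIA_API_KEY to send the request.")
```

Splitting request construction from sending keeps the payload easy to inspect or reuse before any network call is made.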

You can play the resulting sonic.wav file with afplay sonic.wav (on macOS) or ffplay sonic.wav (on any system with FFmpeg installed). You can also double-click it in your file explorer.
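If sonic.wav will not play, it may contain an error message rather than audio. The sketch below checks the file's RIFF/WAVE header and reads its "fmt " chunk, relying only on the standard WAV layout; the helper name inspect_wav is hypothetical. It parses the header by hand because Python's built-in wave module does not accept IEEE-float WAV files such as the pcm_f32le output requested above.

```python
import struct


def inspect_wav(data: bytes) -> dict:
    """Return the main fields of a WAV file's "fmt " chunk.

    Raises ValueError if `data` does not start with a RIFF/WAVE header,
    e.g. when the server sent back an error message instead of audio.
    """
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        preview = data[:200].decode("utf-8", errors="replace")
        raise ValueError(f"not a WAV file; contents start with: {preview!r}")
    offset = 12  # the first chunk follows the 12-byte RIFF/WAVE header
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                raise ValueError("truncated 'fmt ' chunk")
            tag, channels, rate, _byte_rate, _block_align, bits = struct.unpack_from(
                "<HHIIHH", data, body
            )
            return {
                "format_tag": tag,  # 1 = integer PCM, 3 = IEEE float
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # Chunk bodies are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size % 2)
    raise ValueError("no 'fmt ' chunk found")
```

For example, calling inspect_wav on the bytes of sonic.wav would be expected to report a sample_rate of 44100 and 32 bits_per_sample for the request above; the format_tag is typically 3 (IEEE float), or 0xFFFE if the file uses the extensible header.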

This command calls the Text to Speech (Bytes) endpoint, which runs text-to-speech generation and returns the output as raw bytes.

The bytes endpoint supports a variety of output formats, which makes it well suited to batch use cases where you want to generate and save audio in advance.
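When choosing an output format for batch jobs, a rough guide to storage cost is that uncompressed PCM takes sample_rate * bytes_per_sample * channels bytes per second. The sketch below assumes mono audio and ignores WAV headers; only pcm_f32le comes from the request above, and pcm_s16le is an assumed encoding name included for comparison.

```python
# Bytes per sample for each PCM encoding. Only "pcm_f32le" appears in the
# request above; "pcm_s16le" is an assumed name used here for comparison.
BYTES_PER_SAMPLE = {"pcm_f32le": 4, "pcm_s16le": 2}


def pcm_bytes_per_second(encoding: str, sample_rate: int, channels: int = 1) -> int:
    """Uncompressed data rate: sample_rate * bytes_per_sample * channels."""
    return sample_rate * BYTES_PER_SAMPLE[encoding] * channels


def estimated_size_mb(encoding: str, sample_rate: int, seconds: float, channels: int = 1) -> float:
    """Approximate audio size in megabytes (10**6 bytes), ignoring file headers."""
    return pcm_bytes_per_second(encoding, sample_rate, channels) * seconds / 1_000_000


if __name__ == "__main__":
    for enc in BYTES_PER_SAMPLE:
        print(enc, f"{estimated_size_mb(enc, 44100, 60):.2f} MB per minute")
```

With the settings above, pcm_f32le at 44100 Hz works out to 176,400 bytes per second, or about 10.6 MB per minute of audio.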

By contrast, Cartesia’s WebSocket and Server-Sent Events endpoints stream raw PCM audio, avoiding the latency that transcoding the audio would add.
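When you consume raw pcm_f32le audio yourself, received chunks need not line up with 4-byte sample boundaries. The sketch below decodes little-endian 32-bit floats incrementally and carries any partial sample over to the next chunk. It assumes you already have the raw audio bytes; how the WebSocket and Server-Sent Events endpoints frame those bytes is not shown here, and the names F32LEDecoder and duration_seconds are hypothetical.

```python
import struct


class F32LEDecoder:
    """Incrementally decode raw little-endian 32-bit float PCM (pcm_f32le).

    Any trailing partial sample is buffered until the next chunk arrives.
    """

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> list[float]:
        data = self._pending + chunk
        usable = len(data) - (len(data) % 4)  # keep whole 4-byte samples only
        self._pending = data[usable:]
        count = usable // 4
        return list(struct.unpack(f"<{count}f", data[:usable]))


def duration_seconds(num_samples: int, sample_rate: int, channels: int = 1) -> float:
    """Playback time of interleaved PCM samples."""
    return num_samples / (sample_rate * channels)
```

Summing the number of decoded samples and passing it to duration_seconds gives the length of audio received so far.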

The voice used above can be found on the playground.