Generate speech from text and save it as a WAV file.

Prerequisites

For this tutorial, you need a Cartesia API key in your shell environment. Get your API key at https://play.cartesia.ai/keys, then run this command or add it to your .bashrc or .zshrc:
export CARTESIA_API_KEY=<your api key here>
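
If the variable is not exported, the script later in this tutorial passes `None` as the API key and the failure only surfaces when the request is made. A minimal sketch of an early check follows; the helper name `require_api_key` is hypothetical and not part of the Cartesia SDK.

```python
import os


def require_api_key(env=os.environ, name="CARTESIA_API_KEY"):
    """Return the API key from `env`, or raise with a clear message.

    Hypothetical helper for illustration; it is not part of the Cartesia SDK.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the script."
        )
    return key


# Usage (assumes the variable was exported as shown above):
#   client = Cartesia(api_key=require_api_key())
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.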

Generate a WAV file

1. Install the SDK

pip install cartesia
2. Generate speech

generate_speech.py
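from cartesia import Cartesia
import os
import sys

# Read the API key exported in the Prerequisites step.
client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

# Request speech for the transcript as WAV (32-bit float PCM at 44.1 kHz).
response = client.tts.generate(
    model_id="sonic-3.5",
    transcript="Hello, world! Welcome to Cartesia.",
    voice={"mode": "id", "id": "694f9389-aac1-45b6-b726-9d9369183238"},
    output_format={"container": "wav", "encoding": "pcm_f32le", "sample_rate": 44100},
)
# Write the raw audio bytes to stdout so the output can be piped or redirected.
sys.stdout.buffer.write(response.content)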
3. Run it

# Play the audio directly (requires ffplay, which ships with FFmpeg):
python3 generate_speech.py | ffplay -nodisp -autoexit -loglevel quiet -
# Or save to a file:
python3 generate_speech.py > output.wav
The voice used above can be found on the playground. Browse more voices at play.cartesia.ai/voices.
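
To confirm that a saved file matches the requested `output_format`, the WAV header can be inspected directly. The sketch below parses the `fmt ` chunk by hand with `struct`, because Python's built-in `wave` module may reject float-encoded files. It assumes a standard RIFF/WAVE layout and that `pcm_f32le` is written with format tag 3 (IEEE float); the function name `read_wav_format` is illustrative only.

```python
import struct


def read_wav_format(data: bytes) -> dict:
    """Parse the RIFF header and 'fmt ' chunk from the bytes of a WAV file."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    offset = 12  # first chunk starts right after the 12-byte RIFF header
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                raise ValueError("truncated 'fmt ' chunk")
            # format tag, channels, sample rate, byte rate, block align, bit depth
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, body
            )
            return {
                "format_tag": tag,  # 1 = integer PCM, 3 = IEEE float
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # Chunks are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)

    raise ValueError("no 'fmt ' chunk found")


# Usage (assumes output.wav was produced by the command above):
#   with open("output.wav", "rb") as f:
#       print(read_wav_format(f.read()))
```

For the request in this tutorial, the expected values under those assumptions are a sample rate of 44100 and 32 bits per sample.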

What’s next

Text-to-Speech Quickstart

Pipe LLM output to TTS in real time using WebSocket streaming.

Choose a Voice

Browse voices and learn how to pick the right one for your use case.

TTS output audio format

Pick the right output format, sample rate, and encoding for your use case.