Batch Text-to-Speech

Generate speech from text and save it as a WAV file.

Prerequisites

For this tutorial, you need a Cartesia API key in your shell environment. Get your API key at https://play.cartesia.ai/keys, then run this command or add it to your .bashrc or .zshrc:

export CARTESIA_API_KEY=<your api key here>
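
A script can fail early with a clear message when the variable is missing, instead of sending a request without credentials. The sketch below is one way to do that. The helper name `require_api_key` is hypothetical and is not part of the Cartesia SDK.

```python
import os


def require_api_key(name="CARTESIA_API_KEY", env=None):
    """Return the API key stored in the environment variable `name`.

    Illustrative helper only; it is not part of the Cartesia SDK.
    """
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return key
```

The returned value could be passed as the `api_key` argument in place of the `os.getenv` call used in the examples that follow.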

Generate a WAV file

The steps below are shown for Python, JavaScript, and cURL, in that order.

Install the SDK (Python)

pip install cartesia

Generate speech

generate_speech.py

import os
import sys

from cartesia import Cartesia

# Read the API key from the environment variable set in Prerequisites.
client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

# Request the whole utterance as WAV audio.
response = client.tts.generate(
    model_id="sonic-3.5",
    transcript="Hello, world! Welcome to Cartesia.",
    voice={"mode": "id", "id": "a0e99841-438c-4a64-b679-ae501e7d6091"},
    output_format={"container": "wav", "encoding": "pcm_f32le", "sample_rate": 44100},
)

# Write the raw bytes to stdout so the output can be piped or redirected.
sys.stdout.buffer.write(response.content)

Run it

# Play the audio directly (requires ffplay, which ships with FFmpeg):
python3 generate_speech.py | ffplay -nodisp -autoexit -loglevel quiet -
# Or save to a file:
python3 generate_speech.py > output.wav
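
To confirm that the saved file matches the requested output format, the WAV header can be inspected. Python's built-in `wave` module may reject 32-bit float files, so the sketch below reads the `fmt ` chunk directly with `struct`. The function name `read_wav_format` is hypothetical, and the code assumes a standard RIFF/WAVE layout.

```python
import struct

# Common WAV format tags; anything else is reported as "other".
FORMAT_NAMES = {1: "integer PCM", 3: "IEEE float"}


def read_wav_format(data):
    """Return the fields of the `fmt ` chunk found in WAV bytes."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    # Each chunk is a 4-byte ID, a 4-byte little-endian size, then the payload.
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format": FORMAT_NAMES.get(tag, "other"),
                "format_tag": tag,
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # Skip to the next chunk; payloads are padded to an even length.
        pos += 8 + size + (size & 1)
    raise ValueError("no fmt chunk found")
```

For example, `read_wav_format(open("output.wav", "rb").read())` returns a dictionary whose `sample_rate` should match the value requested in `output_format`.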

Install the SDK (JavaScript)

npm install @cartesia/cartesia-js

Generate speech

generate_speech.mjs

import Cartesia from "@cartesia/cartesia-js";

// Read the API key from the environment variable set in Prerequisites.
const client = new Cartesia({ apiKey: process.env["CARTESIA_API_KEY"] });

// Request the whole utterance as WAV audio.
const response = await client.tts.generate({
  model_id: "sonic-3.5",
  transcript: "Hello, world! Welcome to Cartesia.",
  voice: { mode: "id", id: "a0e99841-438c-4a64-b679-ae501e7d6091" },
  output_format: { container: "wav", encoding: "pcm_f32le", sample_rate: 44100 },
});

// Write the raw bytes to stdout so the output can be piped or redirected.
process.stdout.write(Buffer.from(await response.arrayBuffer()));

Run it

# Play the audio directly (requires ffplay, which ships with FFmpeg):
node generate_speech.mjs | ffplay -nodisp -autoexit -loglevel quiet -
# Or save to a file:
node generate_speech.mjs > output.wav

Generate speech (cURL)

generate_speech.sh

#!/usr/bin/env bash
# POST the request; curl writes the returned WAV bytes to stdout.
curl -X POST "https://api.cartesia.ai/tts/bytes" \
  -H "Cartesia-Version: 2025-04-16" \
  -H "X-API-Key: $CARTESIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sonic-3.5",
    "transcript": "Hello, world! Welcome to Cartesia.",
    "voice": {"mode": "id", "id": "a0e99841-438c-4a64-b679-ae501e7d6091"},
    "output_format": {"container": "wav", "encoding": "pcm_s16le", "sample_rate": 44100}
  }'

Run it

# Play the audio directly (requires ffplay, which ships with FFmpeg):
bash generate_speech.sh | ffplay -nodisp -autoexit -loglevel quiet -
# Or save to a file:
bash generate_speech.sh > output.wav

The voice used above can be found on the playground. Browse more voices at play.cartesia.ai/voices.
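
The Python and JavaScript examples request `pcm_f32le` (4 bytes per sample), while the cURL example requests `pcm_s16le` (2 bytes per sample). For uncompressed PCM, the audio payload size follows from sample rate, bytes per sample, channel count, and duration. The sketch below does that arithmetic. The function name `pcm_data_bytes` is hypothetical, mono output is an assumption, and the WAV header is ignored.

```python
# Bytes per sample for the two encodings used in the examples above.
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "pcm_f32le": 4}


def pcm_data_bytes(encoding, sample_rate, seconds, channels=1):
    """Estimate the size in bytes of the raw PCM payload (header excluded).

    channels=1 is an assumption, not a documented property of the API.
    """
    return int(BYTES_PER_SAMPLE[encoding] * sample_rate * channels * seconds)


# Ten seconds of audio at 44100 Hz, assuming mono:
print(pcm_data_bytes("pcm_f32le", 44100, 10))  # 1764000
print(pcm_data_bytes("pcm_s16le", 44100, 10))  # 882000
```

At the same sample rate, the 16-bit encoding produces half the data of the 32-bit float encoding.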

What’s next

Text-to-Speech Quickstart

Pipe LLM output to TTS in real time using WebSocket streaming.

Choose a Voice

Browse voices and learn how to pick the right one for your use case.

TTS output audio format

Pick the right output format, sample rate, and encoding for your use case.
