This page explains how to configure output_format for TTS responses (container, encoding, and sample_rate).
In general, use a consistent encoding and sample rate across your audio pipeline (telephony, playback, and storage) to avoid unnecessary transcoding and quality loss.
If you’re saving audio samples, we recommend using the Text-to-Speech (Bytes) API
with output_format.container: "wav" or output_format.container: "mp3" so audio players can automatically detect the encoding and sample rate.
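If you already have raw pcm_s16le bytes (for example, from an endpoint that only returns the raw container), you can add a WAV header yourself so players can detect the format. The following is a minimal sketch using Python's standard-library `wave` module. The helper name `pcm_s16le_to_wav` is hypothetical, and it assumes mono audio and a little-endian host by default.

```python
import io
import wave


def pcm_s16le_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw pcm_s16le bytes in a WAV container.

    Sketch only: assumes mono audio by default and a little-endian host,
    since `wave` treats frame data as native byte order.
    """
    bytes_per_frame = 2 * channels  # 16-bit samples take 2 bytes each
    if len(pcm) % bytes_per_frame != 0:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 2 bytes per sample for pcm_s16le
        wav.setframerate(sample_rate)  # must match the output_format sample_rate
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable
    return buf.getvalue()


if __name__ == "__main__":
    # 0.1 s of silence at 24000 Hz: 2400 samples x 2 bytes
    silence = bytes(2 * 2400)
    data = pcm_s16le_to_wav(silence, 24000)
    with wave.open(io.BytesIO(data), "rb") as check:
        print(check.getframerate(), check.getnframes())
```

Requesting output_format.container: "wav" directly avoids this step; the sketch is only useful when raw bytes are all you have.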
Reference
- container: The container format for the audio output. Available options: raw, wav, mp3. Only the Bytes endpoint supports all container formats; the other endpoints (SSE, WebSockets) only support raw.
- encoding: The encoding of the output audio. Available options: pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw.
- sample_rate: The sample rate of the output audio. To represent a given signal, the sample rate must be greater than twice the highest frequency component of the signal (Nyquist theorem). Available options: 8000, 16000, 22050, 24000, 44100, 48000.

output_format for RAW (PCM) Audio
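When using raw audio, set the output_format parameter so that its encoding and sample rate match what your output device expects. Raw audio carries no header, so a mismatch is not detected automatically and results in distorted or wrongly pitched playback.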
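One way to catch a mismatch early is to check the requested output_format against your device's settings before making a request. This is a minimal sketch: the function `check_raw_output_format` and its `device_encoding` / `device_rate` arguments are hypothetical, and the option sets are copied from the reference lists above.

```python
# Option sets taken from the output_format reference on this page
ENCODINGS = {"pcm_f32le", "pcm_s16le", "pcm_mulaw", "pcm_alaw"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def check_raw_output_format(fmt: dict, device_encoding: str, device_rate: int) -> list:
    """Return a list of problems with a raw output_format; empty means it looks OK."""
    problems = []
    if fmt.get("container") != "raw":
        problems.append("container must be 'raw' for raw PCM output")

    encoding = fmt.get("encoding")
    if encoding not in ENCODINGS:
        problems.append(f"unsupported encoding: {encoding!r}")
    elif encoding != device_encoding:
        problems.append(f"encoding {encoding} does not match device ({device_encoding})")

    rate = fmt.get("sample_rate")
    if rate not in SAMPLE_RATES:
        problems.append(f"unsupported sample_rate: {rate!r}")
    elif rate != device_rate:
        problems.append(f"sample_rate {rate} does not match device ({device_rate})")
    return problems


if __name__ == "__main__":
    fmt = {"container": "raw", "encoding": "pcm_s16le", "sample_rate": 44100}
    print(check_raw_output_format(fmt, "pcm_mulaw", 8000))
```

In the usage line, a pcm_s16le/44100 request sent to an 8 kHz μ-law telephony device reports two mismatches, one for the encoding and one for the sample rate.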
| Encoding | Bit depth | Commonly used for | Recommended sample rate (Hz) |
|---|---|---|---|
| pcm_s16le | 16-bit integer | General-purpose playback, browsers, audio players, most devices | 16000-44100 |
| pcm_f32le | 32-bit float | ML post-processing, high-fidelity recording, audio analysis | 48000 |
| pcm_mulaw | 8-bit compressed (μ-law) | North American / Japanese telephony (G.711 μ-law), Twilio | 8000 |
| pcm_alaw | 8-bit compressed (A-law) | European / international telephony (G.711 A-law) | 8000 |
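The bit depth in the table determines how many bytes each sample occupies, which is handy for sizing buffers or converting a byte count into a playback duration. A minimal sketch, assuming mono output; the names `BYTES_PER_SAMPLE`, `bytes_per_second`, and `duration_seconds` are illustrative, not part of any API.

```python
# Bytes per sample for each encoding, derived from the bit-depth column
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,  # 16-bit integer
    "pcm_f32le": 4,  # 32-bit float
    "pcm_mulaw": 1,  # 8-bit compressed
    "pcm_alaw": 1,   # 8-bit compressed
}


def bytes_per_second(encoding: str, sample_rate: int, channels: int = 1) -> int:
    """Raw data rate of a stream; assumes mono unless channels is given."""
    return BYTES_PER_SAMPLE[encoding] * sample_rate * channels


def duration_seconds(num_bytes: int, encoding: str, sample_rate: int, channels: int = 1) -> float:
    """Playback duration of num_bytes of raw audio."""
    return num_bytes / bytes_per_second(encoding, sample_rate, channels)


if __name__ == "__main__":
    for enc, rate in [("pcm_s16le", 44100), ("pcm_f32le", 48000), ("pcm_mulaw", 8000)]:
        print(enc, rate, bytes_per_second(enc, rate), "bytes/s")
```

For example, pcm_s16le at 44100 Hz produces 88200 bytes per second of mono audio, while pcm_mulaw at 8000 Hz produces 8000.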
Audio CD quality
Standard audio CDs are encoded as pcm_s16le at a 44.1 kHz sample rate.
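The Nyquist rule from the sample_rate reference explains these choices: 44.1 kHz can represent frequencies up to 22.05 kHz, while 8 kHz telephony audio tops out at 4 kHz. The sketch below picks the smallest supported sample rate that is strictly greater than twice a target frequency. The helper names are hypothetical, and the rate list is copied from the reference on this page.

```python
# Supported sample rates from the output_format reference, in ascending order
SUPPORTED_RATES = [8000, 16000, 22050, 24000, 44100, 48000]


def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency a sample rate can represent: half the rate."""
    return sample_rate / 2


def min_supported_rate(highest_freq_hz: float) -> int:
    """Smallest supported rate greater than twice highest_freq_hz."""
    for rate in SUPPORTED_RATES:
        if rate > 2 * highest_freq_hz:
            return rate
    raise ValueError(f"no supported rate can represent {highest_freq_hz} Hz")


if __name__ == "__main__":
    print(nyquist_hz(44100))          # CD-quality rate
    print(min_supported_rate(20000))  # roughly the upper limit of human hearing
```

Because the comparison is strict, a 4000 Hz component needs 16000 Hz here rather than 8000 Hz, since 8000 Hz only reaches the Nyquist limit and does not exceed it.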