POST /tts/bytes
Text to Speech (Bytes)
curl --request POST \
  --url https://api.cartesia.ai/tts/bytes \
  --header 'Authorization: Bearer <token>' \
  --header 'Cartesia-Version: <cartesia-version>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model_id": "<string>",
  "transcript": "<string>",
  "voice": {
    "mode": "id",
    "id": "<string>"
  },
  "language": "en",
  "generation_config": {
    "volume": 1,
    "speed": 1,
    "emotion": "neutral"
  },
  "output_format": {
    "container": "raw",
    "encoding": "pcm_f32le",
    "sample_rate": 8000
  },
  "save": false,
  "pronunciation_dict_id": "<string>",
  "speed": "normal"
}'

Authorizations

Authorization
string
header
required

An access token, sent as Bearer <token>.

Headers

Cartesia-Version
enum<string>
required

API version header. Must be set to the API version, e.g. '2024-06-10'.

Available options:
2024-06-10, 2024-11-13, 2025-04-16
Example:

"2025-04-16"

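As a minimal sketch, the three headers from the curl example can be assembled in Python as follows. The helper name build_headers and the local version check are illustrative assumptions, not part of the API; the version values are the ones listed above.

```python
# Versions listed under "Available options" for the Cartesia-Version header.
SUPPORTED_VERSIONS = ("2024-06-10", "2024-11-13", "2025-04-16")


def build_headers(token: str, version: str = "2025-04-16") -> dict[str, str]:
    """Return the request headers (hypothetical helper, not a Cartesia API)."""
    # Local sanity check only; the server remains the authority on valid versions.
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported Cartesia-Version: {version!r}")
    return {
        "Authorization": f"Bearer {token}",
        "Cartesia-Version": version,
        "Content-Type": "application/json",
    }
```

Rejecting an unlisted version locally surfaces a typo before any request is sent.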
Body

application/json
model_id
string
required

The ID of the model to use for the generation. See Models for available models.

transcript
string
required
voice
object
required
output_format
object
required
  • RawOutputFormat
  • WAVOutputFormat
  • MP3OutputFormat
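The four required fields can be put together as in the sketch below. It covers only the raw output format, using the field names from the curl example, because this page does not list the fields of the WAV and MP3 variants. The helper name build_required_body, the model ID "sonic-3", and the non-empty transcript check are assumptions made for illustration.

```python
import json


def build_required_body(model_id: str, transcript: str, voice_id: str,
                        sample_rate: int = 8000) -> dict:
    """Assemble model_id, transcript, voice, and a raw output_format.

    Hypothetical helper; field names follow the curl example on this page.
    """
    # Local check (an assumption): refuse to send an empty transcript.
    if not transcript:
        raise ValueError("transcript must be a non-empty string")
    return {
        "model_id": model_id,
        "transcript": transcript,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {
            "container": "raw",
            "encoding": "pcm_f32le",
            "sample_rate": sample_rate,
        },
    }


# "sonic-3" and "<voice-id>" are placeholders; substitute real IDs.
body = build_required_body("sonic-3", "Hello, world.", "<voice-id>")
print(json.dumps(body, indent=2))
```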
language
enum<string>

The language that the given voice should speak the transcript in. For valid options, see Models.

Available options:
en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
generation_config
object

Configures attributes of the generated speech. These settings apply only to sonic-3 and have no effect on earlier models.

See Volume, Speed, and Emotion in Sonic-3 for a guide on this option.
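A small sketch of attaching generation_config to a request body is shown below. The keys volume, speed, and emotion come from the curl example; the helper name with_generation_config is hypothetical, and valid value ranges are not stated on this page, so the helper does no range checking.

```python
def with_generation_config(body: dict, *, volume=None, speed=None, emotion=None) -> dict:
    """Return a copy of body with generation_config set from the given values.

    Hypothetical helper. Only values that are explicitly provided are included,
    and the key is omitted entirely when nothing is provided.
    """
    candidates = {"volume": volume, "speed": speed, "emotion": emotion}
    config = {key: value for key, value in candidates.items() if value is not None}
    updated = dict(body)  # shallow copy so the caller's dict is left unchanged
    if config:
        updated["generation_config"] = config
    return updated


# Values mirror the curl example; "sonic-3" is a placeholder model ID.
request_body = with_generation_config(
    {"model_id": "sonic-3"}, volume=1, speed=1, emotion="neutral"
)
```

Omitting the key when no values are provided keeps requests to earlier models free of a field that would have no effect there.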

save
boolean | null
default:false

Whether to save the generated audio file. When true, the response will include a Cartesia-File-ID header.

pronunciation_dict_id
string | null

The ID of a pronunciation dictionary to use for the generation. Pronunciation dictionaries are supported by sonic-3 models and newer.

speed
enum<string>
default:normal
deprecated

Deprecated: use generation_config.speed for sonic-3. Controls the speed of the generated speech and defaults to normal. This feature is experimental and may not work for all voices. Faster speeds may reduce the hallucination rate.

Available options:
slow, normal, fast

Response

OK

The response is of type file.
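Because the response body is the audio file itself, a client can write the bytes straight to disk. The sketch below uses only the Python standard library; the function names build_request and synthesize_to_file are hypothetical, and the Cartesia-File-ID header is read only because the save field above says it is included when save is true.

```python
import json
import urllib.request

API_URL = "https://api.cartesia.ai/tts/bytes"


def build_request(token: str, body: dict, version: str = "2025-04-16") -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Cartesia-Version": version,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(token: str, body: dict, path: str):
    """Send the request, write the audio bytes to path, return the file ID if any.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    request = build_request(token, body)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
        # Expected only when the request body sets "save": true; otherwise None.
        return response.headers.get("Cartesia-File-ID")
```

With the raw container, the saved file holds headerless PCM samples, so a player needs to be told the encoding and sample rate that were requested in output_format.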