Text to Speech (SSE)
Headers
Authorization
Bearer authentication of the form Bearer <token>, where token is your auth token.
Cartesia-Version
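The two headers above can be assembled as in the following minimal sketch. The function name `build_headers` is hypothetical, and the `Content-Type` and `Accept` entries are assumptions based on common HTTP practice for a JSON request that returns a server-sent event stream; this page does not list them. The version string is supplied by the caller because its value is not specified here.

```python
from typing import Dict


def build_headers(token: str, api_version: str) -> Dict[str, str]:
    """Build request headers for the endpoint (hypothetical helper)."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        # Documented: bearer authentication of the form "Bearer <token>".
        "Authorization": f"Bearer {token}",
        # Documented header name; the value format is not given on this page.
        "Cartesia-Version": api_version,
        # Assumption: the request object is sent as JSON.
        "Content-Type": "application/json",
        # Assumption: the standard media type for SSE responses.
        "Accept": "text/event-stream",
    }
```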
Request
This endpoint expects an object.
model_id
The ID of the model to use for the generation. See Models for available models.
transcript
voice
output_format
language
The language that the given voice should speak the transcript in.
Options: English (en), French (fr), German (de), Spanish (es), Portuguese (pt), Chinese (zh), Japanese (ja), Hindi (hi), Italian (it), Korean (ko), Dutch (nl), Polish (pl), Russian (ru), Swedish (sv), Turkish (tr).
duration
The maximum duration of the audio in seconds. You do not usually need to specify this.
If the duration is not appropriate for the length of the transcript, the output audio may be truncated.
speed
This feature is experimental and may not work for all voices.
Speed setting for the model. Defaults to normal.
Influences the speed of the generated speech. Faster speeds may reduce hallucination rate.
Allowed values:
pronunciation_dict_ids
A list of pronunciation dict IDs to use for the generation.
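As a rough illustration of the request object, the sketch below collects the fields listed above into a dictionary and checks `language` against the documented codes. The helper name `build_tts_request` is hypothetical. It assumes that `model_id`, `transcript`, `voice`, and `output_format` are required and that the remaining fields are optional, which this page does not state. The shapes of `voice` and `output_format` are not described here, so they are passed through unchanged.

```python
from typing import Any, Dict, List, Optional

# Language codes taken from the options listed for the language field.
SUPPORTED_LANGUAGES = {
    "en", "fr", "de", "es", "pt", "zh", "ja", "hi",
    "it", "ko", "nl", "pl", "ru", "sv", "tr",
}


def build_tts_request(
    model_id: str,
    transcript: str,
    voice: Any,          # structure not described on this page
    output_format: Any,  # structure not described on this page
    language: Optional[str] = None,
    duration: Optional[float] = None,
    speed: Optional[str] = None,
    pronunciation_dict_ids: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble the request object (hypothetical helper)."""
    if language is not None and language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if duration is not None and duration <= 0:
        # Sanity check only; the page just says duration is in seconds.
        raise ValueError("duration must be a positive number of seconds")

    body: Dict[str, Any] = {
        "model_id": model_id,
        "transcript": transcript,
        "voice": voice,
        "output_format": output_format,
    }
    optional = {
        "language": language,
        "duration": duration,
        "speed": speed,
        "pronunciation_dict_ids": pronunciation_dict_ids,
    }
    # Omit unset fields (assumed optional) so server defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Leaving `duration` unset follows the note above that it rarely needs to be specified, and avoids truncating audio when the value is too short for the transcript.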
Response
This endpoint returns a stream of objects. Each object in the stream is one of the following:
chunk
OR
flush_done
OR
done
OR
timestamps
OR
error
OR
phoneme_timestamps
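A client typically reads the stream event by event and branches on the kind of object received. The sketch below parses server-sent event lines using the standard `data:` framing and tallies the object kinds. It assumes each event's data is a JSON object with a `type` field whose value matches one of the names above; that field name is an assumption and is not given on this page. Both function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the JSON payload of each SSE event from an iterable of text lines."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "data":
            data_lines.append(value)
    # An event not terminated by a blank line is discarded, as in the SSE spec.


def summarize_stream(events: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count events by kind, stopping at done and raising on error."""
    counts: Dict[str, int] = {}
    for event in events:
        kind = event.get("type", "unknown")  # assumed discriminator field
        counts[kind] = counts.get(kind, 0) + 1
        if kind == "error":
            raise RuntimeError(f"stream reported an error: {event}")
        if kind == "done":
            break
    return counts
```

In practice the lines would come from a streaming HTTP response; a real client would also decode and buffer the audio carried by each chunk object rather than only counting it.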