Text to Speech (Bytes)

POST https://api.cartesia.ai/tts/bytes

Headers

Auth

X-API-Key (string, Required)

Cartesia-Version ("2024-06-10", Required)

Request

This endpoint expects an object.

model_id (string, Required)

The ID of the model to use for the generation. See Models for available models.

transcript (string, Required)

voice (object, Required)

output_format (object, Required)

language (enum, Optional)

The language that the given voice should speak the transcript in.

Options: English (en), French (fr), German (de), Spanish (es), Portuguese (pt), Chinese (zh), Japanese (ja), Hindi (hi), Italian (it), Korean (ko), Dutch (nl), Polish (pl), Russian (ru), Swedish (sv), Turkish (tr).

duration (double, Optional)

The maximum duration of the audio in seconds. You do not usually need to specify this. If the duration is not appropriate for the length of the transcript, the output audio may be truncated.
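The request fields above can be assembled into a JSON-ready object along these lines. This is only a sketch: `build_tts_payload` and `SUPPORTED_LANGUAGES` are hypothetical names introduced for illustration, the `voice` shape (`mode` and `id`) is taken from the example request on this page, and the local language check simply mirrors the options listed above rather than anything the API is documented to enforce.

```python
# Language codes copied from the "Options" list in this reference.
SUPPORTED_LANGUAGES = {
    "en", "fr", "de", "es", "pt", "zh", "ja", "hi",
    "it", "ko", "nl", "pl", "ru", "sv", "tr",
}


def build_tts_payload(model_id, transcript, voice_id, output_format,
                      language=None, duration=None):
    """Hypothetical helper: build the request object for the bytes endpoint.

    Required fields are always included; optional fields are added only
    when the caller supplies them.
    """
    payload = {
        "model_id": model_id,
        "transcript": transcript,
        # Voice shape assumed from the example request on this page.
        "voice": {"mode": "id", "id": voice_id},
        "output_format": output_format,
    }
    if language is not None:
        # Client-side check against the documented options (an assumption,
        # not a documented server behavior).
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language code: {language!r}")
        payload["language"] = language
    if duration is not None:
        # The reference types this field as a double (seconds).
        payload["duration"] = float(duration)
    return payload
```

Leaving `duration` unset matches the guidance above that it rarely needs to be specified.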

Response

This endpoint returns a file.
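Because the response body is the audio file itself, a client can stream it straight to disk. The following is a minimal sketch using only the Python standard library; `build_request` and `synthesize_to_file` are hypothetical helper names, and the URL and header values are taken from the example request on this page.

```python
import json
import shutil
import urllib.request

# Endpoint URL as shown in the example request on this page.
TTS_BYTES_URL = "https://api.cartesia.ai/tts/bytes"


def build_request(api_key, payload, version="2024-06-10"):
    """Hypothetical helper: build (but do not send) the POST request."""
    return urllib.request.Request(
        TTS_BYTES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Cartesia-Version": version,
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, payload, out_path):
    """Hypothetical helper: send the request and save the returned bytes.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so a
    failed request does not silently write an error body as audio.
    """
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        # Copy the response in chunks instead of loading it all into memory.
        shutil.copyfileobj(response, out)
    return out_path


# Usage (requires a valid API key and network access):
#   synthesize_to_file("<apiKey>", payload, "hello.mp3")
```

The file extension should match the `container` chosen in `output_format`.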

curl -X POST https://api.cartesia.ai/tts/bytes \
  -H "Cartesia-Version: 2024-06-10" \
  -H "X-API-Key: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sonic-english",
    "transcript": "Hello, world!",
    "voice": {
      "mode": "id",
      "id": "694f9389-aac1-45b6-b726-9d9369183238"
    },
    "output_format": {
      "container": "mp3",
      "bit_rate": 128000,
      "sample_rate": 44100
    },
    "language": "en"
  }'