POST /tts/bytes
Text-to-Speech (Bytes)
curl --request POST \
  --url https://api.cartesia.ai/tts/bytes \
  --header 'Cartesia-Version: <cartesia-version>' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: <api-key>' \
  --data '
{
  "model_id": "sonic-3.5",
  "transcript": "<string>",
  "voice": {
    "mode": "id",
    "id": "<string>",
    "__experimental_controls": {
      "speed": 123,
      "emotion": []
    }
  },
  "output_format": {
    "container": "raw",
    "sample_rate": 123,
    "bit_rate": 123
  },
  "duration": 123,
  "speed": "normal"
}
'
"<string>"

Authorization

X-API-Key
string
header
required

Headers

Cartesia-Version
enum<string>
required

API version header.

Available options:
2024-11-13

Example: "2024-11-13"
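
The authorization and version headers described above can be assembled in client code roughly as follows. This is a minimal sketch using only Python's standard library; the helper name `build_tts_request` is a hypothetical illustration and not part of any Cartesia SDK, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.cartesia.ai/tts/bytes"
API_VERSION = "2024-11-13"  # the only Cartesia-Version value listed on this page


def build_tts_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /tts/bytes.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,                # required
            "Cartesia-Version": API_VERSION,     # required
            "Content-Type": "application/json",  # body is JSON
        },
    )
```

To actually call the endpoint, the returned object can be passed to `urllib.request.urlopen`, and the audio bytes read from the response with `.read()`.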

Body

application/json
model_id
enum<string>
required

The ID of the model to use for the generation. See Models for all options.

Available options:
sonic-3.5,
sonic-3,
sonic-latest

Example: "sonic-3.5"

transcript
string
required
voice
TTSRequestIdSpecifier · object
required
output_format
RawOutputFormat · object
required
language
enum<string> | null

The language that the given voice should speak the transcript in.

Available options:
en,
fr,
de,
es,
pt,
zh,
ja,
hi,
it,
ko,
nl,
pl,
ru,
sv,
tr
duration
number<double> | null

The maximum duration of the audio in seconds. You do not usually need to specify this. If the duration is not appropriate for the length of the transcript, the output audio may be truncated.

speed
enum<string> | null
default: normal
deprecated

Influences the speed of the generated speech. Faster speeds may reduce hallucination rate.

This feature is experimental and may not work for all voices.

Available options:
slow,
normal,
fast
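
The body fields listed above can be checked on the client before sending. The following is a minimal sketch: `build_payload` is a hypothetical helper, the allowed values are copied from the option lists on this page, and the `output_format` object is limited to the `container` and `sample_rate` keys shown in the sample request. The full `RawOutputFormat` schema is not listed here, so additional fields may be required. The deprecated `speed` field is deliberately left out.

```python
from typing import Optional

# Values copied from the "Available options" lists on this page.
MODEL_IDS = {"sonic-3.5", "sonic-3", "sonic-latest"}
LANGUAGES = {
    "en", "fr", "de", "es", "pt", "zh", "ja", "hi",
    "it", "ko", "nl", "pl", "ru", "sv", "tr",
}


def build_payload(
    transcript: str,
    voice_id: str,
    sample_rate: int,
    model_id: str = "sonic-3.5",
    language: Optional[str] = None,
    duration: Optional[float] = None,
) -> dict:
    """Assemble a /tts/bytes request body (hypothetical helper)."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"unknown model_id: {model_id!r}")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    payload = {
        "model_id": model_id,
        "transcript": transcript,
        "voice": {"mode": "id", "id": voice_id},
        # Assumption: only the keys visible in the sample request are set.
        "output_format": {"container": "raw", "sample_rate": sample_rate},
    }
    # Optional, nullable fields are omitted unless explicitly provided.
    if language is not None:
        payload["language"] = language
    if duration is not None:
        payload["duration"] = duration
    return payload
```

Leaving `duration` unset follows the note above that it does not usually need to be specified.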

Response

200 - audio/*

Audio bytes

The response is of type file.
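
Because the response is a stream of audio bytes, a request made with `"container": "raw"` presumably returns headerless samples that most players cannot open directly. The sketch below wraps such bytes in a WAV header using Python's standard `wave` module. The function name `pcm_to_wav` is hypothetical, and the sample format (16-bit signed little-endian, mono) is an assumption that this page does not confirm; the parameters must match whatever format was actually requested.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int,
    sample_width: int = 2,  # bytes per sample; assumes 16-bit PCM
    channels: int = 1,      # assumes mono output
) -> bytes:
    """Wrap headerless PCM bytes in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```

The `sample_rate` passed here should be the same value sent in `output_format.sample_rate`; otherwise playback will be too fast or too slow.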