POST /infill/bytes
Infill (Bytes)
curl --request POST \
  --url https://api.cartesia.ai/infill/bytes \
  --header 'Cartesia-Version: <cartesia-version>' \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key: <api-key>' \
  --form 'model_id=<string>' \
  --form 'language=<string>' \
  --form 'transcript=<string>' \
  --form 'voice_id=<string>' \
  --form 'output_format[container]=raw' \
  --form 'output_format[sample_rate]=44100' \
  --form 'output_format[encoding]=pcm_f32le' \
  --form 'output_format[bit_rate]=123' \
  --form 'voice[__experimental_controls][speed]=0.0' \
  --form 'voice[__experimental_controls][emotion][]=anger:lowest' \
  --form left_audio=@example-file \
  --form right_audio=@example-file
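
For readers who prefer a script to curl, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `encode_multipart` and `build_infill_request` and the placeholder filenames are assumptions made for illustration, while the URL, header names, and form field names are taken from the curl example above.

```python
import urllib.request
import uuid

API_URL = "https://api.cartesia.ai/infill/bytes"  # from the curl example


def encode_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: list of (name, value) string pairs
    files:  list of (name, filename, content_bytes) triples
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    for name, filename, content in files:
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        )
        parts.append(content)
        parts.append(b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_infill_request(api_key, version, fields, left_audio, right_audio):
    """Build (but do not send) the POST request for /infill/bytes."""
    body, content_type = encode_multipart(
        fields,
        [
            # Filenames are placeholders (assumption); only the field names
            # left_audio and right_audio come from the documentation.
            ("left_audio", "left_audio", left_audio),
            ("right_audio", "right_audio", right_audio),
        ],
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Cartesia-Version": version,
            "X-API-Key": api_key,
            "Content-Type": content_type,
        },
    )


# Sending performs a real network call, so it is left commented out:
# req = build_infill_request(api_key, "2024-06-10", fields, left, right)
# with urllib.request.urlopen(req) as resp:
#     audio_bytes = resp.read()
```

Building the request separately from sending it makes it easy to inspect the headers and body before any network traffic happens.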

Authorizations

X-API-Key
string
header
required

Headers

Cartesia-Version
enum<string>
required

API version header. Must be set to the API version, e.g. '2024-06-10'.

Available options: 2024-06-10, 2024-11-13, 2025-04-16

Example: "2024-06-10"

Body

multipart/form-data

left_audio
file

right_audio
file

model_id
string

The ID of the model to use for generating audio

language
string

The language of the transcript

transcript
string

The infill text to generate

voice_id
string

The ID of the voice to use for generating audio

output_format[container]
enum<string>

The format of the output audio

Available options: raw, wav, mp3

output_format[sample_rate]
integer

The sample rate of the output audio in Hz. Supported sample rates are 8000, 16000, 22050, 24000, 44100, 48000.

output_format[encoding]
enum<string>

Required for raw and wav containers.

Available options: pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw

output_format[bit_rate]
integer | null

Required for mp3 containers.
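
The dependencies between the output_format fields (encoding for raw and wav, bit_rate for mp3, and the fixed list of sample rates) can be checked client-side before sending a request. The sketch below is illustrative only: `output_format_fields` is a hypothetical helper, and the choice to leave out encoding for mp3 and bit_rate for raw and wav is an assumption, since the reference only states when each field is required.

```python
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
CONTAINERS = {"raw", "wav", "mp3"}
ENCODINGS = {"pcm_f32le", "pcm_s16le", "pcm_mulaw", "pcm_alaw"}


def output_format_fields(container, sample_rate, encoding=None, bit_rate=None):
    """Return the output_format[...] form fields as (name, value) pairs.

    Raises ValueError when a documented constraint is violated.
    """
    if container not in CONTAINERS:
        raise ValueError(f"unsupported container: {container!r}")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate!r}")

    fields = [
        ("output_format[container]", container),
        ("output_format[sample_rate]", str(sample_rate)),
    ]
    if container in ("raw", "wav"):
        # encoding is required for raw and wav containers
        if encoding not in ENCODINGS:
            raise ValueError("a supported encoding is required for raw and wav")
        fields.append(("output_format[encoding]", encoding))
    else:
        # bit_rate is required for mp3 containers
        if bit_rate is None:
            raise ValueError("bit_rate is required for mp3")
        fields.append(("output_format[bit_rate]", str(bit_rate)))
    return fields
```

For example, `output_format_fields("raw", 44100, encoding="pcm_f32le")` yields the three fields used in the curl sample, while `output_format_fields("mp3", 44100)` raises because no bit rate was given.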

voice[__experimental_controls][speed]

Either a number between -1.0 and 1.0 or a natural language description of speed.

If you specify a number, 0.0 is the default speed, -1.0 is the slowest speed, and 1.0 is the fastest speed.
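
A small guard can catch out-of-range numeric speeds before the request is sent. This is a sketch under assumptions: `speed_field` is a hypothetical helper, and because the accepted natural language descriptions are not listed here, string values are passed through unchanged rather than validated.

```python
SPEED_FIELD = "voice[__experimental_controls][speed]"


def speed_field(speed):
    """Return the (name, value) form field for the speed control."""
    if isinstance(speed, bool):
        # bool is a subclass of int in Python; reject it explicitly
        raise TypeError("speed must be a number or a string")
    if isinstance(speed, (int, float)):
        # documented numeric range: -1.0 (slowest) to 1.0 (fastest)
        if not -1.0 <= speed <= 1.0:
            raise ValueError("numeric speed must be between -1.0 and 1.0")
        return (SPEED_FIELD, str(float(speed)))
    if isinstance(speed, str) and speed.strip():
        # natural language description, forwarded as-is (not validated)
        return (SPEED_FIELD, speed)
    raise TypeError("speed must be a number or a non-empty string")
```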

voice[__experimental_controls][emotion][]
enum<string>[] | null

An array of emotion:level tags.

Supported emotions are: anger, positivity, surprise, sadness, and curiosity.

Supported levels are: lowest, low, (omit), high, highest. Here (omit) means leaving the level off and passing only the emotion name.
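
The emotion:level tags can be assembled from the lists above. A minimal sketch, assuming that omitting the level means sending the bare emotion name with no colon; `emotion_tag` and `emotion_fields` are hypothetical helper names.

```python
EMOTIONS = {"anger", "positivity", "surprise", "sadness", "curiosity"}
LEVELS = {"lowest", "low", "high", "highest"}
EMOTION_FIELD = "voice[__experimental_controls][emotion][]"


def emotion_tag(emotion, level=None):
    """Build one tag such as 'anger:lowest', or 'anger' when level is omitted."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    if level is None:
        # assumption: an omitted level is sent as the emotion name alone
        return emotion
    if level not in LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    return f"{emotion}:{level}"


def emotion_fields(tags):
    """Repeat the array-style form field once per tag."""
    return [(EMOTION_FIELD, tag) for tag in tags]
```

Because the field name ends in `[]`, each tag is sent as its own form field with the same name, which matches how the curl sample passes `anger:lowest`.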
