Voice Changer (SSE) - Cartesia Docs

curl --request POST \
  --url https://api.cartesia.ai/voice-changer/sse \
  --header 'Cartesia-Version: <cartesia-version>' \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key: <api-key>' \
  --form clip='@example-file' \
  --form 'voice[id]=<string>' \
  --form 'output_format[sample_rate]=123' \
  --form 'output_format[bit_rate]=123'

Authorizations

X-API-Key

string

header

required

Headers

Cartesia-Version

enum<string>

required

API version header.

Available options:

2024-06-10

Example:

"2024-06-10"

Body

multipart/form-data

clip

file

Supported audio formats: flac, mp3, mpeg, mpga, oga, ogg, wav, webm

voice[id]

string

output_format[container]

enum<string>

Available options:

raw, wav, mp3

output_format[sample_rate]

integer

The sample rate of the audio in Hz. Supported sample rates are 8000, 16000, 22050, 24000, 44100, 48000.

output_format[encoding]

enum<string> | null

Required for raw and wav containers.

Available options:

pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw

output_format[bit_rate]

integer | null

Required for mp3 containers.
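The body rules above (supported sample rates, encoding required for raw and wav, bit_rate required for mp3) can be checked on the client before sending. The following is a minimal sketch, not part of the Cartesia API: the function name build_form_fields is hypothetical, and the bit rate shown in the usage note is a placeholder because this page does not list valid bit rates.

```python
from typing import Dict, Optional

# Values copied from the parameter descriptions on this page.
CONTAINERS = {"raw", "wav", "mp3"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
ENCODINGS = {"pcm_f32le", "pcm_s16le", "pcm_mulaw", "pcm_alaw"}


def build_form_fields(
    voice_id: str,
    container: str,
    sample_rate: int,
    encoding: Optional[str] = None,
    bit_rate: Optional[int] = None,
) -> Dict[str, str]:
    """Return the non-file multipart fields (hypothetical helper).

    The clip itself is not included; it is sent as a separate file part.
    """
    if container not in CONTAINERS:
        raise ValueError(f"unsupported container: {container}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")

    fields = {
        "voice[id]": voice_id,
        "output_format[container]": container,
        "output_format[sample_rate]": str(sample_rate),
    }
    if container in ("raw", "wav"):
        # encoding is required for raw and wav containers.
        if encoding not in ENCODINGS:
            raise ValueError("encoding is required for raw and wav containers")
        fields["output_format[encoding]"] = encoding
    else:
        # bit_rate is required for mp3 containers.
        if bit_rate is None:
            raise ValueError("bit_rate is required for mp3 containers")
        fields["output_format[bit_rate]"] = str(bit_rate)
    return fields
```

For example, build_form_fields("<voice-id>", "mp3", 44100, bit_rate=128000) returns the three output_format fields plus voice[id]; the value 128000 is only an illustrative placeholder. The returned dictionary can be passed as the form fields of any HTTP client that supports multipart uploads, alongside the clip file.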

Response

200 - text/event-stream

Server-sent events stream. Each frame has the form data: <json>\n\n, where the JSON payload matches VoiceChangerSSEEvent.
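The framing described above can be parsed with a small line-based reader. This is a minimal sketch that assumes the response body is available as an iterable of text lines (how the lines are obtained depends on the HTTP client); the function name iter_sse_events is hypothetical. It handles only data: fields, which is all this page describes.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON payload per SSE frame.

    A frame is one or more "data:" lines terminated by a blank line.
    """
    data_lines: List[str] = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line == "":
            # A blank line closes the current frame.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        # Other SSE fields (event:, id:, comments) are ignored in this sketch.
    # A trailing frame that was never closed by a blank line is dropped.
```

Each yielded dictionary can then be inspected by its status_code and done fields to decide how to handle it.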

The payload is one of three event types: VoiceChangerSSEChunk, VoiceChangerSSEDone, or VoiceChangerSSEError. The fields below describe VoiceChangerSSEChunk.

Audio data chunk.

status_code

enum<integer>

required

HTTP-style status code. Always 206 for chunk events.

Available options:

206

done

enum<boolean>

required

Whether this is the final event for the request. Always false for chunk events.

Available options:

false

data

string

required

Base64-encoded audio data.

sample_rate

integer

required

The sample rate of the audio in Hz.

step_time

number

required

Server-side processing time for this chunk in milliseconds.
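The chunk fields above suggest a simple way to reassemble the audio: decode the Base64 data of every chunk event and concatenate the bytes in arrival order. The sketch below assumes events have already been decoded into dictionaries; the helper names are hypothetical, and because this page does not list the fields of the done and error events, the sketch only assumes that a final event carries a truthy done value and treats anything else that is not a 206 chunk as unexpected.

```python
import base64
from typing import Any, Dict, Iterable


def decode_chunk(event: Dict[str, Any]) -> bytes:
    """Return the raw audio bytes carried by one chunk event."""
    # Chunk events always have status_code 206 and done set to false.
    if event.get("status_code") != 206 or event.get("done") is not False:
        raise ValueError("not a chunk event")
    return base64.b64decode(event["data"])


def collect_audio(events: Iterable[Dict[str, Any]]) -> bytes:
    """Concatenate chunk audio until the final event is seen."""
    audio = bytearray()
    for event in events:
        if event.get("done"):
            # Assumption: the final event has done set to true.
            break
        audio.extend(decode_chunk(event))
    return bytes(audio)
```

For raw output the resulting bytes are samples in the requested encoding at the reported sample_rate. Whether chunks of a wav or mp3 container can be joined this way is not stated on this page, so treat that as an assumption to verify.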