POST /voices/clone
Clone Voice
curl --request POST \
  --url https://api.cartesia.ai/voices/clone \
  --header 'Cartesia-Version: <cartesia-version>' \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key: <api-key>' \
  --form 'name=<string>' \
  --form 'description=<string>' \
  --form language=en \
  --form mode=similarity \
  --form enhance=true \
  --form 'base_voice_id=<string>' \
  --form clip=@example-file

{
  "id": "<string>",
  "user_id": "<string>",
  "is_public": true,
  "name": "<string>",
  "description": "<string>",
  "created_at": "2023-11-07T05:31:56Z",
  "language": "en"
}

Authorizations

X-API-Key
string
header
required

Headers

Cartesia-Version
enum<string>
required

API version header. Must be set to the API version, e.g. '2024-06-10'.

Available options: 2024-06-10, 2024-11-13, 2025-04-16

Example: "2024-06-10"

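The two required headers can be assembled and checked before any request is sent. The following is a minimal sketch: `build_headers` and `SUPPORTED_VERSIONS` are hypothetical names introduced here for illustration, and the version list simply mirrors the options shown above.

```python
# Versions copied from the "Available options" list for Cartesia-Version.
SUPPORTED_VERSIONS = ("2024-06-10", "2024-11-13", "2025-04-16")


def build_headers(api_key: str, version: str = "2024-06-10") -> dict[str, str]:
    """Return the required request headers (hypothetical helper, not an SDK call)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported Cartesia-Version: {version!r}")
    return {"X-API-Key": api_key, "Cartesia-Version": version}
```

Rejecting an unknown version locally is a design choice of this sketch; the API itself remains the authority on which versions it accepts.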
Body

multipart/form-data
clip
file

The source audio clip to clone the voice from.

name
string

The name of the voice.

description
string | null

A description for the voice.

language
enum<string>

The language of the voice.

Available options: en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr

mode
enum<string>

Tradeoff between similarity and stability. Similarity clones sound more like the source clip, but may reproduce background noise. Stability clones always sound like a studio recording, but may not sound as similar to the source clip.

Available options: similarity, stability

enhance
boolean | null

Whether to apply AI enhancements to the clip to reduce background noise. This leads to cleaner generated speech at the cost of reduced similarity to the source clip.

base_voice_id
string

Optional base voice ID that the cloned voice is derived from.
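The body fields above map directly onto a multipart request. Below is a sketch in Python using the third-party `requests` library, which builds the multipart body and its `Content-Type` boundary itself when `files=` is passed. The helper names `build_form_fields` and `clone_voice` are hypothetical, treating `name` as mandatory is an assumption of this sketch (the field list does not mark it as required), and serializing `enhance` as `"true"`/`"false"` follows the curl example.

```python
from pathlib import Path
from typing import Optional

# Enum values copied from the field descriptions above.
LANGUAGES = {"en", "fr", "de", "es", "pt", "zh", "ja", "hi",
             "it", "ko", "nl", "pl", "ru", "sv", "tr"}
MODES = {"similarity", "stability"}


def build_form_fields(name: str, language: str = "en", mode: str = "similarity",
                      description: Optional[str] = None,
                      enhance: Optional[bool] = None,
                      base_voice_id: Optional[str] = None) -> dict[str, str]:
    """Validate the enum fields and return the text parts of the form."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    fields = {"name": name, "language": language, "mode": mode}
    # Optional fields are omitted entirely when not provided.
    if description is not None:
        fields["description"] = description
    if enhance is not None:
        fields["enhance"] = "true" if enhance else "false"
    if base_voice_id is not None:
        fields["base_voice_id"] = base_voice_id
    return fields


def clone_voice(api_key: str, clip_path: str, *,
                version: str = "2024-06-10", **fields) -> dict:
    """POST the clip and form fields to /voices/clone and return the JSON body."""
    import requests  # third-party (pip install requests); imported lazily

    headers = {"X-API-Key": api_key, "Cartesia-Version": version}
    with Path(clip_path).open("rb") as clip:
        response = requests.post(
            "https://api.cartesia.ai/voices/clone",
            headers=headers,
            data=build_form_fields(**fields),  # text parts
            files={"clip": clip},              # file part
            timeout=60,
        )
    response.raise_for_status()  # raise on any 4xx/5xx status
    return response.json()
```

A call might look like `clone_voice(api_key, "sample.wav", name="Demo voice", mode="stability", enhance=True)`, where `sample.wav` is a placeholder path.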

Response

200 - application/json
id
string
required

The ID of the voice.

user_id
string
required

The ID of the user who owns the voice.

is_public
boolean
required

Whether the voice is publicly accessible.

name
string
required

The name of the voice.

description
string
required

The description of the voice.

created_at
string<date-time>
required

The date and time the voice was created.

language
enum<string>
required

The language that the given voice should speak the transcript in.

Options: English (en), French (fr), German (de), Spanish (es), Portuguese (pt), Chinese (zh), Japanese (ja), Hindi (hi), Italian (it), Korean (ko), Dutch (nl), Polish (pl), Russian (ru), Swedish (sv), Turkish (tr).

Available options: en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr
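
Since every response field is marked required, the JSON body can be mapped onto a small typed record. This is a sketch only: the `Voice` dataclass and `parse_voice` function are hypothetical names, not part of any official SDK. The timestamp's trailing `Z` is rewritten to `+00:00` because `datetime.fromisoformat` accepts the `Z` suffix only from Python 3.11 onward.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Voice:
    """Typed view of the 200 response body (hypothetical container)."""
    id: str
    user_id: str
    is_public: bool
    name: str
    description: str
    created_at: datetime
    language: str


def parse_voice(payload: dict) -> Voice:
    """Build a Voice from the decoded JSON; a missing field raises KeyError."""
    # Normalize the UTC designator so older Python versions can parse it.
    created = datetime.fromisoformat(payload["created_at"].replace("Z", "+00:00"))
    return Voice(
        id=payload["id"],
        user_id=payload["user_id"],
        is_public=payload["is_public"],
        name=payload["name"],
        description=payload["description"],
        created_at=created,
        language=payload["language"],
    )
```

Letting a missing key raise `KeyError` surfaces a malformed response immediately instead of passing partial data downstream.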