Clone Voice

POST

Clone a voice from an audio clip. This endpoint has two modes: stability and similarity.

Similarity mode clones are more similar to the source clip, but may reproduce background noise. For these, use an audio clip about 5 seconds long.

Stability mode clones are more stable, but may not sound as similar to the source clip. For these, use an audio clip 10-20 seconds long.
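The clip-length guidance above can be checked before uploading. The following is a minimal sketch, not part of the API: the helper names are hypothetical, and the 3 to 8 second window used for "about 5 seconds" is an assumption. Only the 10-20 second range for stability mode comes from the text.

```python
import wave

# Recommended clip lengths in seconds, per mode.
RECOMMENDED_SECONDS = {
    "similarity": (3.0, 8.0),   # assumed tolerance around "about 5 seconds"
    "stability": (10.0, 20.0),  # stated above as 10-20 seconds
}


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clip_length_ok(mode, seconds):
    """Return True if the clip length fits the guidance for the given mode."""
    low, high = RECOMMENDED_SECONDS[mode]  # KeyError for an unknown mode
    return low <= seconds <= high
```

For example, clip_length_ok("stability", wav_duration_seconds("clip.wav")) tells you whether a local WAV file is a reasonable candidate for a stability clone. Other audio formats would need a different way to measure duration.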

Headers

Auth
X-API-Key string Required
Cartesia-Version "2024-06-10" Required

Request

This endpoint expects a multipart form containing a file.
clip file Required

name string Required

The name of the voice.

description string Optional

A description for the voice.

language enum Required

The language of the voice.

mode "similarity" or "stability" Required

Tradeoff between similarity and stability. Similarity clones sound more like the source clip, but may reproduce background noise. Stability clones always sound like a studio recording, but may not sound as similar to the source clip.

Allowed values: similarity, stability

enhance boolean Required

Whether to enhance the clip to improve its quality before cloning. Useful if the clip has background noise.

transcript string Optional

Optional transcript of the words spoken in the audio clip. Only used for similarity mode.
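The headers and form fields above can be assembled into a request as follows. This is a sketch using only the Python standard library, under stated assumptions: the endpoint URL is not given in this section, so it is passed in as a parameter; the helper names are hypothetical; and sending the boolean enhance field as the string "true" or "false" is an assumption about the form encoding.

```python
import urllib.request
import uuid

ALLOWED_MODES = {"similarity", "stability"}


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for key, value in fields.items():
        chunks.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    chunks.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    chunks.append(file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_clone_request(url, api_key, clip_bytes, filename, name, language,
                        mode, enhance, description=None, transcript=None):
    """Build (but do not send) a POST request for the clone endpoint."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    fields = {
        "name": name,
        "language": language,
        "mode": mode,
        # Assumption: booleans are sent as the strings "true" / "false".
        "enhance": "true" if enhance else "false",
    }
    # Optional fields are included only when provided.
    if description is not None:
        fields["description"] = description
    if transcript is not None:
        fields["transcript"] = transcript
    body, content_type = encode_multipart(fields, "clip", filename, clip_bytes)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Cartesia-Version": "2024-06-10",
            "Content-Type": content_type,
        },
    )
```

The returned request can be sent with urllib.request.urlopen(request), after which the response body can be read and decoded as JSON. Building the request separately from sending it makes the form contents easy to inspect before any network call is made.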

Response

This endpoint returns an object.
id string

The ID of the voice.

user_id string

The ID of the user who owns the voice.

is_public boolean

Whether the voice is publicly accessible.

name string

The name of the voice.

description string

The description of the voice.

created_at datetime

The date and time the voice was created.

language enum

The language that the given voice should speak the transcript in.

Options: English (en), French (fr), German (de), Spanish (es), Portuguese (pt), Chinese (zh), Japanese (ja), Hindi (hi), Italian (it), Korean (ko), Dutch (nl), Polish (pl), Russian (ru), Swedish (sv), Turkish (tr).
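The response object described above can be parsed into a typed structure. This is a sketch with hypothetical names (Voice, parse_voice, LANGUAGE_NAMES). It assumes the body is JSON and that created_at is an ISO 8601 timestamp, neither of which is stated in this section.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Language codes and names, taken from the options listed above.
LANGUAGE_NAMES = {
    "en": "English", "fr": "French", "de": "German", "es": "Spanish",
    "pt": "Portuguese", "zh": "Chinese", "ja": "Japanese", "hi": "Hindi",
    "it": "Italian", "ko": "Korean", "nl": "Dutch", "pl": "Polish",
    "ru": "Russian", "sv": "Swedish", "tr": "Turkish",
}


@dataclass
class Voice:
    id: str
    user_id: str
    is_public: bool
    name: str
    description: str
    created_at: datetime
    language: str

    @property
    def language_name(self):
        """Human-readable language name, or None for an unlisted code."""
        return LANGUAGE_NAMES.get(self.language)


def parse_voice(payload):
    """Parse a JSON response body into a Voice."""
    data = json.loads(payload)
    # Assumption: created_at is ISO 8601. A trailing "Z" is rewritten so that
    # datetime.fromisoformat accepts it on Python versions before 3.11.
    created = data["created_at"].replace("Z", "+00:00")
    return Voice(
        id=data["id"],
        user_id=data["user_id"],
        is_public=data["is_public"],
        name=data["name"],
        description=data["description"],
        created_at=datetime.fromisoformat(created),
        language=data["language"],
    )
```

A missing field raises KeyError, which surfaces any mismatch between this sketch and the actual response shape early.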