Clone Voice

Clone a high-similarity voice from an audio clip. High-similarity clones sound closer to the source clip, but they may also reproduce its background noise. Use an audio clip about 5 seconds long.

Headers

Authorization (string, required)

Bearer authentication of the form Bearer <token>, where token is your auth token.

Cartesia-Version ("2025-04-16", required)
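
As a minimal sketch, the two required headers can be built as a Python dictionary. The helper name build_headers and the placeholder token are illustrative and not part of the API.

```python
CARTESIA_VERSION = "2025-04-16"  # value listed above for Cartesia-Version


def build_headers(token: str) -> dict[str, str]:
    """Return the two required request headers."""
    if not token:
        raise ValueError("an auth token is required")
    return {
        "Authorization": f"Bearer {token}",
        "Cartesia-Version": CARTESIA_VERSION,
    }


headers = build_headers("YOUR_API_TOKEN")  # placeholder, not a real token
print(headers["Authorization"])
```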

Request

This endpoint expects a multipart form containing a file.
clip (file, required)
name (string, required)
The name of the voice.
description (string, optional)
A description for the voice.
language (enum, required)
The language of the voice.
enhance (boolean, optional)
Whether to apply AI enhancements to the clip to reduce background noise. This leads to cleaner generated speech at the cost of reduced similarity to the source clip.
base_voice_id (string, optional)
Optional base voice ID that the cloned voice is derived from.
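
The form fields above could be assembled roughly as follows. This is a sketch under several assumptions: the request accepts the same language codes as the Options list in the Response section, booleans are sent as the strings "true" and "false", and the request is a POST. The endpoint URL is not given in this section, so the caller must supply it. The helper names are hypothetical, and sending relies on the third-party requests package.

```python
from typing import Optional

# Assumption: the request accepts the codes from the Options list in this reference.
LANGUAGES = {"en", "fr", "de", "es", "pt", "zh", "ja", "hi",
             "it", "ko", "nl", "pl", "ru", "sv", "tr"}


def build_clone_fields(
    name: str,
    language: str,
    description: Optional[str] = None,
    enhance: Optional[bool] = None,
    base_voice_id: Optional[str] = None,
) -> dict[str, str]:
    """Collect the non-file form fields, omitting optional ones left unset."""
    if not name:
        raise ValueError("name is required")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    fields = {"name": name, "language": language}
    if description is not None:
        fields["description"] = description
    if enhance is not None:
        # Assumption: booleans are encoded as "true" / "false" in the form.
        fields["enhance"] = "true" if enhance else "false"
    if base_voice_id is not None:
        fields["base_voice_id"] = base_voice_id
    return fields


def send_clone_request(url: str, headers: dict[str, str],
                       clip_path: str, fields: dict[str, str]) -> dict:
    """POST the multipart form; `url` must come from the caller."""
    import requests  # third-party; imported lazily so the builder has no dependencies

    with open(clip_path, "rb") as clip:
        response = requests.post(url, headers=headers, data=fields,
                                 files={"clip": clip}, timeout=60)
    response.raise_for_status()
    return response.json()


fields = build_clone_fields("Narrator", "en", enhance=False)
print(fields)
```

Passing the clip through files and the remaining values through data lets requests generate the multipart boundary and Content-Type header itself, so the headers dictionary should not set Content-Type manually.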

Response

This endpoint returns an object.
id (string)
The ID of the voice.
user_id (string)
The ID of the user who owns the voice.
is_public (boolean)
Whether the voice is publicly accessible.
name (string)
The name of the voice.
description (string)
The description of the voice.
created_at (datetime)
The date and time the voice was created.
language (enum)

The language that the given voice should speak the transcript in.

Options: English (en), French (fr), German (de), Spanish (es), Portuguese (pt), Chinese (zh), Japanese (ja), Hindi (hi), Italian (it), Korean (ko), Dutch (nl), Polish (pl), Russian (ru), Swedish (sv), Turkish (tr).
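
The response object could be mapped onto a typed structure along these lines. This is a sketch, assuming the response body is JSON and that created_at arrives as an ISO 8601 string; the Voice class, the parse_voice helper, and every value in the sample payload are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Voice:
    id: str
    user_id: str
    is_public: bool
    name: str
    description: str
    created_at: datetime
    language: str


def parse_voice(payload: dict) -> Voice:
    """Convert a decoded JSON response into a Voice."""
    # Assumption: created_at is ISO 8601. A trailing "Z" is rewritten so that
    # datetime.fromisoformat accepts it on Python versions before 3.11.
    created = datetime.fromisoformat(payload["created_at"].replace("Z", "+00:00"))
    return Voice(
        id=payload["id"],
        user_id=payload["user_id"],
        is_public=payload["is_public"],
        name=payload["name"],
        description=payload["description"],
        created_at=created,
        language=payload["language"],
    )


# Hypothetical payload with made-up values, for illustration only.
sample = {
    "id": "voice-123",
    "user_id": "user-456",
    "is_public": False,
    "name": "Narrator",
    "description": "Warm narration voice",
    "created_at": "2025-01-01T12:00:00Z",
    "language": "en",
}
voice = parse_voice(sample)
print(voice.name, voice.created_at.isoformat())
```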