Clone Voice from Clip

POST

Clone a voice from a clip. The clip should be a 15-20 second recording of a person speaking with little to no background noise.
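You may want to check a clip against the recommended 15-20 second length before uploading it. The Python sketch below assumes the clip is an uncompressed PCM WAV file, because the accepted audio formats are not listed here. The helper names are hypothetical.

```python
import wave

# Recommended clip length from the description above, in seconds.
MIN_SECONDS = 15.0
MAX_SECONDS = 20.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (assumes WAV input)."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path: str) -> bool:
    """True if the clip falls inside the recommended 15-20 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

This check covers length only. It cannot detect background noise, so you still need to listen to the recording.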

The endpoint returns an embedding that can be used either directly with the text-to-speech endpoints or to create a new voice.

Headers

Auth
X-API-Key (string, required)
Cartesia-Version ("2024-06-10", required). Defaults to 2024-06-10.

Request

This endpoint expects a multipart form containing a file.
clip (file, required)
enhance (boolean, required)

Whether to enhance the clip to improve its quality before cloning. Useful if the clip is low quality.
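The Python sketch below builds this multipart request using only the standard library. It sets the X-API-Key and Cartesia-Version headers and adds the clip and enhance form fields. This section does not give the endpoint path after POST, so the caller must supply the URL. Several details are assumptions: enhance is sent as the text "true" or "false", and the file part is labeled application/octet-stream. build_clone_request is a hypothetical helper.

```python
import urllib.request
import uuid


def build_clone_request(
    url: str,
    api_key: str,
    clip: bytes,
    filename: str,
    enhance: bool,
    version: str = "2024-06-10",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for cloning a voice.

    The endpoint URL is supplied by the caller; this sketch does not assume a path.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # File part: the "clip" field carries the raw audio bytes.
        # The octet-stream content type is an assumption.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="clip"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode() + clip + b"\r\n",
        # Text part: "enhance" encoded as the string "true" or "false" (an assumption).
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="enhance"\r\n\r\n'
            f"{'true' if enhance else 'false'}\r\n"
        ).encode(),
        # Closing boundary terminates the multipart body.
        f"--{boundary}--\r\n".encode(),
    ]
    headers = {
        "X-API-Key": api_key,
        "Cartesia-Version": version,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        url, data=b"".join(parts), headers=headers, method="POST"
    )
```

To send the request, pass it to urllib.request.urlopen and read the response body. urllib stores header names in a normalized form (for example X-api-key). This is harmless because HTTP header names are case-insensitive.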

Response

This endpoint returns an object.
embedding (list of doubles)

A 192-dimensional vector (i.e., a list of 192 numbers) that represents the voice.
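The Python sketch below reads the embedding out of a response body. It assumes the object is returned as JSON, which this section does not state. parse_embedding is a hypothetical helper that also checks the documented length of 192 values.

```python
import json
from typing import List

EMBEDDING_DIM = 192  # Length stated in the response description above.


def parse_embedding(response_body: bytes) -> List[float]:
    """Extract and validate the "embedding" field (assumes a JSON response body)."""
    payload = json.loads(response_body)
    embedding = payload["embedding"]
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    # Coerce each entry to float so integer-valued JSON numbers are handled uniformly.
    return [float(x) for x in embedding]
```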