Clone Voice from Clip

POST https://api.cartesia.ai/voices/clone/clip

Clone a voice from a clip. The clip should be a 15-20 second recording of a person speaking with little to no background noise.

The endpoint returns an embedding that can be used directly with text-to-speech endpoints or to create a new voice.

Auth

X-API-Key (string, Required)

Cartesia-Version ("2024-06-10", Required). Defaults to 2024-06-10.

This endpoint expects a multipart form containing a file.

clip (file, Required)

enhance (boolean, Required)

Whether to enhance the clip to improve its quality before cloning. Useful if the clip is low quality.
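The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_clone_form` and `build_clone_request` and the file name `clip.wav` are hypothetical, the `application/octet-stream` part type is an assumption, and `enhance` is sent as the text `true` or `false` to mirror the curl form field.

```python
import urllib.request
import uuid

# URL and version header value are taken from the curl example for this endpoint.
API_URL = "https://api.cartesia.ai/voices/clone/clip"
API_VERSION = "2024-06-10"


def build_clone_form(clip_bytes, filename, enhance):
    """Encode the `clip` file and `enhance` flag as a multipart/form-data body.

    Returns a (body, content_type) pair. Hypothetical helper for illustration.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        delimiter,
        ('Content-Disposition: form-data; name="clip"; '
         'filename="%s"' % filename).encode("utf-8"),
        b"Content-Type: application/octet-stream",  # assumed part type
        b"",
        clip_bytes,
        delimiter,
        b'Content-Disposition: form-data; name="enhance"',
        b"",
        b"true" if enhance else b"false",
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    body = b"\r\n".join(parts)
    return body, "multipart/form-data; boundary=" + boundary


def build_clone_request(api_key, clip_bytes, filename, enhance=True):
    """Build (but do not send) the POST request with the required headers."""
    body, content_type = build_clone_form(clip_bytes, filename, enhance)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Cartesia-Version": API_VERSION,
            "Content-Type": content_type,
        },
    )
```

To send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_clone_request(key, open("clip.wav", "rb").read(), "clip.wav"))`. Building the request separately from sending it keeps the encoding step easy to inspect without touching the network.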

This endpoint returns an object.

embedding (list of doubles)

A 192-dimensional vector (i.e. a list of 192 numbers) that represents the voice.
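A client may want to check the returned vector before passing it on. The sketch below assumes the response body is JSON with a top-level `embedding` key, which matches the field description above but is not spelled out on this page. The function name `parse_embedding` is hypothetical.

```python
import json

EMBEDDING_DIM = 192  # dimensionality stated in the field description


def parse_embedding(response_text):
    """Extract the voice embedding from a response body and validate its shape.

    Assumes the body is a JSON object with an "embedding" list of numbers.
    Raises ValueError if the list is missing, the wrong length, or non-numeric.
    """
    payload = json.loads(response_text)
    embedding = payload.get("embedding") if isinstance(payload, dict) else None
    if not isinstance(embedding, list):
        raise ValueError("response has no 'embedding' list")
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(
            "expected %d values, got %d" % (EMBEDDING_DIM, len(embedding))
        )
    for value in embedding:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("embedding contains a non-numeric value")
    return [float(value) for value in embedding]
```

Failing early on a malformed vector surfaces a problem at the cloning step, before the embedding reaches a later text-to-speech call.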

curl -X POST https://api.cartesia.ai/voices/clone/clip \
  -H "Cartesia-Version: 2024-06-10" \
  -H "X-API-Key: <APIKeyHeader>" \
  -H "Content-Type: multipart/form-data" \
  -F clip=@<filename1> \
  -F enhance='true'