Pro Voice Cloning (Beta)
Learn how to improve voice cloning by leveraging more of your data.

Overview
Pro Voice Cloning is available in the playground for select Scale and Enterprise tier users. It lets you create highly accurate voice clones by using more data than instant cloning does. Contact support@cartesia.ai to enable it for your account.
We’ll first fine-tune a model on your data, then create Voices from selected clips of your data. These Voices are tied to the fine-tuned model and will be automatically routed to it for TTS generation.
Click the Create button to get started.

Read through the best practices and click Create again to proceed to the workflow.
PVC Workflow
Prepare Data
Fill out the form with the requested info for the Voices you are about to create.

Next, you’ll create a dataset to upload your audio data. Click the Create Dataset button to initialize a new dataset.

Then, upload all of the audio files you want to use for training using the Upload samples button or by dragging them onto the designated area. You can upload multiple files at once. Files can be in any standard audio format.
We recommend uploading a minimum of 30 minutes of audio, but around 2 hours of audio is ideal. The Pro Voice Clone will closely match your uploaded data, so make sure it sounds the way you like in terms of background noise, loudness, and speech quality.
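Before uploading, it can help to check how much audio you actually have against the 30 minute minimum and the 2 hour target. The following is a minimal sketch, not part of the product: it assumes your clips are uncompressed WAV files in a single local folder, and the function names are hypothetical. Other formats would need a different audio library.

```python
import wave
from pathlib import Path

MIN_SECONDS = 30 * 60         # recommended minimum: 30 minutes
IDEAL_SECONDS = 2 * 60 * 60   # ideal amount: around 2 hours


def wav_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_seconds(folder):
    """Sum the durations of every .wav file directly inside folder."""
    return sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


def describe(seconds):
    """Compare a total duration against the recommended amounts."""
    if seconds < MIN_SECONDS:
        return "below the 30 minute minimum"
    if seconds < IDEAL_SECONDS:
        return "meets the minimum; around 2 hours is ideal"
    return "at or above the ideal 2 hours"
```

For example, `describe(total_seconds("my_clips"))` summarizes a hypothetical `my_clips` folder. Duration is only part of the picture: the clone will match the background noise, loudness, and speech quality of the data, so listening to the clips still matters.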

Accept the disclaimer and click the Train Pro Voice Clone button to kick off training.

Train Model
Training should take up to 1 hour to complete. You’ll only be charged if the training is successful. If training fails, you can click the Retry button to try again or contact support if the failures persist.
Test Voices
Once training is complete, we’ll automatically create 4 Voices based on different source audio clips from your dataset. These Voices are internally linked to your fine-tuned model, which will be used when you specify the model id or alias listed below in your requests.
The Voices are also available in your Library and can be used through the API.
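As an illustration of using one of these Voices through the API, the sketch below builds (but does not send) a TTS request that pairs a Voice id with the fine-tuned model id. Everything specific here is an assumption to verify against the API reference: the endpoint URL, the `Cartesia-Version` header value, the payload field names, and the output format. The ids passed in are placeholders for the values shown in the playground.

```python
import json
import urllib.request

# Assumed endpoint and API version; confirm both in the API reference.
TTS_URL = "https://api.cartesia.ai/tts/bytes"
API_VERSION = "2024-06-10"


def build_tts_request(api_key, model_id, voice_id, transcript):
    """Build a POST request for TTS with a Pro Voice Clone.

    model_id should be the model id or alias displayed for the
    fine-tuned model, and voice_id one of the generated Voices.
    """
    payload = {
        "model_id": model_id,
        "transcript": transcript,
        "voice": {"mode": "id", "id": voice_id},
        # Assumed output settings; adjust to what your application needs.
        "output_format": {
            "container": "wav",
            "encoding": "pcm_s16le",
            "sample_rate": 44100,
        },
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            "Cartesia-Version": API_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately makes it easy to inspect the model id and Voice id before any audio is generated.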

Note about base model updates:
We’ve fine-tuned the latest base model available in production, which is reflected in the displayed model id. The fine-tuned model is fixed to this particular model id and will not be activated if you use a different one. PVCs will not automatically be updated for future base models, and will need to be retrained on each new base model.
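Because a request with a different model id will not activate the fine-tuned model, a client-side guard can catch mismatches early. This is a minimal sketch under stated assumptions: `ProVoice` and `check_model` are hypothetical helpers, not part of any SDK, and you populate the accepted ids yourself from the model id and alias shown in the playground.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProVoice:
    """A Pro Voice Clone Voice and the model ids that activate it."""
    voice_id: str
    model_ids: tuple  # displayed model id plus any alias, copied by hand


def check_model(voice, requested_model_id):
    """Return requested_model_id if it activates this Voice's fine-tuned model."""
    if requested_model_id not in voice.model_ids:
        raise ValueError(
            f"Voice {voice.voice_id} is tied to {voice.model_ids}; "
            f"got {requested_model_id!r}. Retrain the PVC to use a new base model."
        )
    return requested_model_id
```

Calling `check_model` just before building a request makes a base model upgrade fail loudly in your own code, where the cause is easy to see.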