Learn how to get the best voice clones from your audio clips.

Voice cloning is available through the playground and the API.

For the best voice clones, we recommend following these practices when recording the source clip:

  1. Choose an appropriate transcript to speak. You want the transcript you record to align as closely as possible with the voice you want to generate. For example, don’t read a colorless transcript in a monotone voice unless you’re aiming for a monotonous clone. Instead, prepare a transcript that is suited to your use case and has the right energy.
  2. Speak as clearly as possible and avoid background noise. For example, when recording yourself, try to use a high-quality microphone, be in a quiet space, and so on.
  3. Limit your recording to 10 to 20 seconds. This is the sweet spot—a longer clip will not result in a better clone.
  4. Set enhance to true when cloning. This optimizes the quality of the sample clip before cloning, which improves clone fidelity, especially for lower-quality samples. Note that this may increase overall volume and remove background noise.
  5. Avoid long pauses in the clip. Too many long pauses could result in the cloned voice drifting from the source clip.
  6. Speak in the target language. For instance, if you want the cloned voice to speak Spanish, speak Spanish in the recording. If this is not possible, you can use Cartesia’s localization feature—available in the playground and in the API—to convert your clone to a different language.
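
Some of these practices can be checked before you upload a clip. The following is a minimal sketch, not part of Cartesia's tooling, that flags clips outside the 10 to 20 second range and clips containing long pauses. It assumes a 16-bit PCM WAV file and uses only the Python standard library. The function name check_clip, the silence threshold of 500, and the 1-second pause limit are illustrative assumptions, so tune them for your own recordings.

```python
import array
import sys
import wave


def check_clip(path, min_s=10.0, max_s=20.0, silence_thresh=500, max_pause_s=1.0):
    """Return a list of warnings for a 16-bit PCM WAV clip.

    silence_thresh and max_pause_s are assumed values, not documented limits.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()
        samples = array.array("h", wav.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    warnings = []
    duration = n_frames / rate
    if not min_s <= duration <= max_s:
        warnings.append(f"duration {duration:.1f}s is outside {min_s:.0f}-{max_s:.0f}s")

    # Walk the clip in 50 ms windows and track the longest run of quiet windows.
    window = max(1, int(rate * 0.05)) * channels
    longest = current = 0.0
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        peak = max(abs(s) for s in chunk)
        if peak < silence_thresh:
            current += len(chunk) / channels / rate
            longest = max(longest, current)
        else:
            current = 0.0
    if longest > max_pause_s:
        warnings.append(f"longest pause is {longest:.1f}s (limit {max_pause_s:.1f}s)")
    return warnings
```

Calling check_clip("my_clip.wav") returns an empty list when the clip passes both checks. The file name here is a placeholder. A peak-amplitude threshold is a crude way to detect silence, so a noisy room may hide pauses that a listener would still notice.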