Voice Cloning Guide
For the best voice clones, we recommend following these practices in the source clip:
- Choose an appropriate transcript to speak. You want the transcript you record to align as closely as possible with the voice you want to generate. For example, don’t read a colorless transcript in a monotone voice unless you’re aiming for a monotonous clone. Instead, prepare a transcript that is suited to your use case and has the right energy.
- Speak as clearly as possible and avoid background noise. For example, when recording yourself, try to use a high-quality microphone, be in a quiet space, and so on.
- Limit your recording to 10 to 20 seconds. This is the sweet spot—a longer clip will not result in a better clone.
- Set `enhance` to `true` when cloning. This optimizes sample clip quality prior to voice cloning, improving clone fidelity, especially for lower-quality samples. Note that this may increase overall volume and remove background noise.
- Avoid long pauses in the clip. Too many long pauses could result in the cloned voice drifting from the source clip.
- Speak in the target language. For instance, if you want the cloned voice to speak Spanish, speak Spanish in the recording.
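Two of the practices above (clip length and long pauses) can be checked automatically before uploading a recording. The following is a minimal sketch using only the Python standard library, not part of any official SDK. The function name `check_clip`, the silence threshold `silence_rms`, and the pause limit `max_pause_s` are assumptions chosen for illustration; only the 10 to 20 second range comes from this guide. It assumes a 16-bit PCM WAV file.

```python
import array
import math
import sys
import wave


def check_clip(path, min_s=10.0, max_s=20.0,
               silence_rms=500, max_pause_s=1.0, window_s=0.05):
    """Return a list of warnings for a 16-bit PCM WAV clip.

    silence_rms and max_pause_s are illustrative defaults (assumptions),
    not values taken from the guide.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    warnings = []

    # Check the recommended 10-20 second range.
    duration = (len(samples) // channels) / rate
    if not (min_s <= duration <= max_s):
        warnings.append(
            f"duration {duration:.1f}s is outside {min_s:.0f}-{max_s:.0f}s"
        )

    # Scan fixed-size windows and track the longest run of quiet windows.
    step = max(1, int(rate * window_s)) * channels
    longest = current = 0.0
    for start in range(0, len(samples), step):
        chunk = samples[start:start + step]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms < silence_rms:
            current += len(chunk) / channels / rate
            longest = max(longest, current)
        else:
            current = 0.0
    if longest > max_pause_s:
        warnings.append(f"longest pause is {longest:.1f}s")

    return warnings
```

An empty list means the clip passed both checks. The RMS threshold depends on your microphone and recording level, so treat the defaults as a starting point and tune them against a few known-good clips.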