On June 1, 2026, we are discontinuing our voice embedding (a.k.a. stability) TTS models. If you currently make generation requests with a voice embedding, like this:
{
  "voice": {
    "mode": "embedding",
    "embedding": [1, 2, ..., 3, 4]
  },
  "model_id": "sonic-2",
  // ...
}
then you will need to switch to using a voice ID, like this:
{
  "voice": {
    "mode": "id",
    "id": "e07c00bc-4134-4eae-9ea4-1a55fb45746b"
  },
  "model_id": "sonic-2",
  // ...
}
If you already use voice IDs, see Migrating Voices to make sure your voices will continue to work after the change. For an overview of all changes, see API Changes.
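If you build request payloads programmatically, the switch amounts to replacing the voice object in each payload. Here is a minimal sketch of that transformation; the helper name is illustrative, and the payload shape matches the JSON above:

```python
def migrate_voice_payload(payload: dict, voice_id: str) -> dict:
    """Return a copy of a TTS request payload with its voice embedding
    swapped for a voice ID reference (the new required format)."""
    migrated = dict(payload)
    migrated["voice"] = {"mode": "id", "id": voice_id}
    return migrated

# Example: an old embedding-based payload becomes an ID-based one.
old_request = {
    "voice": {"mode": "embedding", "embedding": [0.1, 0.2, 0.3]},
    "model_id": "sonic-2",
}
new_request = migrate_voice_payload(
    old_request, "e07c00bc-4134-4eae-9ea4-1a55fb45746b"
)
```

All other fields (model_id, transcript, output_format, and so on) are unchanged by the migration.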

Get a voice ID

Choose one of the following options.

Check out the voice library

Our featured voices have all gone through rigorous evaluations and are ready to use in production. Check them out at play.cartesia.ai/voices and copy the ID of any voice you’d like to use.

Clone a voice

If you have source audio, create a cloned voice via the playground or the API. Cloning returns a voice ID you can use immediately.

Generate source audio from your existing embedding

If you no longer have the original audio clip used to create your embedding, generate a short sample with sonic or sonic-2, then clone a new voice from that sample. You can do this in our playground:
  1. play.cartesia.ai/text-to-speech
  2. play.cartesia.ai/voices/create/clone
Or with our API:
  1. Text to Speech (Bytes)
  2. Clone Voice
Here is a complete example using our Python SDK:
from cartesia import Cartesia

# inputs: fill these in before running
your_api_key: str = ""  # your Cartesia API key

your_voice_embedding: list[float] = []  # the embedding you want to migrate

language = "en"

transcript = """
It's nice to meet you.
Hope you're having a great day!
Could we reschedule our meeting tomorrow?
Please call me back as soon as possible.
"""

source_tts_model_id = "sonic"

client = Cartesia(api_key=your_api_key)

# Step 1: generate an audio sample
print(f"Generating audio sample {source_tts_model_id=}")
source_audio_iterator = client.tts.bytes(
    voice={"mode": "embedding", "embedding": your_voice_embedding},
    model_id=source_tts_model_id,
    transcript=transcript,
    language=language,
    output_format={
        "container": "wav",
        "encoding": "pcm_f32le",
        "sample_rate": 44100
    },
)

# Step 2: clone a voice
print("Cloning a voice")
voice = client.voices.clone(
    name="My Voice",
    language=language,
    clip=b"".join(source_audio_iterator),
    mode="similarity",
)
print(f"Cloned voice {voice.id}")

# you can now use the voice like this
migrate_to_model = "sonic-3"
generated_sample_file_name = f"{migrate_to_model}_{voice.id}.wav"

cloned_audio_iterator = client.tts.bytes(
    voice={"mode": "id", "id": voice.id},
    model_id=migrate_to_model,
    transcript=transcript,
    language=language,
    output_format={
        "container": "wav",
        "encoding": "pcm_f32le",
        "sample_rate": 44100
    },
)
with open(generated_sample_file_name, "wb") as f:
    for chunk in cloned_audio_iterator:
        f.write(chunk)
print(f"Listen to your new voice: {generated_sample_file_name}")

# Optionally play the sample with ffplay (part of FFmpeg), if installed
try:
    import subprocess

    subprocess.run(
        [
            "ffplay",
            "-loglevel",
            "quiet",
            "-autoexit",
            "-nodisp",
            generated_sample_file_name,
        ]
    )
except FileNotFoundError:
    # ffplay is not on PATH; skip playback
    pass

Using Voice IDs

See TTS (Bytes), TTS (SSE), and TTS (WebSocket) for full API documentation. You can test these API changes today by setting your Cartesia Version to 2026-03-01. We recommend upgrading the Cartesia Version on your production traffic before June 1, 2026, to make sure nothing breaks.
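If you call the HTTP API directly rather than through an SDK, the version is typically pinned with a request header. A minimal sketch of building such a request; the header names ("Cartesia-Version", "X-API-Key") and the endpoint path are assumptions here, so check the API reference before relying on them:

```python
import json
import urllib.request

CARTESIA_VERSION = "2026-03-01"  # the new version to test against

headers = {
    "Cartesia-Version": CARTESIA_VERSION,  # assumed header name
    "X-API-Key": "your_api_key",  # placeholder
    "Content-Type": "application/json",
}
body = json.dumps({
    "voice": {"mode": "id", "id": "e07c00bc-4134-4eae-9ea4-1a55fb45746b"},
    "model_id": "sonic-2",
    "transcript": "Hello from the new API version.",
    "output_format": {
        "container": "wav",
        "encoding": "pcm_f32le",
        "sample_rate": 44100,
    },
}).encode()

request = urllib.request.Request(
    "https://api.cartesia.ai/tts/bytes",  # endpoint path is an assumption
    data=body,
    headers=headers,
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so this sketch
# runs offline.
```

Sending the same request with and without the version header is a quick way to compare old and new behavior before the cutover date.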