Sonic provides controls for the speed, volume, and emotion of generated speech. These are available on play.cartesia.ai using the UI controls, by passing a generation_config parameter, or by using SSML tags within the transcript.
Speed and volume controls
Guide the speed and volume of a TTS generation with the generation_config.speed and generation_config.volume parameters. These values are roughly a multiplier on the default: for example, 1.5 generates audio at 1.5x the default speed.
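As a minimal sketch, the speed and volume settings can be assembled and range-checked before a request is sent. The helper name `build_generation_config` is hypothetical, and the default of 1.0 assumes that a multiplier of 1.0 means "unchanged". The ranges used are the ones documented on this page (0.6 to 1.5 for speed, 0.5 to 2.0 for volume).

```python
# Ranges documented on this page; the helper itself is hypothetical.
SPEED_RANGE = (0.6, 1.5)
VOLUME_RANGE = (0.5, 2.0)


def build_generation_config(speed=1.0, volume=1.0):
    """Return a generation_config dict after checking the documented ranges.

    The 1.0 defaults assume a multiplier of 1.0 leaves the output unchanged.
    """
    for name, value, (low, high) in (
        ("speed", speed, SPEED_RANGE),
        ("volume", volume, VOLUME_RANGE),
    ):
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return {"speed": speed, "volume": volume}


config = build_generation_config(speed=1.5, volume=0.8)
print(config)  # {'speed': 1.5, 'volume': 0.8}
```

Checking on the client side is a design choice, not a requirement: it surfaces an out-of-range value before any network call is made.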
generation_config.speed: The speed of the generation, ranging from 0.6 to 1.5.
generation_config.volume: The volume of the generation, ranging from 0.5 to 2.0.
Emotion controls (Beta)
By default, the model interprets the emotional subtext in the provided transcript. Guide the emotion of a TTS generation, like a director providing guidance to an actor, using the generation_config.emotion parameter.
Emotion tags push the model to be more emotive, but only work when the emotion is consistent with the transcript. A transcript whose content conflicts with the requested emotion is unlikely to work well.
The emotional guidance for a generation, one of the emotions below.
Primary emotions: neutral, angry, excited, content, sad, and scared.
The complete list of available emotions is: happy, excited, enthusiastic, elated, euphoric, triumphant, amazed, surprised, flirtatious, joking/comedic, curious, content, peaceful, serene, calm, grateful, affectionate, trust, sympathetic, anticipation, mysterious, angry, mad, outraged, frustrated, agitated, threatened, disgusted, contempt, envious, sarcastic, ironic, sad, dejected, melancholic, disappointed, hurt, guilty, bored, tired, rejected, nostalgic, wistful, apologetic, hesitant, insecure, confused, resigned, anxious, panicked, alarmed, scared, neutral, proud, confident, distant, skeptical, contemplative, determined.
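The sketch below validates an emotion against the list above before placing it in generation_config.emotion. The helper name `emotion_config` and the surrounding `request_fields` dict (including its `transcript` key) are illustrative assumptions, not a confirmed request schema. The entry `joking/comedic` is kept exactly as written in the list.

```python
# Emotion names copied from the list above; "joking/comedic" is kept as written.
_EMOTION_LIST = """
happy, excited, enthusiastic, elated, euphoric, triumphant, amazed, surprised,
flirtatious, joking/comedic, curious, content, peaceful, serene, calm, grateful,
affectionate, trust, sympathetic, anticipation, mysterious, angry, mad, outraged,
frustrated, agitated, threatened, disgusted, contempt, envious, sarcastic, ironic,
sad, dejected, melancholic, disappointed, hurt, guilty, bored, tired, rejected,
nostalgic, wistful, apologetic, hesitant, insecure, confused, resigned, anxious,
panicked, alarmed, scared, neutral, proud, confident, distant, skeptical,
contemplative, determined
"""
AVAILABLE_EMOTIONS = frozenset(
    name.strip() for name in _EMOTION_LIST.split(",") if name.strip()
)


def emotion_config(emotion):
    """Return a generation_config dict for a listed emotion (hypothetical helper)."""
    if emotion not in AVAILABLE_EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    return {"emotion": emotion}


# An emotion that matches the transcript's content; the field layout is assumed.
request_fields = {
    "transcript": "I can't believe we won the championship!",
    "generation_config": emotion_config("excited"),
}
print(request_fields["generation_config"])  # {'emotion': 'excited'}
```

The check only confirms that the name is in the list. Whether the emotion suits the transcript still has to be judged by the author.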
The voices with the best emotional response are:
- Leo (id: 0834f3df-e650-4766-a20c-5a93a43aa6e3)
- Jace (id: 6776173b-fd72-460d-89b3-d85812ee518d)
- Kyle (id: c961b81c-a935-4c17-bfb3-ba2239de8c2f)
- Gavin (id: f4a3a8e4-694c-4c45-9ca0-27caf97901b5)
- Maya (id: cbaf8084-f009-4838-a096-07ee2e6612b1)
- Tessa (id: 6ccbfb76-1fc6-48f7-b71d-91ac6298247b)
- Dana (id: cc00e582-ed66-4004-8336-0175b85c85f6)
- Marian (id: 26403c37-80c1-4a1a-8692-540551ca2ae5)
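For convenience, the voice IDs listed above can be kept in a small lookup table so that code refers to voices by name. This is only a sketch: the names `EMOTIVE_VOICES` and `voice_id` are hypothetical, and how the ID is passed to a request is not shown here.

```python
# Voice names and IDs copied from the list above.
EMOTIVE_VOICES = {
    "Leo": "0834f3df-e650-4766-a20c-5a93a43aa6e3",
    "Jace": "6776173b-fd72-460d-89b3-d85812ee518d",
    "Kyle": "c961b81c-a935-4c17-bfb3-ba2239de8c2f",
    "Gavin": "f4a3a8e4-694c-4c45-9ca0-27caf97901b5",
    "Maya": "cbaf8084-f009-4838-a096-07ee2e6612b1",
    "Tessa": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b",
    "Dana": "cc00e582-ed66-4004-8336-0175b85c85f6",
    "Marian": "26403c37-80c1-4a1a-8692-540551ca2ae5",
}


def voice_id(name):
    """Look up the ID of one of the listed voices (hypothetical helper)."""
    try:
        return EMOTIVE_VOICES[name]
    except KeyError:
        raise ValueError(f"{name!r} is not one of the listed voices") from None


print(voice_id("Maya"))  # cbaf8084-f009-4838-a096-07ee2e6612b1
```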
Nonverbalisms
Insert [laughter] in your transcript to make the model laugh. We plan to add more non-speech verbalisms like sighs and coughs in future releases.
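Because [laughter] is the only nonverbal tag described here, a transcript can be checked for other bracketed tags before it is sent. The sketch below is one way to do that. The helper `unsupported_tags` is hypothetical, and it assumes that any lowercase word in square brackets is meant as a nonverbal tag, which may not hold for every transcript.

```python
import re

LAUGHTER_TAG = "[laughter]"
# Assumption: a lowercase word in square brackets is intended as a nonverbal tag.
_BRACKET_TAG = re.compile(r"\[[a-z]+\]")


def unsupported_tags(transcript):
    """Return bracketed lowercase tags other than [laughter] (hypothetical helper)."""
    return [tag for tag in _BRACKET_TAG.findall(transcript) if tag != LAUGHTER_TAG]


transcript = f"That is the funniest thing I have heard all week! {LAUGHTER_TAG}"
print(unsupported_tags(transcript))  # []
print(unsupported_tags("Oh well. [sigh] Never mind."))  # ['[sigh]']
```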