generation_config parameter, or by using SSML tags within the transcript itself.
Sonic-3 treats these parameters as guidance rather than strict adjustments,
which keeps the speech natural. We recommend testing them against your content
to confirm that the output matches your expectations.
Speed and Volume Controls
You can guide the speed and volume of a TTS generation with the generation_config.speed and
generation_config.volume parameters.
These values are roughly multipliers on the default speed and volume; e.g., 1.5 generates audio
at roughly 1.5x the default speed.
speed: The speed of the generation, ranging from 0.6 to 1.5.
volume: The volume of the generation, ranging from 0.5 to 2.0.
Example using SSML tags in the transcript:
<speed ratio="1.5"/> I like to speak quickly because it makes me sound smart. <volume level="1.5"/> And I can be loud, too!
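The ranges above can be checked before a request is sent. The following is a minimal sketch, not part of any official client: the helper names (`build_generation_config`, `with_ssml_controls`) are hypothetical, the default of 1.0 assumes that a multiplier of 1.0 means "unchanged", and it assumes the SSML tags accept the same ranges as the `generation_config` parameters.

```python
# Documented ranges for generation_config.speed and generation_config.volume.
SPEED_RANGE = (0.6, 1.5)
VOLUME_RANGE = (0.5, 2.0)


def _check(name, value, bounds):
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


def build_generation_config(speed=1.0, volume=1.0):
    """Return a generation_config-style dict (hypothetical helper).

    Assumes 1.0 means "default speed/volume", since the values act
    roughly as multipliers.
    """
    return {
        "speed": _check("speed", speed, SPEED_RANGE),
        "volume": _check("volume", volume, VOLUME_RANGE),
    }


def with_ssml_controls(text, speed=None, volume=None):
    """Prefix a transcript with <speed> and/or <volume> SSML tags.

    Assumes the tags accept the same ranges as the config parameters.
    """
    tags = []
    if speed is not None:
        tags.append(f'<speed ratio="{_check("speed", speed, SPEED_RANGE)}"/>')
    if volume is not None:
        tags.append(f'<volume level="{_check("volume", volume, VOLUME_RANGE)}"/>')
    return " ".join(tags + [text])


if __name__ == "__main__":
    print(build_generation_config(speed=1.5, volume=0.5))
    print(with_ssml_controls("I like to speak quickly.", speed=1.5))
```

Because the model treats these values as guidance, validating the range only guards against out-of-bounds input; it does not guarantee an exact speed or loudness in the output.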
Emotion Controls (Beta)
By default, the model attempts to interpret the emotional subtext present in the provided transcript. You can guide the emotion of a TTS generation, like a director providing guidance to an actor, using the generation_config.emotion parameter.
Emotion tags are good at pushing the model to be more emotive, but they only work
when the emotion is consistent with the transcript. For instance, a mismatched emotion tag on a transcript like `I’m so excited!` is unlikely to work well.
emotion: The emotional guidance for a generation, one of the emotions below:
neutral, angry, excited, content, sad, and scared.
The complete list of available emotions is: happy, excited, enthusiastic, elated, euphoric, triumphant, amazed, surprised, flirtatious, joking/comedic, curious, content, peaceful, serene, calm, grateful, affectionate, trust, sympathetic, anticipation, mysterious, angry, mad, outraged, frustrated, agitated, threatened, disgusted, contempt, envious, sarcastic, ironic, sad, dejected, melancholic, disappointed, hurt, guilty, bored, tired, rejected, nostalgic, wistful, apologetic, hesitant, insecure, confused, resigned, anxious, panicked, alarmed, scared, neutral, proud, confident, distant, skeptical, contemplative, determined.
The voices with the best emotional response are:
- Leo (id: 0834f3df-e650-4766-a20c-5a93a43aa6e3)
- Jace (id: 6776173b-fd72-460d-89b3-d85812ee518d)
- Kyle (id: c961b81c-a935-4c17-bfb3-ba2239de8c2f)
- Gavin (id: f4a3a8e4-694c-4c45-9ca0-27caf97901b5)
- Maya (id: cbaf8084-f009-4838-a096-07ee2e6612b1)
- Tessa (id: 6ccbfb76-1fc6-48f7-b71d-91ac6298247b)
- Dana (id: cc00e582-ed66-4004-8336-0175b85c85f6)
- Marian (id: 26403c37-80c1-4a1a-8692-540551ca2ae5)
<emotion value="angry" /> How dare you speak to me like I'm just a robot!
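Since the emotion must come from a fixed list, it can help to validate it before building a transcript. The sketch below is illustrative only: `emotion_tag` and `with_emotion` are hypothetical helper names, and the set simply copies the complete list of emotions given above.

```python
# Emotion names copied from the complete list in this section.
EMOTIONS = frozenset(
    """
    happy excited enthusiastic elated euphoric triumphant amazed surprised
    flirtatious joking/comedic curious content peaceful serene calm grateful
    affectionate trust sympathetic anticipation mysterious angry mad outraged
    frustrated agitated threatened disgusted contempt envious sarcastic ironic
    sad dejected melancholic disappointed hurt guilty bored tired rejected
    nostalgic wistful apologetic hesitant insecure confused resigned anxious
    panicked alarmed scared neutral proud confident distant skeptical
    contemplative determined
    """.split()
)


def emotion_tag(emotion):
    """Return an <emotion> SSML tag, rejecting names not in the list."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    return f'<emotion value="{emotion}" />'


def with_emotion(text, emotion):
    """Prefix a transcript with an emotion tag (hypothetical helper)."""
    return f"{emotion_tag(emotion)} {text}"


if __name__ == "__main__":
    print(with_emotion("How dare you speak to me like I'm just a robot!", "angry"))
```

Validation only catches misspelled or unsupported names. It cannot tell whether the emotion fits the transcript, which is what determines how well the tag works.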
Nonverbalisms
Insert [laughter] in your transcript to make the model laugh. In the future, we plan to add more non-speech verbalisms, such as sighs and coughs.