Tags for volume, speed, and emotions are in beta and subject to change.
Sonic-3 supports tags modeled on SSML (Speech Synthesis Markup Language) to control generated speech.

Speed

You can guide the speed of a TTS generation with a speed tag, which takes a scalar between 0.6 and 1.5. This value is roughly a multiplier on the default speed. For example, 1.5 will generate audio at roughly 1.5x the default speed.
<speed ratio="1.5"/> I like to speak quickly because it makes me sound smart.
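As a minimal sketch of building the tag programmatically, the helper below checks the ratio against the 0.6 to 1.5 range stated above before prepending the tag to a transcript. The name `speed_tag` is hypothetical and not part of any SDK; the code only builds the string and does not call the API.

```python
def speed_tag(ratio: float) -> str:
    """Return a <speed> tag, rejecting ratios outside the documented range.

    Hypothetical helper: it only formats a string for use in a transcript.
    """
    if not 0.6 <= ratio <= 1.5:
        raise ValueError(f"speed ratio must be between 0.6 and 1.5, got {ratio}")
    # ':g' drops a trailing '.0', so 1.5 stays "1.5" and 1.0 becomes "1".
    return f'<speed ratio="{ratio:g}"/>'


text = speed_tag(1.5) + " I like to speak quickly because it makes me sound smart."
print(text)
```

Raising on an out-of-range value is a design choice of this sketch; it surfaces mistakes before a request is sent rather than relying on how the service handles them.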

Volume

You can guide the volume of a TTS generation with a volume tag, which takes a level between 0.5 and 2.0. The default volume is 1.
<volume level="0.5"/> I will speak softly.
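If the level comes from user input, one option is to clamp it into the 0.5 to 2.0 range before building the tag. The sketch below assumes that clamping is acceptable for your application; `volume_tag` is a hypothetical name, and the clamping happens in this helper, not in the service.

```python
def volume_tag(level: float) -> str:
    """Return a <volume> tag with the level clamped to the documented range.

    Hypothetical helper: values below 0.5 or above 2.0 are pulled to the
    nearest bound instead of being rejected.
    """
    clamped = max(0.5, min(2.0, level))
    # ':g' formats 0.5 as "0.5" and 2.0 as "2".
    return f'<volume level="{clamped:g}"/>'


print(volume_tag(0.5) + " I will speak softly.")
```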

Emotion (Beta)

Emotion control is highly experimental, particularly when emotion shifts occur mid-generation. For best results, use voices tagged as “Emotive”, as emotions may not work reliably with other voice types.
<emotion value="angry" /> I will not allow you to continue this! <emotion value="sad" /> I was hoping for a peaceful resolution.
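When a transcript is assembled from separate segments, each with its own emotion, the tags can be generated rather than typed by hand. This is a minimal sketch; `with_emotions` is a hypothetical helper, and it does not validate emotion names because the set of supported values is not listed here.

```python
from typing import Iterable, Tuple


def with_emotions(segments: Iterable[Tuple[str, str]]) -> str:
    """Prefix each text segment with an <emotion> tag and join the results.

    Hypothetical helper: `segments` is an iterable of (emotion, text) pairs.
    Emotion names are passed through unchecked.
    """
    parts = [f'<emotion value="{emotion}" /> {text}' for emotion, text in segments]
    return " ".join(parts)


transcript = with_emotions([
    ("angry", "I will not allow you to continue this!"),
    ("sad", "I was hoping for a peaceful resolution."),
])
print(transcript)
```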

Pauses and breaks

To insert breaks (or pauses) in generated speech, use a break tag with one attribute, time. For example, <break time="1s" />. You can specify the time in seconds (s) or milliseconds (ms). For accounting purposes, each break tag is counted as 1 character and does not need to be separated from adjacent text by a space. To save credits, you can remove the spaces around break tags.
Hello, my name is Sonic.<break time="1s"/>Nice to meet you.
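The following sketch builds a break tag from a duration in milliseconds, using the seconds unit when the duration is a whole number of seconds. `break_tag` is a hypothetical helper, and the rejection of non-positive durations is an assumption of this sketch rather than a documented rule.

```python
def break_tag(duration_ms: int) -> str:
    """Return a <break> tag for the given duration in milliseconds.

    Hypothetical helper: whole seconds are written with the 's' unit,
    everything else with 'ms'.
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    if duration_ms % 1000 == 0:
        return f'<break time="{duration_ms // 1000}s"/>'
    return f'<break time="{duration_ms}ms"/>'


# No spaces around the tag, as described above.
text = "Hello, my name is Sonic." + break_tag(1000) + "Nice to meet you."
print(text)
```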

Spelling out numbers and letters

To spell out input text, you can wrap it in <spell> tags. This is particularly useful for pronouncing long numbers or identifiers, such as credit card numbers, phone numbers, or unique IDs.
My name is Bob, spelled <spell>Bob</spell>, my account number is <spell>ABC-123</spell>, my phone number is <spell>(123) 456-7890</spell>, and my credit card is <spell>1234-5678-9012-3456</spell>.
If you want to spell out numbers or identifiers with planned pauses between groups (e.g. pausing between the area code of a phone number and the rest of that number), you can combine <break> and <spell> tags. These tags are considered 1 character and do not need to be separated from adjacent text by a space. To save credits, you can remove the spaces around break and spell tags.
My phone number is <spell>(123)</spell><break time="200ms"/><spell>4712177</spell> and my credit card number is <spell>1234</spell><break time="200ms"/><spell>5678</spell><break time="200ms"/><spell>6347</spell><break time="200ms"/><spell>4537</spell>.
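Writing these grouped sequences by hand is error-prone, so a small helper can generate them. The sketch below wraps each group in <spell> tags and joins the groups with a break of fixed length and no surrounding spaces; `spell_with_breaks` and its `pause_ms` parameter are hypothetical names.

```python
from typing import Sequence


def spell_with_breaks(groups: Sequence[str], pause_ms: int = 200) -> str:
    """Wrap each group in <spell> tags, separated by <break> tags.

    Hypothetical helper: no spaces are emitted between tags.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    return pause.join(f"<spell>{group}</spell>" for group in groups)


phone = spell_with_breaks(["(123)", "4712177"])
card = spell_with_breaks(["1234", "5678", "6347", "4537"])
print(f"My phone number is {phone} and my credit card number is {card}.")
```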