Formatting Text for Sonic-2

The tips below are specific to our current Sonic model family and may change as we improve our modeling and training procedures.

Use appropriate punctuation. Add punctuation where appropriate and at the end of each transcript whenever possible.
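If transcripts are assembled programmatically, a small preprocessing step can enforce closing punctuation. The sketch below is a hypothetical helper (the name ensure_terminal_punctuation is illustrative, not part of any Cartesia SDK). It appends a period only when the transcript does not already end in ".", "!" or "?".

```python
# Hypothetical preprocessing helper; not part of any Cartesia SDK.
TERMINAL_PUNCTUATION = (".", "!", "?")


def ensure_terminal_punctuation(transcript: str) -> str:
    """Return the transcript with a period appended if it lacks end punctuation."""
    stripped = transcript.rstrip()
    if not stripped:
        return stripped
    if stripped.endswith(TERMINAL_PUNCTUATION):
        return stripped
    return stripped + "."


print(ensure_terminal_punctuation("Thanks for calling"))  # Thanks for calling.
print(ensure_terminal_punctuation("Is that right?"))      # Is that right?
```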
Use dates in MM/DD/YYYY form. For example, 04/20/2023.
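As a sketch, assuming your source text contains ISO 8601 dates such as 2023-04-20, the standard library can rewrite them into the MM/DD/YYYY form. The function name is illustrative. Invalid dates such as 2023-02-30 raise ValueError rather than being passed through silently.

```python
import re
from datetime import date

# Matches ISO 8601 calendar dates such as 2023-04-20.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def iso_dates_to_mmddyyyy(transcript: str) -> str:
    """Rewrite every YYYY-MM-DD date in the transcript as MM/DD/YYYY."""

    def repl(match: re.Match) -> str:
        year, month, day = (int(g) for g in match.groups())
        # Building a date object validates the value (raises ValueError if invalid).
        return date(year, month, day).strftime("%m/%d/%Y")

    return ISO_DATE.sub(repl, transcript)


print(iso_dates_to_mmddyyyy("The launch is on 2023-04-20."))
# The launch is on 04/20/2023.
```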
Insert pauses. To insert a pause, add “-” or a break tag where you need it. Break tags count as 1 character and do not need spaces separating them from adjacent text, so you can remove the spaces around break tags to save credits.
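A minimal sketch of joining text segments with break tags and no surrounding spaces. The tag syntax <break time="1s"/> is an assumption used for illustration; check the Sonic-2 documentation for the exact form and supported time units. The helper name is hypothetical.

```python
def join_with_breaks(segments: list[str], seconds: float = 1.0) -> str:
    """Join text segments with a break tag, dropping the spaces around it.

    ASSUMPTION: the <break time="...s"/> syntax is illustrative; confirm the
    exact tag format against the Sonic-2 documentation.
    """
    tag = f'<break time="{seconds:g}s"/>'
    # Strip each segment so no spaces sit next to the tag (saves credits).
    return tag.join(segment.strip() for segment in segments)


print(join_with_breaks(["Hello there. ", " How can I help?"], 0.5))
# Hello there.<break time="0.5s"/>How can I help?
```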
Match the voice to the language. Each voice has a language that it works best with. You can use the playground to quickly understand which voices are most appropriate for a language.
Stream in inputs for contiguous audio. When you generate audio in separate chunks that should sound contiguous, use continuations.
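The sketch below splits a long text at sentence boundaries and prepares one request per chunk, all sharing a single context so the chunks can be treated as one continuous utterance. The field names "context_id", "transcript" and "continue" are assumptions modeled on a typical streaming TTS request; confirm the exact schema against the Cartesia API reference before relying on it. The trailing space on non-final chunks is also an assumption, in case chunks are concatenated verbatim.

```python
import re
import uuid


def build_continuation_requests(text: str, max_chars: int = 200) -> list[dict]:
    """Split text into sentence-aligned chunks tagged as continuations.

    ASSUMPTION: the request fields "context_id", "transcript" and "continue"
    are illustrative; check the Cartesia API reference for the real schema.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)

    context_id = str(uuid.uuid4())  # one shared context for the whole utterance
    last = len(chunks) - 1
    return [
        {
            "context_id": context_id,
            # ASSUMPTION: chunks are joined verbatim, so keep a trailing space
            # on every non-final chunk to avoid fusing words across chunks.
            "transcript": chunk if i == last else chunk + " ",
            # Every chunk except the last signals that more input is coming.
            "continue": i < last,
        }
        for i, chunk in enumerate(chunks)
    ]
```

A single sentence longer than max_chars is kept whole rather than split mid-sentence, since cutting inside a sentence tends to hurt prosody.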
Specify custom pronunciations for domain-specific or ambiguous words. You may want to do this for proper nouns and trademarks, as well as for words that are spelled the same but pronounced differently, like the city of Nice and the adjective “nice.”
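Where a dedicated pronunciation feature is not available in your workflow, one swapped-in technique is plain-text respelling before synthesis. This is not Cartesia's custom pronunciation mechanism; the respelling "Neese" below is a hypothetical example. Matching is case-sensitive and whole-word, so the city "Nice" is rewritten while the adjective "nice" is left alone.

```python
import re

# HYPOTHETICAL respellings for illustration only; this is plain-text
# substitution, not Cartesia's custom pronunciation format.
RESPELLINGS = {
    "Nice": "Neese",  # the French city, not the adjective "nice"
}


def apply_respellings(transcript: str, respellings: dict[str, str]) -> str:
    """Replace whole, case-sensitive word matches with their respellings."""
    for word, respelling in respellings.items():
        pattern = rf"\b{re.escape(word)}\b"
        # A lambda avoids interpreting backslashes in the respelling.
        transcript = re.sub(pattern, lambda _m: respelling, transcript)
    return transcript


print(apply_respellings("We had a nice trip to Nice.", RESPELLINGS))
# We had a nice trip to Neese.
```

Case-sensitive matching cannot tell the city apart from a sentence-initial adjective ("Nice to meet you."), so ambiguous words still need a manual check.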
Use two question marks to emphasize questions. For example, “Are you here??” vs. “Are you here?”
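A minimal sketch, assuming you want to emphasize every question in a transcript: double any single "?" that ends a sentence and leave existing "??" untouched. The function name is hypothetical; in practice you may want to apply it only to the questions that need emphasis.

```python
import re


def emphasize_questions(transcript: str) -> str:
    """Turn a lone sentence-ending '?' into '??' to emphasize the question."""
    # (?<!\?) skips marks already preceded by '?'; the lookahead requires the
    # '?' to be followed by whitespace or the end of the text.
    return re.sub(r"(?<!\?)\?(?=\s|$)", "??", transcript)


print(emphasize_questions("Are you here? I am."))  # Are you here?? I am.
```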
Avoid using quotation marks unless you intend to refer to a quote.
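As a sketch, straight and curly double quotes can be stripped from a transcript before synthesis. Single quotes and apostrophes are deliberately kept, because removing them would break contractions such as "don't". The helper name is illustrative.

```python
# Straight double quote plus curly opening and closing double quotes.
QUOTE_CHARS = '"\u201c\u201d'
_STRIP_QUOTES = str.maketrans("", "", QUOTE_CHARS)


def strip_quotation_marks(transcript: str) -> str:
    """Remove double quotation marks, keeping apostrophes intact."""
    return transcript.translate(_STRIP_QUOTES)


print(strip_quotation_marks("She said \u201chello\u201d and didn't stop."))
# She said hello and didn't stop.
```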
Use a space between a URL or email and a question mark. Otherwise, the question mark will be read out. For example, write “Did you send the email to support@cartesia.ai ?” instead of “Did you send the email to support@cartesia.ai?”
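The sketch below applies this rule automatically using a simple heuristic (an assumption, not a full URL or email parser): a token is treated as an address if it contains "@", "://", or a dot between two word characters. When such a token ends in one or more question marks, a space is inserted before them.

```python
import re

# Heuristic only: "@", "://", or a dot between word characters (cartesia.ai).
_LOOKS_LIKE_ADDRESS = re.compile(r"@|://|\w\.\w")


def space_question_after_address(transcript: str) -> str:
    """Insert a space between an email/URL and trailing '?' marks."""
    fixed = []
    for token in transcript.split(" "):
        if token.endswith("?"):
            body = token.rstrip("?")
            marks = token[len(body):]
            if body and _LOOKS_LIKE_ADDRESS.search(body):
                token = f"{body} {marks}"
        fixed.append(token)
    return " ".join(fixed)


print(space_question_after_address("Did you send the email to support@cartesia.ai?"))
# Did you send the email to support@cartesia.ai ?
```

Because the heuristic also matches abbreviations such as "e.g", it can add a harmless extra space in rare cases.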
Use “dot” instead of “.” in URLs. For example, write “cartesia dot ai” instead of “cartesia.ai” for better pronunciation.
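A minimal sketch, assuming bare domains such as cartesia.ai or docs.cartesia.ai appear in running text. The pattern (a heuristic, not a full URL grammar) requires the final segment to be at least two letters, so decimals like 3.14 are left alone. Paths, ports and query strings are not handled.

```python
import re

# Heuristic domain pattern: dot-separated segments ending in a letter-only
# top-level label of two or more characters (e.g. "cartesia.ai").
_DOMAIN = re.compile(r"\b[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}\b")


def speak_dots_in_domains(transcript: str) -> str:
    """Rewrite "cartesia.ai" as "cartesia dot ai" for clearer pronunciation."""
    return _DOMAIN.sub(lambda m: m.group(0).replace(".", " dot "), transcript)


print(speak_dots_in_domains("Visit cartesia.ai today."))
# Visit cartesia dot ai today.
```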