Tips and tricks for better generation quality with Sonic

The tips below are specific to current models (sonic-english and sonic-multilingual), and may change in the future as we improve our modeling and training procedure.

  1. Use appropriate punctuation. Add punctuation where appropriate and at the end of each transcript whenever possible.
  2. Use dates in MM/DD/YYYY form. For example, 04/20/2023.
  3. Insert pauses. To insert pauses, insert “-” where you need the pause.
  4. Match the voice to the language. Each voice has a language that it works best with. You can use the playground to quickly understand which voices are most appropriate for a language.
  5. Stream in inputs for contiguous audio. Use continuations if generating audio that should sound contiguous in separate chunks.
  6. Specify custom pronunciations for domain-specific or ambiguous words. You may want to do this for proper nouns and trademarks, as well as for words that are spelled the same but pronounced differently, like the city of Nice and the adjective “nice.”
  7. Use two question marks to emphasize questions. For example, “Are you here??” vs. “Are you here?”
  8. Avoid using quotation marks. (Unless you intend to refer to a quote.)
  9. Use a space between a URL or email and a question mark. Otherwise, the question mark will be read out. For example, write Did you send the email to support@cartesia.ai ? instead of Did you send the email to support@cartesia.ai?.