All models in the Sonic TTS family support custom pronunciations in your transcripts. Try out the pronunciation tool on our demo page.

Sonic-3
Sonic-turbo and Sonic-2

sonic-3 supports custom pronunciation dictionaries, which let you specify how a word or set of words should be pronounced more easily and sustainably. At its core, a dictionary is a simple search and replace: it directs the model to use another string in place of the matching text in the transcript. The pronunciation can be either an IPA pronunciation or a “sounds-like” guide:

[
  {
    "text": "bayou",
    "pronunciation": "<<ˈ|b|ɑ|ˈ|j|u>>"
  },
  {
    "text": "jambalaya",
    "pronunciation": "<<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>>"
  },
  {
    "text": "tchoupitoulas",
    "pronunciation": "chop-uh-TOO-liss"
  }
]

This JSON can then be saved as a pronunciation dictionary through our API or through our playground. The playground also lets you create and edit dictionaries directly in the UI.

Once a dictionary is created, it can be used in any of the TTS APIs by specifying its ID in pronunciation_dict_id. With the above dictionary, the string I ate some jambalaya on tchoupitoulas street would become I ate some <<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>> on chop-uh-TOO-liss street before being handed off to the model, which would then pronounce it more accurately.
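The search-and-replace behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the service's actual implementation: the function name apply_dictionary and the whole-word, case-insensitive matching rule are assumptions made for this example.

```python
import re

# Dictionary entries copied from the example above.
ENTRIES = [
    {"text": "bayou", "pronunciation": "<<ˈ|b|ɑ|ˈ|j|u>>"},
    {"text": "jambalaya", "pronunciation": "<<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>>"},
    {"text": "tchoupitoulas", "pronunciation": "chop-uh-TOO-liss"},
]


def apply_dictionary(transcript, entries):
    """Replace each dictionary word in the transcript with its pronunciation.

    Assumption: whole-word, case-insensitive matching. The real service's
    matching rules may differ.
    """
    lookup = {e["text"].lower(): e["pronunciation"] for e in entries}
    if not lookup:
        return transcript
    # Try longer entries first so they win over shorter ones that share a prefix.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    # A function replacement is inserted literally (no backslash processing).
    return pattern.sub(lambda m: lookup[m.group(1).lower()], transcript)


print(apply_dictionary("I ate some jambalaya on tchoupitoulas street", ENTRIES))
```

Running the sketch on the sentence from the text prints the substituted transcript shown above.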

For the best controllability around pronunciation, we recommend using sonic-3.

sonic-2 and sonic-turbo use MFA-style IPA for all languages.

You can also get custom pronunciations with older Sonic models. The sonic, sonic-2024-12-12, and sonic-2024-10-19 models use Sonic-flavored IPA phonemes for English. The sonic and sonic-2024-12-12 models use MFA-style IPA for languages other than English, and the Sonic Preview model uses MFA-style IPA for all languages. Note that sonic-2024-10-19 does not support custom pronunciations for languages other than English. We will soon be updating all models to use MFA-style IPA.

Custom words should be wrapped in double angle brackets << >>, with pipe characters | between phonemes and no whitespace. For example:

Can I get <<x|a|l|a|p|e|ɲ|o>> on that? (MFA-style IPA)
Can I get <<h|ɑː|l|ˈə|p|eɪ|n|y|ˌoʊ|>> on that? (Sonic-flavored IPA)

Each individual word should be wrapped in its own set of angle brackets.
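As a small sketch of the formatting rule above (double angle brackets, pipes between phonemes, no whitespace, one set of brackets per word), the helpers below build the tag from a list of phonemes. The names wrap_phonemes and wrap_words are hypothetical and exist only for this example.

```python
def wrap_phonemes(phonemes):
    """Join one word's phonemes with pipes and wrap them in << >>."""
    cleaned = [p.strip() for p in phonemes if p.strip()]
    if not cleaned:
        raise ValueError("at least one phoneme is required")
    return "<<" + "|".join(cleaned) + ">>"


def wrap_words(words):
    """Wrap each word in its own brackets; each word is a list of phonemes."""
    return " ".join(wrap_phonemes(word) for word in words)


tag = wrap_phonemes(["x", "a", "l", "a", "p", "e", "ɲ", "o"])
print("Can I get " + tag + " on that?")
```

This reproduces the MFA-style example sentence above.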

MFA-style IPA

Constructing Pronunciations

We use the IPA phoneset as defined by the Montreal Forced Aligner. Because of the size and complexity of this phoneset, you may find it easier to construct your custom pronunciation starting from existing words with known phonemizations. We suggest the following workflow for constructing a custom pronunciation for a word:

1. Go to the MFA pronunciation dictionary index and find the page corresponding to your language. Make sure the phoneset is MFA, and try to download the latest version (for most languages, v3.0 or v3.1).
   1. This page will give you the full range of acceptable phones for your language under the “phones” section.
2. Scroll down to the Installation section and click on the Download from the release page link.
3. Scroll to the bottom of the release page and download the .dict file; this is a text file mapping words to their constituent phonemes.
   1. The first column in the file contains words, and the last column contains space delimited phonemes. Ignore the other columns.
4. Look up your word, or words that sound similar to your intended pronunciation, in the dictionary. Use these pronunciations as a starting point to construct your custom pronunciation.
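The lookup step can be sketched as a small parser for .dict lines. It assumes the columns are tab-separated with the phonemes in the last column; if a line has no tabs, it falls back to skipping numeric tokens after the word. The function names are hypothetical, and the sample line is illustrative rather than read from a real download.

```python
def _is_number(token):
    """Return True if the token parses as a float (e.g. a probability column)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_dict_line(line):
    """Return (word, phonemes) for one non-empty line of a .dict file.

    Assumption: tab-separated columns, phonemes in the last column.
    """
    line = line.strip()
    if "\t" in line:
        columns = line.split("\t")
        return columns[0], columns[-1].split()
    # Fallback for whitespace-only lines: drop numeric columns after the word.
    tokens = line.split()
    word, rest = tokens[0], tokens[1:]
    while rest and _is_number(rest[0]):
        rest = rest[1:]
    return word, rest


def find_pronunciations(lines, target):
    """Collect every pronunciation listed for the target word."""
    matches = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, phonemes = parse_dict_line(line)
        if word.lower() == target.lower():
            matches.append(phonemes)
    return matches


SAMPLE = ["cartesian\t0.99\t0.14\t1.0\t1.0\tkʰ ɑ ɹ tʲ i ʒ ə n"]
print(find_pronunciations(SAMPLE, "cartesian"))
```

To use it on a downloaded file, pass the open file object (opened with encoding="utf-8") as the lines argument.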

Automatic pronunciation suggestions based on audio samples will be added in a future update. Note that MFA-style IPA does not support stress markers.

Example

Suppose I want to generate the text “This is a generation from Cartesia” and the model is not pronouncing “Cartesia” correctly. I would do the following:

1. Go to the MFA pronunciation dictionary index and look for English pronunciation dictionaries. I see that for US English, the most recent version is v3.1.
   1. I note that the page says that the acceptable phones for US English are aj aw b bʲ c cʰ cʷ d dʒ dʲ d̪ ej f fʲ h i iː j k kʰ kʷ l m mʲ m̩ n n̩ ow p pʰ pʲ pʷ s t tʃ tʰ tʲ tʷ t̪ v vʲ w z æ ç ð ŋ ɐ ɑ ɑː ɒ ɒː ɔj ə ɚ ɛ ɝ ɟ ɟʷ ɡ ɡʷ ɪ ɫ ɫ̩ ɱ ɲ ɹ ɾ ɾʲ ɾ̃ ʃ ʉ ʉː ʊ ʎ ʒ ʔ θ
2. Download the .dict file from the bottom of the release page.
3. Find a word in this dictionary that sounds similar to how I want “Cartesia” to be pronounced. I see this entry in the dictionary: cartesian 0.99 0.14 1.0 1.0 kʰ ɑ ɹ tʲ i ʒ ə n
4. Ignore the middle four numeric columns. I want to cut off the part of the pronunciation that corresponds to “-an” and replace it with an “uh” sound. I know that the MFA phoneme for “uh” is ɐ (if I didn’t know that, I could also look up “uh” in the dictionary). So the pronunciation I want is kʰ ɑ ɹ tʲ i ʒ ɐ.
5. Format the phonemes in angle brackets with pipe characters between phonemes and no whitespace. So my transcript is This is a generation from <<kʰ|ɑ|ɹ|tʲ|i|ʒ|ɐ>>.
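The same edit can be written out as a short Python sketch: start from the “cartesian” phonemes, drop the two phonemes for “-an”, append ɐ, and format the result. The helper name build_custom_pronunciation is hypothetical and used only for illustration.

```python
def build_custom_pronunciation(base_phonemes, drop_last, replacement):
    """Drop trailing phonemes from a known word and append replacements."""
    kept = base_phonemes[: len(base_phonemes) - drop_last]
    custom = kept + replacement
    # Pipes between phonemes, no whitespace, wrapped in double angle brackets.
    return "<<" + "|".join(custom) + ">>"


# Phonemes for "cartesian", taken from the dictionary entry in the example.
cartesian = "kʰ ɑ ɹ tʲ i ʒ ə n".split()

# Remove the "-an" ending (ə n) and add the "uh" sound (ɐ).
tag = build_custom_pronunciation(cartesian, drop_last=2, replacement=["ɐ"])
transcript = f"This is a generation from {tag}."
print(transcript)
```

The printed transcript matches the one constructed by hand in the last step of the example.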

(Deprecated) Sonic-flavored IPA

Sonic-flavored IPA is only for sonic; users of our latest models (sonic-2 and sonic-turbo) should use MFA-style IPA.

Here is a pronunciation guide for Sonic-flavored IPA. It follows the English phonology article on Wikipedia for most phonemes, but in spots where our model requires different notation than you may expect, we’ve included a blue <= in the margins. You can copy/paste some of these uncommon symbols from the original charts here.

Stresses and vowel length markers

Sonic English expects stress markers for primary (ˈ) and secondary (ˌ) stress, placed directly before the vowel of the stressed syllable. We also use the length marker (ː) to annotate vowel length. The model can operate without these markers, but you will get noticeably better robustness and control when using them.
