Pronunciation dictionaries let you specify how to pronounce specific words and phrases. Try out the pronunciation tool on our demo page. A dictionary works as a simple search and replace: each entry directs the model to use another string in place of the matching text in the transcript. The pronunciation can be either an IPA pronunciation or “sounds-like” guidance:
[
  {
    "text": "bayou",
    "pronunciation": "<<ˈ|b|ɑ|ˈ|j|u>>"
  },
  {
    "text": "jambalaya",
    "pronunciation": "<<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>>"
  },
  {
    "text": "tchoupitoulas",
    "pronunciation": "chop-uh-TOO-liss"
  }
]
Save JSON like this as a pronunciation dictionary through our API or through our playground. Once a dictionary is created, use it in any TTS API by passing its id as pronunciation_dict_id.

With the dictionary above, the string I ate some jambalaya on tchoupitoulas street becomes I ate some <<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>> on chop-uh-TOO-liss street before being handed off to the model.
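
To make the search-and-replace step concrete, here is a minimal local sketch in Python. It is not the service's implementation: the function name apply_dictionary is hypothetical, and matching on whole words in a single pass is an assumption made for illustration. It ignores the capitalization rule described under Case Sensitivity and matches entries exactly.

```python
import re

# The example dictionary from above, as Python data.
ENTRIES = [
    {"text": "bayou", "pronunciation": "<<ˈ|b|ɑ|ˈ|j|u>>"},
    {"text": "jambalaya", "pronunciation": "<<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>>"},
    {"text": "tchoupitoulas", "pronunciation": "chop-uh-TOO-liss"},
]


def apply_dictionary(text, entries):
    """Replace each dictionary entry found in `text` with its pronunciation.

    Hypothetical helper: whole-word, exact-case matching is an assumption,
    not documented behavior of the real service.
    """
    table = {e["text"]: e["pronunciation"] for e in entries}
    if not table:
        return text
    # Longest entries first, so a multi-word entry wins over its sub-words.
    alternatives = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b"
    )
    # One pass over the text, so replacement strings are never re-scanned.
    return pattern.sub(lambda m: table[m.group(0)], text)


print(apply_dictionary("I ate some jambalaya on tchoupitoulas street", ENTRIES))
```

Run as-is, this should print the substituted sentence shown above. Doing the replacement in one pass keeps a "sounds-like" string such as chop-uh-TOO-liss from being matched again by another entry.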

Case Sensitivity

Dictionary matching is case-sensitive, with one exception: a lowercase entry also matches its sentence-start capitalized form. For example, cat matches both cat and Cat, but not CAT. An entry for CAT only matches CAT. This applies to multi-word entries too. An entry for green valley matches green valley and Green valley, but not Green Valley.

Use lowercase entries for common words. These match the word both mid-sentence (cat) and at the start of a sentence (Cat), covering the two most common positions.

Use exact capitalization for proper nouns. A term like LaTeX should be entered as LaTeX so it doesn’t collide with a different pronunciation for the common word latex. For multi-word proper nouns, enter the exact casing as it appears in your transcripts, for example Green Valley if the transcript capitalizes both words.
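
The matching rule can be written as a small predicate. This is a sketch for checking your own entries against the rule as stated here, not the service's code. The name entry_matches is hypothetical, and it assumes that a "lowercase entry" means one with no uppercase letters anywhere.

```python
def entry_matches(entry: str, candidate: str) -> bool:
    """Return True if dictionary `entry` would match transcript text `candidate`.

    Hypothetical helper that mirrors the rule described above.
    """
    # Exact, case-sensitive match always counts.
    if candidate == entry:
        return True
    # The one exception: an all-lowercase entry also matches the form with
    # only its first character capitalized (sentence-start form).
    if entry and entry == entry.lower():
        return candidate == entry[0].upper() + entry[1:]
    return False


# The examples from the text:
for entry, candidate in [
    ("cat", "cat"), ("cat", "Cat"), ("cat", "CAT"),
    ("CAT", "CAT"), ("CAT", "cat"),
    ("green valley", "Green valley"), ("green valley", "Green Valley"),
    ("LaTeX", "latex"),
]:
    print(f"{entry!r} vs {candidate!r}: {entry_matches(entry, candidate)}")
```

Only the first character is capitalized in the exception, which is why green valley does not match Green Valley and a separate exact-case entry is needed for the proper noun.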