
# Custom Pronunciations

> Specify custom pronunciations for words that are hard to get right, like proper nouns or domain-specific terms.

All models in the Sonic TTS family support custom pronunciations in your transcripts. Try out the pronunciation tool on our [demo](https://play.cartesia.ai/demos/pronunciation) page.

<Tabs>
  <Tab title="Sonic 3.5 and Sonic 3">
    `sonic-3.5` and `sonic-3` support custom pronunciation dictionaries, which let you specify how to pronounce a specific word or phrase.

    A dictionary works as a simple search-and-replace: each entry directs the model to use a replacement string in place of the matching text in the transcript. The pronunciation can be either an [IPA pronunciation](/build-with-cartesia/capability-guides/phonemes) or "sounds-like" guidance:

    ```json lines theme={null}
    [
      {
        "text": "bayou",
        "pronunciation": "<<ˈ|b|ɑ|ˈ|j|u>>"
      },
      {
        "text": "jambalaya",
        "pronunciation": "<<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>>"
      },
      {
        "text": "tchoupitoulas",
        "pronunciation": "chop-uh-TOO-liss"
      }
    ]
    ```

    <Note>
      The legacy `alias` field is deprecated. Use `pronunciation` for new dictionary items.
    </Note>
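Before uploading, you may want to sanity-check a dictionary file locally. The sketch below uses a hypothetical helper, `check_item` (not part of any Cartesia SDK), that rejects items still using the deprecated `alias` field and labels each pronunciation as IPA or sounds-like. It assumes, based on the examples above, that IPA pronunciations are exactly those wrapped in `<<` and `>>`; that is an inference, not a documented rule.

```python
import json


def check_item(item: dict) -> str:
    """Return "ipa" or "sounds-like" for one dictionary item; raise ValueError on problems.

    Hypothetical helper. Treating any pronunciation wrapped in << >> as IPA is an
    assumption drawn from the examples on this page.
    """
    if "alias" in item:
        raise ValueError(f"{item.get('text')!r} uses the deprecated 'alias' field; use 'pronunciation'")
    text = item.get("text", "")
    pronunciation = item.get("pronunciation", "")
    if not text or not pronunciation:
        raise ValueError("each item needs a non-empty 'text' and 'pronunciation'")
    if pronunciation.startswith("<<") and pronunciation.endswith(">>"):
        return "ipa"
    return "sounds-like"


items = json.loads("""
[
  {"text": "bayou", "pronunciation": "<<ˈ|b|ɑ|ˈ|j|u>>"},
  {"text": "tchoupitoulas", "pronunciation": "chop-uh-TOO-liss"}
]
""")
kinds = [check_item(item) for item in items]
print(kinds)
```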

    Save this JSON as a pronunciation dictionary [through our API](/api-reference/pronunciation-dicts/create) or in our [playground](https://play.cartesia.ai/pronunciation):

    <img src="https://mintcdn.com/cartesia-2650f86a/e1mnuMNt5eyESNEd/images/image.png?fit=max&auto=format&n=e1mnuMNt5eyESNEd&q=85&s=e5284d8275a53cb69e1fef0f32ccca4c" alt="image.png" width="1264" height="414" data-path="images/image.png" />

    Once a dictionary is created, use it with any TTS API by passing its ID as `pronunciation_dict_id`.
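As a minimal sketch, here is a TTS request body carrying the dictionary ID, built as a plain Python dict and not sent anywhere. Only `pronunciation_dict_id` comes from this page; the other field names (`model_id`, `transcript`, `voice`), the helper name `build_tts_body`, and the placeholder IDs are assumptions to check against the API reference. Required fields such as the output format are deliberately omitted.

```python
import json


def build_tts_body(transcript: str, voice_id: str, dict_id: str, model_id: str = "sonic-3") -> dict:
    """Build a TTS request body that applies a pronunciation dictionary.

    Assumption: field names other than pronunciation_dict_id reflect our reading
    of the TTS API; other required fields (e.g. output format) must be added
    before sending.
    """
    return {
        "model_id": model_id,
        "transcript": transcript,
        "voice": {"mode": "id", "id": voice_id},
        "pronunciation_dict_id": dict_id,
    }


# Hypothetical placeholder IDs; substitute your own voice and dictionary IDs.
body = build_tts_body("I ate some jambalaya on tchoupitoulas street", "YOUR_VOICE_ID", "YOUR_DICT_ID")
print(json.dumps(body, ensure_ascii=False, indent=2))
```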

    With the dictionary above, the string `I ate some jambalaya on tchoupitoulas street` becomes `I ate some <<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>> on chop-uh-TOO-liss street` before being handed off to the model.
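The substitution can be modeled in a few lines. This is a simplified sketch, not Cartesia's matcher: `apply_dictionary` is a hypothetical name, and it assumes exact-case, whole-word matching in a single pass with longer entries tried first (the real case rules are described below).

```python
import re


def apply_dictionary(transcript: str, entries: list) -> str:
    """Replace each whole-word match of an entry's text with its pronunciation.

    Simplified model: exact-case, whole-word, single pass, longer entries first.
    """
    if not entries:
        return transcript
    lookup = {e["text"]: e["pronunciation"] for e in entries}
    # Longer entries first so a multi-word entry wins over a shorter one it contains.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    # One pass: text inserted by a replacement is never re-scanned.
    return pattern.sub(lambda m: lookup[m.group(0)], transcript)


entries = [
    {"text": "bayou", "pronunciation": "<<ˈ|b|ɑ|ˈ|j|u>>"},
    {"text": "jambalaya", "pronunciation": "<<ˈ|dʒ|ə|m|ˈ|b|ə|ˈ|l|aɪ|ˈ|ə>>"},
    {"text": "tchoupitoulas", "pronunciation": "chop-uh-TOO-liss"},
]
result = apply_dictionary("I ate some jambalaya on tchoupitoulas street", entries)
print(result)
```

Because matching is whole-word in this sketch, an entry for `bayou` leaves `bayous` untouched.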

    ## Case Sensitivity

    Dictionary matching is **case-sensitive**, with one exception: a lowercase entry also matches its sentence-start capitalized form. For example, `cat` matches both `cat` and `Cat`, but not `CAT`. An entry for `CAT` only matches `CAT`.

    This applies to multi-word entries too. An entry for `green valley` matches `green valley` and `Green valley`, but not `Green Valley`.

    **Use lowercase entries for common words.** These match the word both mid-sentence (`cat`) and at the start of a sentence (`Cat`), covering the two most common positions.

    **Use exact capitalization for proper nouns.** A term like `LaTeX` should be entered as `LaTeX` so it doesn't collide with a different pronunciation for the common word `latex`. For multi-word proper nouns, enter the exact casing as it appears in your transcripts — for example, `Green Valley` if the transcript capitalizes both words.
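The case rule can be summarized as a small function. This is a hypothetical sketch of the documented behavior (`matching_forms` and `entry_matches` are our own names). It checks spelling only, so it does not model whether a capitalized form actually sits at the start of a sentence.

```python
def matching_forms(entry: str) -> set:
    """Transcript spellings that a dictionary entry matches.

    A lowercase entry also matches the form with its first letter capitalized;
    any other entry matches only its exact spelling.
    """
    forms = {entry}
    if entry and entry == entry.lower():
        forms.add(entry[0].upper() + entry[1:])
    return forms


def entry_matches(entry: str, text: str) -> bool:
    """True if text is one of the spellings the entry matches."""
    return text in matching_forms(entry)


print(sorted(matching_forms("cat")))
```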
  </Tab>

  <Tab title="Sonic-2 and Sonic-turbo">
    <Info>
      For the best controllability around pronunciation, we recommend using `sonic-3.5`.
    </Info>

    `sonic-2` and `sonic-turbo` use MFA-style IPA for all languages.
    Between these two models, we recommend `sonic-2` for the best controllability around pronunciation.

    You can also get custom pronunciations with older Sonic models.
    The `sonic`, `sonic-2024-12-12`, and `sonic-2024-10-19` models use Sonic-flavored IPA phonemes for English.
    The `sonic` and `sonic-2024-12-12` models use MFA-style IPA for languages other than English, and the Sonic Preview model uses MFA-style IPA for all languages.
    Note that `sonic-2024-10-19` does not support custom pronunciations for languages other than English.
    We will soon be updating all models to use MFA-style IPA.

    Wrap custom words in double angle brackets (`<<` and `>>`), with pipe characters (`|`) between phonemes and no whitespace.
    For example:

    * `Can I get <<x|a|l|a|p|e|ɲ|o>> on that?` (MFA-style IPA)
    * `Can I get <<h|ɑː|l|ˈə|p|eɪ|n|y|ˌoʊ>> on that?` (Sonic-flavored IPA)

    Wrap each individual word in its own set of double angle brackets.
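These formatting rules are mechanical, so a small helper can build the markup from a list of phonemes. `format_word` and `format_words` are hypothetical names, and the sketch assumes that custom words in a transcript are separated by single spaces like ordinary words.

```python
def format_word(phonemes: list) -> str:
    """Wrap one word's phonemes as <<p1|p2|...>> with no whitespace."""
    if not phonemes:
        raise ValueError("a custom word needs at least one phoneme")
    for p in phonemes:
        # Reject empty phonemes and anything that would break the markup.
        if not p or any(ch.isspace() or ch in "|<>" for ch in p):
            raise ValueError(f"invalid phoneme: {p!r}")
    return "<<" + "|".join(phonemes) + ">>"


def format_words(words: list) -> str:
    """Give each word its own set of brackets, separated by spaces."""
    return " ".join(format_word(w) for w in words)


jalapeno = format_word(["x", "a", "l", "a", "p", "e", "ɲ", "o"])
print(f"Can I get {jalapeno} on that?")
```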

    # MFA-style IPA

    ## Constructing Pronunciations

    We use the IPA phoneset as defined by the [Montreal Forced Aligner](https://montreal-forced-aligner.readthedocs.io/en/latest/). Because of the size and complexity of this phoneset, you may find it easier to construct your custom pronunciation starting from existing words with known phonemizations. We suggest the following workflow for constructing a custom pronunciation for a word:

    1. Go to the [MFA pronunciation dictionary index](https://mfa-models.readthedocs.io/en/latest/dictionary/index.html) and find the page corresponding to your language. Make sure the phoneset is MFA, and try to download the latest version (for most languages, v3.0 or v3.1).
       1. This page will give you the full range of acceptable phones for your language under the “phones” section.
    2. Scroll down to the `Installation` section and click on the `Download from the release page` link.
    3. Scroll to the bottom of the release page and download the .dict file; this is a text file mapping words to their constituent phonemes.
       1. The first column in the file contains words, and the last column contains space-delimited phonemes. Ignore the other columns.
    4. Look up your word or words that sound similar to your intended pronunciation in the dictionary. Use these pronunciations as a starting point to construct your custom pronunciation.
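Steps 3 and 4 amount to reading the `.dict` file into a lookup table. The sketch below assumes the layout described above: tab-separated columns with the word first and the space-delimited phonemes last, where a word may appear on several lines with alternative pronunciations. `parse_mfa_dict` is a hypothetical name, not part of MFA's tooling; the sample line is the US English `cartesian` entry used in the example below.

```python
def parse_mfa_dict(text: str) -> dict:
    """Map each word to a list of its pronunciations (each a list of phonemes)."""
    lexicon = {}
    for line in text.splitlines():
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # skip blank or malformed lines
        word, phonemes = columns[0], columns[-1].split()
        lexicon.setdefault(word, []).append(phonemes)
    return lexicon


# In practice, read the downloaded file, e.g. open(path, encoding="utf-8").read().
sample = "cartesian\t0.99\t0.14\t1.0\t1.0\tkʰ ɑ ɹ tʲ i ʒ ə n\n"
lexicon = parse_mfa_dict(sample)
print(lexicon["cartesian"])
```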

    Automatic pronunciation suggestions based on audio samples will be added in a future update. Note that MFA-style IPA does not support stress markers.

    ## Example

    Suppose I want to generate the text “This is a generation from Cartesia” and the model is not pronouncing “Cartesia” correctly. I would do the following:

    1. Go to the [MFA pronunciation dictionary index](https://mfa-models.readthedocs.io/en/latest/dictionary/index.html) and look for English pronunciation dictionaries. I see that for US English, the most recent version is v3.1.
       1. The page lists the acceptable phones for US English as `aj aw b bʲ c cʰ cʷ d dʒ dʲ d̪ ej f fʲ h i iː j k kʰ kʷ l m mʲ m̩ n n̩ ow p pʰ pʲ pʷ s t tʃ tʰ tʲ tʷ t̪ v vʲ w z æ ç ð ŋ ɐ ɑ ɑː ɒ ɒː ɔj ə ɚ ɛ ɝ ɟ ɟʷ ɡ ɡʷ ɪ ɫ ɫ̩ ɱ ɲ ɹ ɾ ɾʲ ɾ̃ ʃ ʉ ʉː ʊ ʎ ʒ ʔ θ`

    2. Download the .dict file from the bottom of the [release page](https://github.com/MontrealCorpusTools/mfa-models/releases/tag/dictionary-english_us_mfa-v3.1.0).

    3. Find a word in this dictionary that sounds similar to how I want “Cartesia” to be pronounced. I see this entry in the dictionary:

       `cartesian	0.99	0.14	1.0	1.0	kʰ ɑ ɹ tʲ i ʒ ə n`

    4. Ignore the middle four numeric columns. I want to cut off the part of the pronunciation that corresponds to “-an” and replace it with an “uh” sound. I know that the MFA phoneme for “uh” is `ɐ` (if I didn’t know that, I could also look up “uh” in the dictionary). So the pronunciation I want is `kʰ ɑ ɹ tʲ i ʒ ɐ`.

    5. Format the phonemes in double angle brackets, with pipe characters between phonemes and no whitespace. So my transcript is `This is a generation from <<kʰ|ɑ|ɹ|tʲ|i|ʒ|ɐ>>`.
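Steps 4 and 5 can also be checked in code: derive the new pronunciation from the `cartesian` entry, confirm that every phoneme is in the US English phone set from step 1, and format the transcript. This is a sketch; `build_custom_word` is a hypothetical helper, and the phone set is copied from the list above.

```python
# Acceptable phones for US English MFA v3.1, copied from step 1.
US_ENGLISH_PHONES = set(
    "aj aw b bʲ c cʰ cʷ d dʒ dʲ d̪ ej f fʲ h i iː j k kʰ kʷ l m mʲ m̩ n n̩ ow p pʰ pʲ pʷ "
    "s t tʃ tʰ tʲ tʷ t̪ v vʲ w z æ ç ð ŋ ɐ ɑ ɑː ɒ ɒː ɔj ə ɚ ɛ ɝ ɟ ɟʷ ɡ ɡʷ ɪ ɫ ɫ̩ ɱ ɲ ɹ ɾ ɾʲ ɾ̃ ʃ ʉ ʉː ʊ ʎ ʒ ʔ θ".split()
)


def build_custom_word(phonemes: list) -> str:
    """Format phonemes as <<a|b|...>> after checking each one is a valid phone."""
    unknown = [p for p in phonemes if p not in US_ENGLISH_PHONES]
    if unknown:
        raise ValueError(f"not in the US English phone set: {unknown}")
    return "<<" + "|".join(phonemes) + ">>"


cartesian = "kʰ ɑ ɹ tʲ i ʒ ə n".split()
# Drop the final "-an" (ə n) and end on the "uh" sound ɐ.
cartesia = cartesian[:-2] + ["ɐ"]
transcript = f"This is a generation from {build_custom_word(cartesia)}"
print(transcript)
```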

    # (Deprecated) Sonic-flavored IPA

    Sonic-flavored IPA applies only to English with the older `sonic`, `sonic-2024-12-12`, and `sonic-2024-10-19` models. If you use `sonic-2` or `sonic-turbo`, use MFA-style IPA instead.

    Here is a pronunciation guide for Sonic-flavored IPA.
    It follows the [English phonology article on Wikipedia](https://en.wikipedia.org/wiki/English_phonology) for most phonemes,
    but in spots where our model requires different notation than you may expect, we've included a blue `<=` in the margins.

    You can copy/paste some of these uncommon symbols from the original [charts here](https://docs.google.com/spreadsheets/d/1OJbiKtxLyodpNPqVfOu43X2HloLsAixTtFppEuQ_4pI/edit?usp=sharing).

    <Frame>
      <img src="https://mintcdn.com/cartesia-2650f86a/GOsvXpql8JfAlgjy/assets/images/sonic_ipa_guide.png?fit=max&auto=format&n=GOsvXpql8JfAlgjy&q=85&s=73894e30d68160ffb033b49a2df4fd2d" alt="" width="960" height="540" data-path="assets/images/sonic_ipa_guide.png" />
    </Frame>

    ## Stresses and vowel length markers

    Sonic-flavored English IPA expects stress markers for primary (`ˈ`) and secondary (`ˌ`) stressed syllables, placed directly before the vowel. It also uses a vowel length marker (`ː`). The model can operate without these markers, but you will get noticeably better robustness and control when you use them.
  </Tab>
</Tabs>
