A voice can support one or more locales, defined by language and accent. A voice may sound less natural when used in a locale it does not support. For example, an American English voice used to generate Spanish will speak with an American accent.

Localization

Localization adapts a voice to sound natural in a target locale while preserving the original speaker’s identity and character. You can localize a voice to a different language, such as adapting an English voice to Spanish, or to a different accent, such as adapting an American English voice to British English.

Localization is available in the playground and the API. Provide the source voice, its gender, and the target language and accent. See the API reference for the full list of supported locales.

Localization currently creates a new voice with its own voice_id. We’re working to let you add more locales to an existing voice instead.
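As a rough illustration of the four inputs listed above, the sketch below assembles them into a request body. The function name and the dictionary keys (source_voice_id, gender, language, accent) are placeholders chosen for this example, not the API's actual field names, and the sample values are made up. Check the API reference for the real endpoint, field names, and supported locales.

```python
def build_localize_request(source_voice_id: str, gender: str,
                           language: str, accent: str) -> dict:
    """Collect the inputs localization asks for into one request body.

    All keys are hypothetical placeholders; the real field names are
    defined in the API reference.
    """
    # Each input is required, so reject empty values early.
    fields = {
        "source_voice_id": source_voice_id,  # the voice to adapt
        "gender": gender,                    # gender of the source voice
        "language": language,                # target language
        "accent": accent,                    # target accent
    }
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"missing localization inputs: {', '.join(missing)}")
    return fields


# Example with made-up values: adapt an English voice to Spanish.
request_body = build_localize_request("voice-123", "female", "es", "mx")
```

Because localization creates a new voice, the response would carry a new voice_id to use in later generations; the source voice is left unchanged.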

Voices that support multiple locales

Some featured voices support multiple locales through a single voice_id. Set the language field to the locale you want the voice to speak. If you omit it, Cartesia will attempt to detect the language from the transcript. For more reliable language detection, avoid very short transcripts.
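The sketch below shows one way to treat the language field as optional when building a generation request. Only voice_id and language come from the text above; the function name, the transcript key, and the sample values are assumptions for illustration, and a real request will need whatever other fields the API reference requires.

```python
from typing import Optional


def build_generation_request(voice_id: str, transcript: str,
                             language: Optional[str] = None) -> dict:
    """Build a request body for a voice that supports several locales.

    The "transcript" key is a hypothetical placeholder; see the API
    reference for the real request shape.
    """
    body = {"voice_id": voice_id, "transcript": transcript}
    if language is not None:
        # Explicit locale: the voice speaks in this language.
        body["language"] = language
    # When language is omitted, detection from the transcript is
    # attempted instead, which is less reliable on very short text.
    return body


explicit = build_generation_request("voice-123", "Bonjour tout le monde.", "fr")
detected = build_generation_request("voice-123", "Bonjour tout le monde.")
```

Setting the language explicitly is the safer choice when you already know it, since detection depends on the transcript being long enough.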

Code-switching

A voice may support multiple languages across separate generations. Switching languages within a single generation, for example mixing English and Spanish in one sentence, is called code-switching and is not generally supported. It may work in limited cases, such as Hindi and Tagalog, where code-switching commonly appears in model training data.
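Since a voice can handle different languages in separate generations, one possible approach for mixed-language content is to split it by language beforehand and request one generation per run of same-language text. The sketch below only plans those requests. The function name, the segment format, and the transcript key are assumptions made for this example, and it presumes you already know the language of each segment. How natural the separately generated clips sound when played back to back is not covered here.

```python
from typing import Dict, List, Tuple


def plan_generations(segments: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (language, text) segments into one request per language run.

    Adjacent segments in the same language are merged so that each
    planned generation contains a single language.
    """
    planned: List[Dict[str, str]] = []
    for language, text in segments:
        text = text.strip()
        if not text:
            continue  # skip empty segments
        if planned and planned[-1]["language"] == language:
            # Same language as the previous run: extend that transcript.
            planned[-1]["transcript"] += " " + text
        else:
            # Language changed: start a new generation.
            planned.append({"language": language, "transcript": text})
    return planned


plan = plan_generations([
    ("en", "Hello."),
    ("en", "How are you?"),
    ("es", "Hola."),
])
# plan holds two entries: one English generation, one Spanish generation.
```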