Models
Cartesia provides a family of state-of-the-art models, including our highly accurate, low-latency Sonic TTS model family.
- the latest stable snapshot of the model
To use the stable version of the model, we recommend using the base model name (e.g. sonic-2).
In many cases the stable and preview snapshots are the same, but in some cases the preview snapshot may have additional features or improvements.
sonic-2
Sonic-2 is the successor to Sonic and adds a new set of generative speech capabilities. Like its predecessor, it produces high-accuracy, expressive speech and is optimized for efficiency to achieve low latency.
Capabilities:
- Higher fidelity voice cloning
- Timestamps for all 15 languages
- Infill support
To learn how to use the Sonic TTS family, see Make an API request.
sonic-turbo
All the power of Sonic, with half the latency (as low as 40 ms).
sonic
The first version of our flagship text-to-speech model. It produces high-accuracy, expressive speech, and is optimized for efficiency to achieve low latency.
Selecting a Model
When making API calls, you can specify either a base model name or a date-versioned model.
Continuous updates
All models have a base model name (e.g. sonic-2, sonic-turbo, sonic).
We recommend using base model names for prototyping and development, then switching to a date-versioned model for production use cases to ensure stability.
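One way to apply this recommendation in client code is sketched below. It is a minimal illustration, not part of any Cartesia SDK: `choose_model_id` and `pinned_snapshot` are hypothetical names, and the date-versioned identifier is supplied by the caller as an opaque string because this page does not show its format.

```python
from typing import Optional

# Base model names listed on this page.
BASE_MODELS = ("sonic-2", "sonic-turbo", "sonic")


def choose_model_id(base: str, pinned_snapshot: Optional[str] = None) -> str:
    """Return the model identifier to send with an API call.

    base: a base model name, which follows continuous updates.
    pinned_snapshot: a date-versioned identifier for production use.
        Hypothetical parameter; the value is passed through unchanged.
    """
    if base not in BASE_MODELS:
        raise ValueError(f"unknown base model name: {base!r}")
    # Prototyping and development: no pin, so use the base model name.
    if pinned_snapshot is None:
        return base
    # Production: prefer the pinned, date-versioned identifier for stability.
    return pinned_snapshot
```

During development you might call `choose_model_id("sonic-2")`; in production, read the pinned identifier from configuration and pass it as the second argument so that a model update never changes behavior without a deliberate config change.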
Language Support
- English (en)
- French (fr)
- German (de)
- Spanish (es)
- Portuguese (pt)
- Chinese (zh)
- Japanese (ja)
- Hindi (hi)
- Italian (it)
- Korean (ko)
- Dutch (nl)
- Polish (pl)
- Russian (ru)
- Swedish (sv)
- Turkish (tr)
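A client can check a language code against the list above before sending a request. The following is a small sketch under stated assumptions: `SUPPORTED_LANGUAGES` and `normalize_language` are hypothetical names chosen for illustration, and the table simply mirrors the 15 codes listed on this page, so it would need updating when new languages are added.

```python
# Language codes copied from the list on this page (15 entries).
SUPPORTED_LANGUAGES = {
    "en": "English", "fr": "French", "de": "German", "es": "Spanish",
    "pt": "Portuguese", "zh": "Chinese", "ja": "Japanese", "hi": "Hindi",
    "it": "Italian", "ko": "Korean", "nl": "Dutch", "pl": "Polish",
    "ru": "Russian", "sv": "Swedish", "tr": "Turkish",
}


def normalize_language(code: str) -> str:
    """Return the lowercase language code, or raise if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"unsupported language {code!r}; expected one of: {supported}"
        )
    return normalized
```

Validating locally gives a clear error message early, instead of relying on whatever the API returns for an unknown code.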
Future Updates
New snapshots are released periodically with improvements in performance, additional language support, and new capabilities. Check back regularly for updates.