Welcome to Cartesia

Our API enables developers to build real-time, multimodal AI experiences that feel natural and responsive.

The Cartesia API currently serves our state-of-the-art multilingual generative voice model, Sonic 2, and its Turbo variant, Sonic Turbo. (Earlier models from the Sonic family are available as well.)

Sonic models take text input and stream back ultra-realistic speech in response. They can also clone voices, with full control over pronunciation and accent.

Sonic 2 is the world’s fastest ultra-realistic text-to-speech model. It can stream out the first byte of audio in just 90ms, making it ideal for real-time and conversational experiences as well as for dubbing, narration, AI avatars, and more. (To put this into perspective, 90ms is roughly half the time it takes to blink an eye.)

If real-time performance is your top priority, Sonic Turbo offers even lower latency, streaming out the first byte of audio in just 40ms.

Learn more about available Sonic model variants and their capabilities in the Models section.

Support