Welcome to Cartesia - Cartesia Docs

The Cartesia API is the fastest, most emotive, ultra-realistic voice AI platform. Purpose-built for developers, it serves state-of-the-art models for both text-to-speech and speech-to-text, enabling seamless conversational AI experiences.

Sonic Models for Text-to-Speech

Sonic models take text input and stream back ultra-realistic speech in response. They can also clone voices, with full control over pronunciation and accent.

Sonic 3 is the world’s fastest, most emotive, ultra-realistic text-to-speech model. It can stream out the first byte of audio in just 90ms, making it well suited to real-time and conversational experiences as well as dubbing, narration, AI avatars, and more. (To put things into perspective, 90ms is about twice as fast as the blink of an eye.)

If real-time performance is your top priority, Sonic Turbo offers even lower latency, streaming out the first byte of audio in just 40ms.

Learn more about available Sonic model variants and their capabilities in the TTS Models section.

Ink Models for Speech-to-Text

Ink models provide streaming speech-to-text transcription optimized for real-time voice applications. Ink-Whisper, our debut model, is specifically engineered for conversational AI: it handles telephony artifacts, background noise, accents, and proper nouns that typically challenge standard STT systems. Ink-Whisper uses advanced dynamic chunking to process variable-length audio segments, reducing errors and hallucinations during pauses or audio gaps.

At just $0.13/hour, it’s the most affordable streaming STT model available. Learn more about the Ink model and its capabilities in the STT Models section.
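
To get a rough feel for what the $0.13/hour rate means for a workload, the arithmetic can be sketched in a few lines of Python. This is only an illustrative estimate: the function name `estimate_stt_cost` is hypothetical, and it assumes cost scales linearly with audio duration, with no minimum charge or rounding. Actual billing granularity may differ, so check the pricing details for your plan.

```python
# Hourly rate quoted in the text above. Linear, per-second proration is an
# assumption made for illustration, not a statement of how billing works.
INK_WHISPER_USD_PER_HOUR = 0.13


def estimate_stt_cost(audio_seconds: float,
                      usd_per_hour: float = INK_WHISPER_USD_PER_HOUR) -> float:
    """Estimate the USD cost of transcribing `audio_seconds` of audio."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Convert seconds to hours, then multiply by the hourly rate.
    return audio_seconds / 3600 * usd_per_hour


if __name__ == "__main__":
    # Example workload: 1,000 calls of 5 minutes each.
    total_seconds = 1000 * 5 * 60
    print(f"Estimated cost: ${estimate_stt_cost(total_seconds):.2f}")
```

Under these assumptions, 1,000 five-minute calls (about 83.3 hours of audio) come to roughly $10.83.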

Support

Discord

Join our Discord server to chat with the Cartesia team, engage with the community, and get help with your projects.

Email

Email us at support@cartesia.ai to get help with integrating Cartesia, your account, or billing.
