If you’re just looking to play with our models, check out the playground!

Welcome to the Cartesia documentation! Cartesia’s mission is to build real-time multimodal intelligence for every device. Our current offering includes an API that serves our flagship generative voice model, Sonic.

Sonic is the fastest text-to-speech model around: it can generate a second of audio in 595ms and stream out the first audio chunk in just 95ms. It’s capable of instant voice cloning and voice design. Alongside Sonic, we also offer an extensive prebuilt voice library for a variety of use cases.

In this documentation, you’ll read about our model, find out how to use our API, and pick up tips and tricks for optimizing performance and quality.