Prerequisites
- A Cartesia API key. Create one here, then add it to your .bashrc or .zshrc. When accessing the Cartesia API from a browser, please use ephemeral access tokens for authentication to keep your API key safe. See Authenticate Your Client Applications.
- ffplay (part of FFmpeg), used to play audio output. Download the FFmpeg executable package for your operating system from the FFmpeg download page.
- A language runtime and package manager.
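Before moving on, it can help to confirm that the first two prerequisites are in place. The following is a minimal sketch, not part of the official quickstart: the environment variable name CARTESIA_API_KEY is an assumption (use whatever name you exported in your shell profile), and missing_prerequisites is a hypothetical helper name.

```python
import os
import shutil


def missing_prerequisites(env=os.environ, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # CARTESIA_API_KEY is an assumed variable name; match it to your export.
    if not env.get("CARTESIA_API_KEY"):
        problems.append("CARTESIA_API_KEY is not set")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffplay") is None:
        problems.append("ffplay not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing:", problem)
```

If the script prints nothing, both checks passed. Remember to open a new terminal (or source your profile) after editing .bashrc or .zshrc so the key is visible.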
Stream text and play audio
- Python
- TypeScript
Install the client library
How it works
The WebSocket connection can manage multiple contexts, where each context is a full-duplex, continuous stream. You push text chunks in and receive generated audio chunks out in real time. This works well when generating text from an LLM in real time: Cartesia’s TTS system maintains context history and appends each new chunk to it. This keeps generated speech continuous and consistent in tone and prosody while minimizing latency, since you don’t have to wait for the full transcript to be ready.
To summarize, here’s what our code does after establishing a WebSocket connection:
- Create a context with context().
- Push text incrementally with push(). Each call sends the chunk with continue: true, telling the model more text will follow. See continuations for details.
- Signal completion with no_more_inputs(), which sends continue: false to tell the model no more text is coming.
- Receive audio chunks as they are generated.
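To make the continue flag concrete, here is a minimal sketch of the sequence of frames one context might send. It does not call the Cartesia client library; it only models the behavior described above. The names context_messages, context_id, and transcript, and the shape of the voice field, are assumptions for illustration. Only continue, voice, model_id, and output_format appear in this quickstart, so check the API reference for the exact frame layout.

```python
import json
import uuid


def context_messages(chunks, voice_id, model_id, output_format, context_id=None):
    """Yield one JSON-ready dict per frame for a single context.

    Models push() as frames with continue=True and no_more_inputs()
    as a final frame with continue=False. Field names other than
    "continue", "voice", "model_id", and "output_format" are assumed.
    """
    context_id = context_id or str(uuid.uuid4())
    base = {
        "context_id": context_id,  # assumed field: ties frames to one context
        "model_id": model_id,
        "voice": {"mode": "id", "id": voice_id},  # assumed shape
        "output_format": output_format,
    }
    # Every text chunk signals that more text will follow.
    for chunk in chunks:
        yield {**base, "transcript": chunk, "continue": True}
    # The closing frame carries no text and ends the input stream.
    yield {**base, "transcript": "", "continue": False}


if __name__ == "__main__":
    frames = context_messages(
        ["Hello, ", "world."],
        voice_id="YOUR_VOICE_ID",      # placeholder
        model_id="YOUR_MODEL_ID",      # placeholder
        output_format={"container": "raw"},  # illustrative value only
    )
    for frame in frames:
        print(json.dumps(frame))
```

Because every frame shares one context identifier, the service can treat the chunks as a single utterance, which is what keeps tone and prosody consistent across chunk boundaries.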
What’s next
Pick a voice in Playground
Choose a voice, or clone your own, then copy the voice ID back into this quickstart.
Tune WebSocket request params
Change voice, model_id, or output_format, then rerun and compare output quality and behavior.
Stream inputs using continuations
Send incremental text while preserving flow across chunks for smoother long-form or LLM-driven speech.