Prerequisites

- A Cartesia API key. Create one here, then add it to your `.bashrc` or `.zshrc`.
- `ffplay` (part of FFmpeg), used to play audio output on macOS or Ubuntu.
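As a reference, here is a minimal setup sketch. The `CARTESIA_API_KEY` variable name is an assumption (use whatever name your client code reads), and the install commands assume Homebrew on macOS and APT on Ubuntu:

```shell
# Add to ~/.bashrc or ~/.zshrc. CARTESIA_API_KEY is an assumed variable
# name; substitute whatever your client code expects.
export CARTESIA_API_KEY="your-api-key-here"

# Install ffplay (it ships with FFmpeg):
#   macOS:  brew install ffmpeg
#   Ubuntu: sudo apt install ffmpeg
```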
Stream text and play audio
Code samples for this walkthrough are available in Python and JavaScript.
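The playback side can be sketched by piping raw PCM into `ffplay` as chunks arrive. This is an illustrative sketch, not the official client: it assumes 44.1 kHz 32-bit float mono PCM (`f32le`), so adjust the `-f` and `-ar` flags to match whatever `output_format` you actually request.

```python
# Sketch: stream raw PCM audio chunks into ffplay as they arrive.
# The f32le/44100 format is an assumption; match it to your output_format.
import subprocess

def ffplay_command(sample_rate: int = 44100, fmt: str = "f32le") -> list:
    """Build an ffplay invocation that plays raw PCM from stdin."""
    return [
        "ffplay",
        "-f", fmt,              # raw PCM sample format
        "-ar", str(sample_rate),
        "-nodisp",              # no video window
        "-autoexit",            # exit when stdin closes
        "-loglevel", "quiet",
        "-",                    # read audio from stdin
    ]

def play_chunks(chunks) -> None:
    """Pipe audio chunks into ffplay's stdin as they are generated."""
    proc = subprocess.Popen(ffplay_command(), stdin=subprocess.PIPE)
    for chunk in chunks:
        proc.stdin.write(chunk)
    proc.stdin.close()
    proc.wait()
```

Because `ffplay` reads from stdin, playback starts as soon as the first chunk is written rather than after the full utterance is generated.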
How it works
The WebSocket connection manages multiple contexts, each representing an independent, continuous stream of speech. A Cartesia context works much like an LLM context: on our servers, we store the previously generated speech so that new speech matches it in tone. To summarize, here's what our code does after establishing a WebSocket connection:

- Create a context with `context()`.
- Push text incrementally with `push()`. Each chunk continues seamlessly from the previous one using continuations.
- Signal completion with `no_more_inputs()` to tell the model that no more text is coming.
- Receive audio chunks as they are generated.
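The steps above can be sketched at the protocol level. This is a minimal illustration rather than the official client: the field names (`context_id`, `transcript`, `continue`) are assumptions based on Cartesia's WebSocket message schema, so check the API reference for the exact shape before relying on them.

```python
# Illustrative sketch of the messages one context sends over the WebSocket.
# Field names ("context_id", "transcript", "continue") are assumptions;
# consult Cartesia's API reference for the authoritative schema.
import json
import uuid

def context_messages(chunks):
    """Build the JSON messages for one context: each text chunk is pushed
    with continue=True, then a final empty transcript with continue=False
    plays the role of no_more_inputs()."""
    context_id = str(uuid.uuid4())  # identifies this stream of speech
    messages = [
        json.dumps({"context_id": context_id, "transcript": text, "continue": True})
        for text in chunks
    ]
    # no_more_inputs(): empty transcript, continue=False closes the context.
    messages.append(
        json.dumps({"context_id": context_id, "transcript": "", "continue": False})
    )
    return messages

msgs = context_messages(["Hello, ", "world."])
```

Because every message carries the same `context_id`, the server can keep the generated speech for that context and condition each new chunk on it, which is what makes the continuation seamless.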
What’s next
- Stream inputs using continuations: a deep dive into context management and buffering.
- Choose a Voice: browse voices and learn how to pick the right one for your use case.
- Choosing TTS parameters: pick the right output format, sample rate, and encoding for your use case.