Contexts
This is a hands-on guide to input streaming using WebSocket contexts. For a conceptual overview of how input streaming works in Sonic, see the input streaming guide.
In many real-time use cases, you don’t have your transcripts available upfront, for example when you’re generating them with an LLM. For these cases, Sonic supports input streaming.
The context IDs you pass to the Cartesia API identify speech contexts. Contexts maintain prosody between their inputs—so you can send a transcript in multiple parts and receive seamless speech in return.
To stream in inputs on a context, just pass a continue flag (set to true) for every input that you expect will be followed by more inputs. (By default, this flag is set to false.)
To finish a context, just set continue to false. If you do not know the last transcript in advance, you can send an input with an empty transcript and continue set to false.
continue: whether this input may be followed by more inputs.
Input Format
- Inputs on the same context must keep all fields except transcript, continue, and duration the same.
- Transcripts are concatenated verbatim, so make sure they form a valid transcript when joined together. Make sure to include any spaces between words or punctuation as necessary. For example, in languages with spaces, you should include a space at the end of the preceding transcript, e.g. transcript 1 is "Thanks for coming, " and transcript 2 is "it was great to see you."
- For best performance, buffer the transcript of the first request until it contains at least 3 or 4 words.
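The buffering and concatenation rules above can be sketched as a small helper. This is a minimal illustration, not part of the Cartesia SDK: the function name `buffer_first_transcript` and the `min_words` parameter are hypothetical, and words are counted by splitting on whitespace, which only suits languages that use spaces.

```python
from typing import Iterable, Iterator


def buffer_first_transcript(chunks: Iterable[str], min_words: int = 4) -> Iterator[str]:
    """Merge leading chunks until the first piece has at least min_words words.

    Later chunks are passed through unchanged. Chunks are joined verbatim,
    so any spaces between words must already be present in the chunks.
    """
    buffered = ""
    first_sent = False
    for chunk in chunks:
        if first_sent:
            yield chunk
            continue
        buffered += chunk  # verbatim concatenation, no separator added
        if len(buffered.split()) >= min_words:
            yield buffered
            first_sent = True
    if not first_sent and buffered:
        # The stream ended before min_words was reached; send what we have.
        yield buffered


pieces = list(
    buffer_first_transcript(["Thanks ", "for ", "coming, ", "it was ", "great to see you."])
)
```

Each yielded piece would then become the transcript of one input on the context.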
Example
Let’s say you’re trying to generate speech for “Hello, Sonic! I’m streaming inputs.” You should stream in the following inputs (repeated fields omitted for brevity). Note: all other fields (e.g. model_id, language) are required and should be passed unchanged between requests with input streaming.
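As a rough sketch, the inputs for this sentence could be built as follows. The split points and the context ID value are illustrative assumptions, and only the fields that matter for streaming are shown; the other required fields (such as model_id and language) are omitted here, as in the description above.

```python
import json

CONTEXT_ID = "my-context-1"  # hypothetical context ID, chosen for illustration

# One possible way to split the sentence. Spaces are kept inside the pieces
# because transcripts are concatenated verbatim.
pieces = ["Hello, Sonic!", " I'm streaming ", "inputs."]

messages = [
    {
        "context_id": CONTEXT_ID,
        "transcript": piece,
        # Every input except the last one expects more inputs to follow.
        "continue": index < len(pieces) - 1,
    }
    for index, piece in enumerate(pieces)
]

for message in messages:
    print(json.dumps(message))
```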
If you don’t know the last transcript in advance, you can send an input with an empty transcript and continue set to false:
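A minimal sketch of this pattern, assuming the text arrives as an iterator of chunks (for example from an LLM) whose length is not known ahead of time. The helper name `context_inputs` and the placeholder field values are hypothetical; sending the messages over the WebSocket is left out.

```python
from typing import Any, Dict, Iterable, Iterator


def context_inputs(chunks: Iterable[str], shared_fields: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one input per chunk with continue=True, then an empty closing input."""
    for chunk in chunks:
        yield {**shared_fields, "transcript": chunk, "continue": True}
    # The last chunk was not known in advance, so close the context
    # with an empty transcript and continue set to False.
    yield {**shared_fields, "transcript": "", "continue": False}


shared = {
    "context_id": "my-context-1",   # hypothetical value
    "model_id": "<your-model-id>",  # placeholder, not a real model name
    "language": "en",
}
inputs = list(context_inputs(["Hello, Sonic!", " I'm streaming ", "inputs."], shared))
```

Because the shared fields are copied into every message, they stay unchanged across the context, as the input format rules require.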
Output
You will only receive done: true after outputs for the entire context have been returned.
Outputs for a given context will always be in order of the inputs you streamed in. (That is, if you send input A and then input B on a context, you will first receive the chunks corresponding to input A, and then the chunks corresponding to input B.)
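The following sketch shows one way a client might gather the outputs of a single context until it is done. It assumes each response has already been parsed from JSON into a dict, that responses carry a context_id field alongside the done flag, and that audio arrives in a field here called "data"; the last two are assumptions for illustration and should be checked against the API reference.

```python
from typing import Any, Dict, Iterable, List


def collect_context_outputs(responses: Iterable[Dict[str, Any]], context_id: str) -> List[Dict[str, Any]]:
    """Collect responses for one context, in arrival order, until done is true."""
    collected = []
    for response in responses:
        if response.get("context_id") != context_id:
            continue  # belongs to another context on the same connection
        if response.get("done"):
            break  # all outputs for this context have been returned
        collected.append(response)
    return collected


# Simulated responses; the "data" values stand in for audio chunks.
simulated = [
    {"context_id": "ctx-a", "data": "chunk-A1", "done": False},
    {"context_id": "ctx-b", "data": "other", "done": False},
    {"context_id": "ctx-a", "data": "chunk-B1", "done": False},
    {"context_id": "ctx-a", "done": True},
]
outputs = collect_context_outputs(simulated, "ctx-a")
```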
Cancelling Requests
You may also cancel outgoing requests through the WebSocket.
To cancel a request, send a JSON message with the following structure:
When you send a cancel request:
- It will only halt requests that have not begun generating a response yet.
- Any currently generating request will continue sending responses until completion.
The context_id in the cancel request should match the context_id of the request you want to cancel.
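A cancel message could be built along these lines. The context_id field comes from the description above; the "cancel" key set to true is an assumption about the message shape and should be confirmed against the WebSocket API reference before use. The helper name `build_cancel_message` is hypothetical.

```python
import json


def build_cancel_message(context_id: str) -> str:
    """Serialize a cancel request for the given context.

    The "cancel" key is an assumed field name, not confirmed by this guide.
    """
    return json.dumps({"context_id": context_id, "cancel": True})


cancel_message = build_cancel_message("my-context-1")  # hypothetical context ID
```

The resulting string would be sent over the same WebSocket connection that carries the inputs for that context.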