Cartesia supports two buffering modes for streaming TTS: managed buffering and custom buffering. The right choice depends on how much control you need over the prosody-latency tradeoff.
Start with managed buffering. It produces natural-sounding speech with minimal integration effort. Switch to custom buffering only if you need fine-grained control.

Managed buffering

Stream LLM tokens directly to Cartesia and let the API decide when to start generating speech. This is the same approach used in Cartesia’s managed voice agents platform. Set max_buffer_delay_ms to a value greater than 0 (the default is 3000ms) and stream text token by token.
{
  "model_id": "sonic-3",
  "transcript": "Hello",
  "voice": {
    "mode": "id",
    "id": "a0e99841-438c-4a64-b679-ae501e7d6091"
  },
  "context_id": "my-context",
  "continue": true,
  "max_buffer_delay_ms": 3000
}
The API buffers incoming text until it has enough context to produce high-quality speech, or until max_buffer_delay_ms elapses, whichever comes first. This produces results similar to sentence-level aggregation while still optimizing for latency.
When to use managed buffering:
  • You’re streaming LLM output token by token
  • You want natural-sounding speech without building buffering logic
  • You want a simple integration with good defaults
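With managed buffering, the client simply forwards each LLM token as its own WebSocket message on a shared context_id. A minimal sketch of building those messages (the token_message helper and the token loop are illustrative; only the JSON fields shown in the request above come from the API):

```python
import json

# Illustrative values mirroring the request example above.
VOICE = {"mode": "id", "id": "a0e99841-438c-4a64-b679-ae501e7d6091"}

def token_message(token: str, context_id: str) -> str:
    """One WebSocket message per LLM token; continue=True signals
    that more text will follow on the same context."""
    return json.dumps({
        "model_id": "sonic-3",
        "transcript": token,
        "voice": VOICE,
        "context_id": context_id,
        "continue": True,
        "max_buffer_delay_ms": 3000,  # managed buffering: the API decides when to speak
    })

# Stream tokens as they arrive from the LLM.
for tok in ["Hello", ",", " my", " name", " is", " Sonic", "."]:
    payload = token_message(tok, "my-context")
    # ws.send(payload)  # send over your open Cartesia WebSocket
# After the last token, send a final message with continue: false.
```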

Custom buffering

Handle buffering yourself and send complete phrases or sentences to Cartesia. Set max_buffer_delay_ms to 0 so the API generates speech immediately from whatever you provide.
{
  "model_id": "sonic-3",
  "transcript": "Hello, my name is Sonic.",
  "voice": {
    "mode": "id",
    "id": "a0e99841-438c-4a64-b679-ae501e7d6091"
  },
  "context_id": "my-context",
  "continue": true,
  "max_buffer_delay_ms": 0
}
With custom buffering, you control the prosody-latency tradeoff directly:
  • Full sentences produce the best prosody but add latency while you wait for the sentence to complete.
  • Partial sentences reduce latency but may result in less natural speech at chunk boundaries.
When to use custom buffering:
  • You need precise control over when speech generation starts
  • You have your own sentence detection or text aggregation logic
  • You’re optimizing for a specific latency target
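A minimal client-side aggregator for custom buffering might flush on sentence-ending punctuation. This is a sketch, not part of the Cartesia API; real sentence detection also needs to handle decimals, abbreviations, and similar edge cases:

```python
SENTENCE_END = (".", "?", "!")

def aggregate(tokens):
    """Group a token stream into complete sentences, yielding each
    sentence as soon as its closing punctuation arrives. Any trailing
    text without punctuation is yielded at the end of the stream."""
    buf = ""
    for tok in tokens:
        buf += tok
        if buf.rstrip().endswith(SENTENCE_END):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()

# Each yielded sentence becomes one request with max_buffer_delay_ms = 0.
sentences = list(aggregate(["Hello", ",", " my", " name", " is", " Sonic", ".", " Hi", "!"]))
# sentences == ["Hello, my name is Sonic.", "Hi!"]
```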

Avoid the middle ground

A common mistake is to aggregate text client-side into sentences or phrases while keeping the default max_buffer_delay_ms of 3000ms. This can cause unnecessary latency: after receiving a complete sentence, the API may wait up to 3000ms for additional input before generating speech.
Pick one approach:
  • Managed buffering: Stream tokens with max_buffer_delay_ms > 0 and let Cartesia handle aggregation.
  • Custom buffering: Aggregate text yourself and set max_buffer_delay_ms = 0.

Configuration reference

max_buffer_delay_ms (number)
Maximum time in milliseconds the API waits for additional input before generating speech from buffered text.
  • Range: 0–5000ms
  • Default: 3000ms
  • Set to 0 for custom buffering (no server-side buffering)
  • Set to > 0 for managed buffering
If you use speed or volume SSML tags with managed buffering, make sure decimal values are not split across tokens: submitting 1.0 as the three separate tokens 1, ., 0 will cause parsing errors.
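One way to guard against this when forwarding raw LLM tokens is to coalesce any adjacent token pair that would split a number. The coalesce_decimals helper below is a sketch, not part of the API; over-merging two tokens is harmless, since it only means they are sent in the same message:

```python
def coalesce_decimals(tokens):
    """Merge adjacent tokens so a decimal like 1.0 is never split
    across messages: ["1", ".", "0"] becomes ["1.0"]."""
    out = []
    for tok in tokens:
        if tok and out and out[-1] and (
            (out[-1][-1].isdigit() and tok[0] == ".")   # "1" + "."
            or (out[-1][-1] == "." and tok[0].isdigit())  # "1." + "0"
        ):
            out[-1] += tok
        else:
            out.append(tok)
    return out

coalesce_decimals(["speed to ", "1", ".", "0"])
# → ["speed to ", "1.0"]
```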

Tips for best results

  • End sentences with punctuation. Without closing punctuation (., ?, !), the model may treat text as incomplete and wait for the buffer delay to elapse before generating. See streaming inputs with continuations for more details.
  • Signal when input is done. When a turn is complete, use continue: false (WebSocket) or no_more_inputs() (SDK) so the model doesn’t wait for more text.
  • Test with realistic input patterns. Buffering behavior depends on how text arrives—test with actual LLM output rather than pre-written text.
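Over the raw WebSocket, signaling the end of a turn can look like the sketch below. The end_of_turn helper is ours, and sending an empty transcript as the closing message is an assumption; check the WebSocket reference for the exact shape of the final message:

```python
import json

def end_of_turn(context_id: str) -> str:
    """Final message for a context: continue=False tells the model no
    more text is coming, so it should not wait out the buffer delay.
    (Assumption: an empty transcript is used to close the context.)"""
    return json.dumps({
        "context_id": context_id,
        "transcript": "",
        "continue": False,
    })

# ws.send(end_of_turn("my-context"))  # after the last text chunk
```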