commit_strategy=manual.
If you’re already using the Cartesia SDK, upgrade to version >=3.2.0.
Connection
Replace the ElevenLabs WebSocket URL and auth header with Cartesia’s /stt/websocket endpoint. Pass the required cartesia_version query param, and authenticate with a short-lived access token in the access_token query param instead of an API key.
Connect to the manual-finalization WebSocket with the Cartesia SDK:
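Without the SDK, the same connection is a WebSocket URL carrying the query parameters listed below. A minimal Python sketch, assuming the host is wss://api.cartesia.ai (only the /stt/websocket path and the parameter names come from this guide) and that you have already obtained a short-lived access token; build_stt_url is a hypothetical helper:

```python
from urllib.parse import urlencode

# Assumption: production host. The /stt/websocket path and parameter names are from this guide.
CARTESIA_STT_URL = "wss://api.cartesia.ai/stt/websocket"


def build_stt_url(access_token: str, sample_rate: int = 16000) -> str:
    """Build a Cartesia STT WebSocket URL for a manual-finalization session."""
    params = {
        "model": "ink-2",                  # replaces model_id=scribe_v2_realtime
        "language": "en",                  # ink-2 only supports en right now
        "encoding": "pcm_s16le",           # format half of ElevenLabs' audio_format
        "sample_rate": str(sample_rate),   # rate half of ElevenLabs' audio_format
        "cartesia_version": "2026-03-01",  # required API version
        "access_token": access_token,      # short-lived token instead of an API key
    }
    return f"{CARTESIA_STT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_stt_url("YOUR_ACCESS_TOKEN"))
```

Note that commit_strategy has no counterpart in the URL: manual commits are expressed by the finalize command you send later.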
Query parameters
| ElevenLabs Scribe (manual) | Cartesia Realtime STT (Manual) | Notes |
|---|---|---|
model_id=scribe_v2_realtime required | model=ink-2 required | See Models for all options. |
audio_format=pcm_16000 | encoding=pcm_s16le + sample_rate=16000 required | ElevenLabs bundles format and rate; Cartesia splits them. See encoding. |
commit_strategy=manual | — | See auto finalization for automatic commits. |
language_code | language | ink-2 only supports en right now. Use ink-whisper for other languages. |
| — | cartesia_version=2026-03-01 required | See API Conventions for details. |
include_timestamps | — | Always included if supported by the model (ink-whisper only for now). |
keyterms | — | Coming soon! |
enable_logging | — | Controlled by your organization. |
encoding
ElevenLabs bundles the sample format and rate into a single audio_format token. Cartesia splits them into encoding and sample_rate.
| ElevenLabs audio_format | Cartesia encoding | Cartesia sample_rate |
|---|---|---|
| pcm_8000 | pcm_s16le | 8000 |
| pcm_16000 | pcm_s16le | 16000 |
| pcm_22050 | pcm_s16le | 22050 |
| pcm_24000 | pcm_s16le | 24000 |
| pcm_44100 | pcm_s16le | 44100 |
| pcm_48000 | pcm_s16le | 48000 |
| ulaw_8000 | pcm_mulaw | 8000 |
Cartesia also accepts pcm_s32le, pcm_f16le, pcm_f32le, and pcm_alaw. All Cartesia encodings support all sample rates.
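Because the table above is a fixed lookup, the translation can be a dictionary. A minimal Python sketch whose entries mirror the table exactly; AUDIO_FORMAT_TO_CARTESIA and cartesia_audio_params are hypothetical names:

```python
from __future__ import annotations

# ElevenLabs audio_format -> (Cartesia encoding, Cartesia sample_rate), copied from the table.
AUDIO_FORMAT_TO_CARTESIA: dict[str, tuple[str, int]] = {
    "pcm_8000": ("pcm_s16le", 8000),
    "pcm_16000": ("pcm_s16le", 16000),
    "pcm_22050": ("pcm_s16le", 22050),
    "pcm_24000": ("pcm_s16le", 24000),
    "pcm_44100": ("pcm_s16le", 44100),
    "pcm_48000": ("pcm_s16le", 48000),
    "ulaw_8000": ("pcm_mulaw", 8000),
}


def cartesia_audio_params(audio_format: str) -> dict[str, str | int]:
    """Split an ElevenLabs audio_format token into Cartesia's encoding and sample_rate."""
    try:
        encoding, sample_rate = AUDIO_FORMAT_TO_CARTESIA[audio_format]
    except KeyError:
        raise ValueError(f"no Cartesia mapping for audio_format={audio_format!r}") from None
    return {"encoding": encoding, "sample_rate": sample_rate}
```

Raising on unknown tokens, rather than guessing, keeps a misconfigured format from silently producing garbled transcripts.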
Sending audio
ElevenLabs wraps each audio chunk in a JSON-formatted text frame and base64-encodes the audio bytes. Cartesia accepts audio chunks as binary frames: send the raw audio bytes directly. Compared with ElevenLabs:
- There is no need to supply previous text.
- The sample rate is set when you connect, by the sample_rate query parameter.
Cartesia’s control commands are bare text frames, not JSON. To commit buffered audio and emit a transcript without ending the session, send a finalize frame.
| ElevenLabs | Cartesia |
|---|---|
Streams partial_transcript events, then committed_transcript once you commit. | Streams final transcript deltas continuously as audio arrives. finalize flushes text that might otherwise be held back for a few seconds. |
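A minimal sketch of sending one segment over a raw connection, assuming a websockets-style object whose send() emits a binary frame for bytes and a text frame for str (as the Python websockets library does). WebSocketLike and send_segment are hypothetical names:

```python
from __future__ import annotations

import asyncio
from typing import Iterable, Protocol


class WebSocketLike(Protocol):
    """Hypothetical interface: send(bytes) -> binary frame, send(str) -> text frame."""

    async def send(self, message: str | bytes) -> None: ...


async def send_segment(ws: WebSocketLike, chunks: Iterable[bytes]) -> None:
    """Stream raw audio as binary frames, then commit it with a bare 'finalize' frame."""
    for chunk in chunks:
        # Raw bytes only: no JSON envelope, no base64, no previous text.
        await ws.send(chunk)
    # Plain text, not JSON. Flushes held-back text without ending the session.
    await ws.send("finalize")


if __name__ == "__main__":

    class PrintingWS:
        """Stand-in connection that just reports what would be sent."""

        async def send(self, message: str | bytes) -> None:
            kind = "binary" if isinstance(message, bytes) else "text"
            print(kind, message)

    # Two 10 ms chunks of 16 kHz pcm_s16le silence (160 samples x 2 bytes each).
    asyncio.run(send_segment(PrintingWS(), [b"\x00" * 320, b"\x00" * 320]))
```

With the websockets library you would pass the object returned by websockets.connect(url), where url is your Cartesia STT URL.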
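When you are done sending audio, send a close frame. Cartesia transcribes the audio it has already received, sends a done event, and then closes the WebSocket.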
Sending audio with the SDK
Decoding base64-encoded audio before sending
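If your pipeline still produces ElevenLabs-style JSON chunks, decode them before forwarding the bytes as a binary frame. This sketch assumes the base64 audio sits in an audio_base_64 field, which is an assumption about your existing payloads; chunk_message_to_bytes is a hypothetical helper:

```python
import base64
import json


def chunk_message_to_bytes(message: str, field: str = "audio_base_64") -> bytes:
    """Return the raw audio bytes from an ElevenLabs-style JSON chunk message.

    `field` defaults to "audio_base_64", an assumption about your existing
    payloads; change it if your messages name the field differently.
    """
    payload = json.loads(message)
    # validate=True rejects stray non-base64 characters instead of silently dropping them.
    return base64.b64decode(payload[field], validate=True)


if __name__ == "__main__":
    example = json.dumps({"audio_base_64": base64.b64encode(b"\x00\x01").decode("ascii")})
    audio = chunk_message_to_bytes(example)
    print(len(audio), "bytes ready to send as a binary frame")
```

Any other fields in the old message (previous text, sample rate) are simply dropped, since Cartesia does not need them per chunk.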
Committing and closing
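A sketch of committing and closing over a raw WebSocket, assuming websockets-style send() plus async iteration over incoming text frames, and assuming each transcript delta arrives in a text field. commit and close_and_collect are hypothetical helpers, not SDK calls:

```python
from __future__ import annotations

import asyncio
import json
from typing import AsyncIterator, Protocol


class WebSocketLike(Protocol):
    """Anything with websockets-style send() and async iteration over incoming frames."""

    async def send(self, message: str | bytes) -> None: ...

    def __aiter__(self) -> AsyncIterator[str | bytes]: ...


async def commit(ws: WebSocketLike) -> None:
    # Manual commit: flush buffered audio, keep the session open.
    await ws.send("finalize")


async def close_and_collect(ws: WebSocketLike) -> str:
    """Send 'close', then read until 'done'; return the concatenated transcript deltas."""
    await ws.send("close")
    parts: list[str] = []
    async for raw in ws:
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "transcript":
            parts.append(message.get("text", ""))  # assumption: delta is under "text"
        elif kind == "error":
            raise RuntimeError(f"Cartesia STT error: {message}")
        elif kind == "done":
            break  # everything sent before 'close' has been transcribed
        # flush_done only acknowledges a 'finalize'; nothing to do here
    return "".join(parts)


if __name__ == "__main__":

    class ScriptedWS:
        """In-memory stand-in for a real connection, for illustration only."""

        def __init__(self, incoming: list[str]) -> None:
            self.sent: list[str | bytes] = []
            self._incoming = incoming

        async def send(self, message: str | bytes) -> None:
            self.sent.append(message)

        async def _frames(self):
            for frame in self._incoming:
                yield frame

        def __aiter__(self):
            return self._frames()

    frames = [json.dumps({"type": "transcript", "is_final": True, "text": "Hello."}),
              json.dumps({"type": "done"})]
    print(asyncio.run(close_and_collect(ScriptedWS(frames))))
```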
Event mapping
Scribe emits interim partial_transcript events, then a committed_transcript when you commit. Cartesia emits transcript deltas plus acknowledgments for the finalize and close commands.
ElevenLabs message_type | Cartesia type | Notes |
|---|---|---|
partial_transcript | transcript (is_final: false) | Never sent by Ink 2 or Whisper (reserved for future models). |
committed_transcript | transcript (is_final: true) + flush_done | ElevenLabs sends committed transcripts in one message; Cartesia sends transcript messages containing deltas, then a flush_done message once all deltas have been sent for the segment. |
committed_transcript_with_timestamps | transcript (is_final: true) + flush_done | Only ink-whisper supports timestamps right now. |
| — | done | Sent after all audio until close has been transcribed, immediately before the WebSocket closes. |
error | error | Client or server errors. |
auth_error | — | Cartesia will reject the WebSocket upgrade with a 401 or 403 HTTP status. |
quota_exceeded | error | Cartesia’s error response will contain "error_code": "quota_exceeded". |
rate_limited | error | Cartesia’s error response will contain "error_code": "concurrency_limited". |
session_time_limit_exceeded | — | Cartesia will send a WebSocket close frame with code 1001. |
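Error handling is the least mechanical part of this mapping, because two ElevenLabs message types become an HTTP status or a close code. A minimal sketch of translating them back, with hypothetical helper names; treating every 1001 close as a session time limit is a simplification, since 1001 is a generic "going away" code:

```python
from __future__ import annotations

# Cartesia error_code -> ElevenLabs message_type, per the event mapping table.
_ERROR_CODE_TO_ELEVENLABS = {
    "quota_exceeded": "quota_exceeded",
    "concurrency_limited": "rate_limited",
}


def from_error_message(message: dict) -> str:
    """Closest ElevenLabs message_type for a Cartesia {"type": "error"} message."""
    return _ERROR_CODE_TO_ELEVENLABS.get(message.get("error_code"), "error")


def from_handshake_status(http_status: int) -> str | None:
    """Cartesia rejects bad credentials at the WebSocket upgrade (401 or 403)."""
    return "auth_error" if http_status in (401, 403) else None


def from_close_code(close_code: int) -> str | None:
    """Simplification: treat a 1001 close as the session time limit."""
    return "session_time_limit_exceeded" if close_code == 1001 else None
```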
Committed transcripts
A Scribe committed_transcript carries the full text of the segment since the last commit. Cartesia instead sends a series of transcript events, each carrying a delta, followed by a Cartesia flush_done event once all deltas for the segment have been sent.
- Ink 2 does not return duration or words yet.
- Ink 2 and Whisper currently only emit final transcripts (is_final: true).
Ignore the is_final property on flush_done and done events.
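To keep downstream code that expects Scribe-style committed transcripts, you can buffer deltas until flush_done. A minimal sketch, assuming each transcript message carries its delta in a text field; CommitAssembler is a hypothetical name:

```python
from __future__ import annotations

import json


class CommitAssembler:
    """Rebuilds Scribe-style committed transcripts from Cartesia transcript deltas."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    def feed(self, raw: str) -> str | None:
        """Process one server message; return a full segment when it is flushed."""
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "transcript":
            # Deltas may split words, so concatenate without adding separators.
            # Assumption: the delta text is in the "text" field.
            self._parts.append(message.get("text", ""))
        elif kind in ("flush_done", "done") and self._parts:
            # is_final on these events carries no meaning, so it is not checked.
            segment = "".join(self._parts).strip()
            self._parts.clear()
            return segment
        return None
```

Fed the Cartesia messages from the example below, this yields "Scribe sends full transcripts." and then "Ink sends deltas and may break words."; the leading space that starts the second segment is stripped.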
Example Server Messages
Scribe sends full transcripts. Ink sends deltas and may break words.
| ElevenLabs Scribe (manual) | Cartesia Realtime STT (Manual) |
|---|---|
partial_transcript "Scribe sends" | is_final: true "Scribe sends" |
partial_transcript "Scribe sends full transcripts." | is_final: true " full transc" |
| commit (client) | finalize (client) |
committed_transcript "Scribe sends full transcripts." | is_final: true "ripts." |
committed_transcript_with_timestamps "Scribe sends full transcripts." | flush_done |
partial_transcript "Ink sends deltas" | is_final: true " Ink sends" |
partial_transcript "Ink sends deltas and may break words." | is_final: true " deltas and may break wor" |
| commit (client) | finalize (client) |
committed_transcript "Ink sends deltas and may break words." | is_final: true "ds." |
committed_transcript_with_timestamps "Ink sends deltas and may break words." | flush_done |
References
- API Reference: Cartesia Realtime STT (Manual)
- Full Code Example: Using the Cartesia SDK