Finalize command mid-session.
Other ways to migrate and best practices for Cartesia Speech-to-Text
If you’re already using the Cartesia SDK, upgrade to version >=3.2.0.
Ink 2 only supports English right now.
We expect to add more languages in the coming months.
Connection
Replace the Deepgram WebSocket URL and auth header with Cartesia’s /stt/turns/websocket endpoint.
Set the cartesia_version query param, and authenticate with a short-lived access token passed in the access_token query param instead of an API key.
Connect to the auto-finalization WebSocket with the Cartesia SDK:
Query parameters
| Deepgram Nova | Cartesia Realtime STT (Auto) | Notes |
|---|---|---|
| model=nova-3 (required) | model=ink-2 (required) | See Models for all options. |
| version=latest | — | Model version is controllable via the model param. |
| encoding=linear16 | encoding=pcm_s16le (required) | See encoding for all options. |
| sample_rate | sample_rate (required) | No change. |
| language | — | ink-2 only supports en right now. More languages are coming soon! |
| — | cartesia_version=2026-03-01 (required) | See API Conventions for details. |
| channels, multichannel | — | Send a mono audio stream per WebSocket connection. |
| endpointing, interim_results, utterance_end_ms, vad_events | — | Not required. |
| diarize | — | Coming soon! |
| keyterm, keywords | — | Coming soon! |
| mip_opt_out | — | Controlled by your organization. |
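As a rough illustration of the table above, the sketch below assembles a connection URL from the required Cartesia query parameters. The host name and the build_stt_url helper are assumptions made for this example, not part of the Cartesia SDK; check the API reference for the real host before using it.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: this guide does not state the host, so treat it as a placeholder.
CARTESIA_HOST = "api.cartesia.ai"


def build_stt_url(sample_rate: int, access_token: Optional[str] = None) -> str:
    """Build a WebSocket URL with the query params the table marks as required."""
    params = {
        "model": "ink-2",
        "encoding": "pcm_s16le",
        "sample_rate": str(sample_rate),
        "cartesia_version": "2026-03-01",
    }
    # Only add the token when one is supplied (for example, in a browser client).
    if access_token is not None:
        params["access_token"] = access_token
    return f"wss://{CARTESIA_HOST}/stt/turns/websocket?{urlencode(params)}"
```

Deepgram-only parameters such as language, channels, or endpointing are simply left out, since the table lists no Cartesia equivalent for them.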
encoding
| Deepgram | Cartesia |
|---|---|
| linear16 | pcm_s16le |
| linear32 | pcm_s32le |
| mulaw | pcm_mulaw |
| alaw | pcm_alaw |
| Not supported | pcm_f16le |
| Not supported | pcm_f32le |
| flac | Not supported |
| amr-nb | Not supported |
| amr-wb | Not supported |
| opus | Not supported |
| ogg-opus | Not supported |
| speex | Not supported |
| g729 | Not supported |
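If your code already stores a Deepgram encoding name, a small lookup like the following can translate it. This is only a sketch of the table above; the to_cartesia_encoding helper is a hypothetical name, not an SDK function.

```python
# Deepgram encoding name -> Cartesia encoding name, copied from the table above.
DEEPGRAM_TO_CARTESIA_ENCODING = {
    "linear16": "pcm_s16le",
    "linear32": "pcm_s32le",
    "mulaw": "pcm_mulaw",
    "alaw": "pcm_alaw",
}


def to_cartesia_encoding(deepgram_encoding: str) -> str:
    """Return the Cartesia encoding name, or raise if there is no equivalent."""
    try:
        return DEEPGRAM_TO_CARTESIA_ENCODING[deepgram_encoding]
    except KeyError:
        # Compressed formats such as flac or opus have no Cartesia equivalent.
        raise ValueError(
            f"{deepgram_encoding!r} has no Cartesia equivalent; decode to raw PCM first"
        ) from None
```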
Sending audio
Both APIs accept raw PCM audio as binary WebSocket frames in the same way. Cartesia does not support these Deepgram encodings: flac, amr-nb, amr-wb, opus, ogg-opus, speex, g729.
Deepgram’s Finalize command has no equivalent. Ink detects turn boundaries on its own and emits a turn.end when the user stops speaking, so there is nothing to flush.
Deepgram’s KeepAlive message has no equivalent either. The connection has a 3-minute idle timeout that resets every time you send an audio chunk, so keep streaming audio (silent or otherwise) to hold it open.
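One way to hold the connection open without a KeepAlive message is to send silence when no real audio is available. The sketch below assumes pcm_s16le mono audio; the helper names and the 30-second safety margin are illustrative choices, not values from the Cartesia documentation.

```python
IDLE_TIMEOUT_S = 180  # the 3-minute idle timeout described above


def silence_chunk(sample_rate: int, duration_ms: int = 100) -> bytes:
    """Return duration_ms of silence as 16-bit little-endian mono PCM."""
    num_samples = sample_rate * duration_ms // 1000
    # Each pcm_s16le sample is 2 bytes; all-zero bytes decode to silence.
    return bytes(2 * num_samples)


def should_send_silence(seconds_since_last_audio: float, margin_s: float = 30.0) -> bool:
    """True once the idle time is within margin_s of the timeout."""
    return seconds_since_last_audio >= IDLE_TIMEOUT_S - margin_s
```

Send the returned bytes as a binary WebSocket frame, exactly like a normal audio chunk.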
Event mapping
Deepgram emits four server message types, mixing transcript results with separate voice-activity signals. Cartesia folds the same information into a turn lifecycle: turn.start, turn.update, turn.eager_end, turn.resume, and turn.end. See Turn Detection for the full state machine.
| Deepgram type | Cartesia type | Notes |
|---|---|---|
| SpeechStarted | turn.start | The user began speaking. Carries no transcript. |
| Results (is_final: false) | turn.update | Interim transcript for the utterance / turn. |
| Results (is_final: true) | turn.end | Final transcript for the utterance / turn. |
| UtteranceEnd | turn.end | The user stopped speaking. |
| — | turn.eager_end | The model predicts the user might be done speaking. Okay to ignore. |
| — | turn.resume | The user kept talking; ignore the last turn.eager_end. |
| Metadata | — | No equivalent. |
| — | connected | Fires once when the WebSocket is established. You do not need to wait for it before sending audio. |
| — | error | Client or server errors. |
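The mapping above could be handled with a small dispatcher like the one below. This is a sketch under stated assumptions: it supposes each server message is a JSON object with a "type" field and, where a transcript is carried, a "transcript" field. Those field names and the TurnTracker class are hypothetical, so confirm the real payload shape in the API reference.

```python
import json


class TurnTracker:
    """Tracks turn state for one connection from Cartesia server events."""

    def __init__(self) -> None:
        self.transcript = ""       # latest transcript for the turn in progress
        self.maybe_done = False    # True after turn.eager_end until resume or end
        self.finished_turns = []   # final transcripts, one per completed turn

    def handle(self, raw: str) -> None:
        event = json.loads(raw)
        kind = event.get("type")   # assumed field name
        if kind == "turn.start":
            self.transcript = ""
            self.maybe_done = False
        elif kind == "turn.update":
            # Updates are cumulative within a turn, so replace rather than append.
            self.transcript = event.get("transcript", "")
        elif kind == "turn.eager_end":
            self.transcript = event.get("transcript", self.transcript)
            self.maybe_done = True
        elif kind == "turn.resume":
            self.maybe_done = False  # discard the earlier eager_end prediction
        elif kind == "turn.end":
            self.finished_turns.append(event.get("transcript", self.transcript))
            self.transcript = ""
            self.maybe_done = False
        elif kind == "error":
            raise RuntimeError(f"STT error event: {event}")
        # "connected" and unrecognized types need no action here.
```

A voice agent might start preparing a reply when maybe_done becomes True and commit to it only once the turn lands in finished_turns.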
The Deepgram Results message
Deepgram’s Results carry a lot of information. You can extract similar information from Cartesia’s turn.update and turn.end events.
| Deepgram Results | Cartesia | Notes |
|---|---|---|
| is_final: false | turn.update | Interim transcript for the utterance / turn. Cumulative since the last final transcript. |
| is_final: true | turn.end | Final transcript for the utterance / turn. |
| speech_final: true | turn.end | The user stopped speaking. |
Deepgram sends UtteranceEnd and speech_final: true separately, but they have the same semantic meaning: “the user has finished speaking”. Cartesia simplifies this into a single high-accuracy signal: turn.end.
Results message:
turn.update / turn.end event:
turn.start and turn.resume events do not carry a transcript.
Example Server Messages
Hello! Nova’s transcripts are joined with spaces. Ink’s are not.
| Deepgram Nova | Cartesia Realtime STT (Auto) |
|---|---|
| SpeechStarted | turn.start |
| is_final: false "Hello!" | turn.update "Hello!" |
| — | turn.eager_end "Hello!" |
| — | turn.resume |
| is_final: false "Hello! Nova's transcripts are joined with spaces." | turn.update "Hello! Nova's transcripts are joined with spaces." |
| — | turn.eager_end "Hello! Nova's transcripts are joined with spaces." |
| is_final: true "Hello! Nova's transcripts are joined with spaces." | turn.end "Hello! Nova's transcripts are joined with spaces." |
| UtteranceEnd | — |
| SpeechStarted | turn.start |
| is_final: false "Ink's are not." | turn.update " Ink's are not." |
| — | turn.eager_end " Ink's are not." |
| is_final: true "Ink's are not." | turn.end " Ink's are not." |
| UtteranceEnd | — |
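The practical consequence of the example above is in how you stitch final transcripts into one string. The sketch below uses hypothetical helper names: Nova finals are joined with a space, while Ink turn transcripts are concatenated as-is because each later turn already carries its own leading space.

```python
def join_nova_finals(finals: list) -> str:
    """Deepgram Nova: final transcripts have no leading space, so add one."""
    return " ".join(finals)


def join_ink_turns(turns: list) -> str:
    """Cartesia Ink: later turns start with a space, so concatenate directly."""
    return "".join(turns)


nova = ["Hello! Nova's transcripts are joined with spaces.", "Ink's are not."]
ink = ["Hello! Nova's transcripts are joined with spaces.", " Ink's are not."]

# Both produce the same full transcript.
full_nova = join_nova_finals(nova)
full_ink = join_ink_turns(ink)
```

If you port Deepgram code that joins with a space, drop that separator for Ink or you will end up with double spaces between turns.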