Finalize command mid-session.
If you’re already using the Cartesia SDK, upgrade to version >=3.2.0.

Connection
Replace the Deepgram WebSocket URL and auth header with Cartesia’s /stt/websocket. Include the cartesia_version query param, and authenticate with a short-lived access token passed in the access_token query param instead of an API key.
Connect to the manual-finalization WebSocket with the Cartesia SDK:
Query parameters
| Deepgram Nova | Cartesia Realtime STT (Manual) | Notes |
|---|---|---|
| model=nova-3 (required) | model=ink-2 (required) | See Models for all options. |
| version=latest | — | Model version is controllable via the model param. |
| encoding=linear16 | encoding=pcm_s16le (required) | See encoding for all options. |
| sample_rate | sample_rate (required) | No change. |
| language | language | ink-2 only supports en right now. Use ink-whisper for other languages. |
| — | cartesia_version=2026-03-01 (required) | See API Conventions for details. |
| channels, multichannel | — | Send a mono audio stream per WebSocket connection. |
| endpointing, vad_events | — | Consider using auto finalization instead. |
| diarize | — | Coming soon! |
| keyterm, keywords | — | Coming soon! |
| mip_opt_out | — | Controlled by your organization. |
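As a rough illustration of the table above, the required Cartesia parameters can be assembled into a connection URL with the standard library. This is a minimal sketch, not the SDK's method: the host in CARTESIA_STT_WS and the helper name build_stt_url are assumptions made for the example (the guide itself only names the /stt/websocket path), and the default sample rate is arbitrary.

```python
from urllib.parse import urlencode

# Assumption: illustrative host; the guide only specifies the /stt/websocket path.
CARTESIA_STT_WS = "wss://api.cartesia.ai/stt/websocket"


def build_stt_url(access_token: str, sample_rate: int = 16000, language: str = "en") -> str:
    """Build a WebSocket URL carrying the query parameters listed in the table.

    build_stt_url is a hypothetical helper, not part of any SDK.
    """
    params = {
        "model": "ink-2",                  # replaces model=nova-3
        "encoding": "pcm_s16le",           # replaces encoding=linear16
        "sample_rate": sample_rate,        # unchanged from Deepgram
        "language": language,              # ink-2 only supports "en" right now
        "cartesia_version": "2026-03-01",  # required, no Deepgram equivalent
        "access_token": access_token,      # short-lived token instead of an API key
    }
    return f"{CARTESIA_STT_WS}?{urlencode(params)}"
```

Deepgram-only parameters such as channels, endpointing, or diarize are simply left out, since the table lists no Cartesia counterpart for them.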
encoding
| Deepgram | Cartesia |
|---|---|
| linear16 | pcm_s16le |
| linear32 | pcm_s32le |
| mulaw | pcm_mulaw |
| alaw | pcm_alaw |
| Not supported | pcm_f16le |
| Not supported | pcm_f32le |
| flac | Not supported |
| amr-nb | Not supported |
| amr-wb | Not supported |
| opus | Not supported |
| ogg-opus | Not supported |
| speex | Not supported |
| g729 | Not supported |
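If your existing code passes Deepgram encoding names around, the mapping above can be captured in a small lookup. The sketch below is one possible shape; the names DEEPGRAM_TO_CARTESIA_ENCODING and to_cartesia_encoding are hypothetical, and only the four encodings with a counterpart in the table are included.

```python
# Encodings that have a direct Cartesia counterpart, taken from the table above.
DEEPGRAM_TO_CARTESIA_ENCODING = {
    "linear16": "pcm_s16le",
    "linear32": "pcm_s32le",
    "mulaw": "pcm_mulaw",
    "alaw": "pcm_alaw",
}


def to_cartesia_encoding(deepgram_encoding: str) -> str:
    """Translate a Deepgram encoding name, failing loudly for unsupported ones."""
    try:
        return DEEPGRAM_TO_CARTESIA_ENCODING[deepgram_encoding]
    except KeyError:
        # flac, amr-nb, amr-wb, opus, ogg-opus, speex and g729 end up here.
        raise ValueError(
            f"{deepgram_encoding!r} has no Cartesia equivalent; decode to raw PCM first"
        ) from None
```

Raising instead of falling back to a default keeps a compressed stream from being sent under a PCM label by accident.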
Sending audio
Both APIs accept raw PCM audio as binary WebSocket frames in the same way. Cartesia does not support these encodings: flac, amr-nb, amr-wb, opus, ogg-opus, speex, g729.

Cartesia’s control commands are bare text frames, not JSON. To force the model to flush any buffered audio and emit the transcript, send the finalize command.

Cartesia has no equivalent of Deepgram’s KeepAlive message. The connection has a 3-minute idle timeout that resets every time you send an audio chunk, so keep streaming audio (silent or otherwise) to hold it open.
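The sending pattern can be sketched without committing to a particular client library. The example below assumes a socket-like object whose send method emits a binary frame for bytes and a text frame for str (many WebSocket clients behave this way, but check yours). The helper names and the 3200-byte chunk size are illustrative choices, and the silence helper assumes mono pcm_s16le.

```python
from typing import Iterator


def pcm_chunks(audio: bytes, chunk_bytes: int = 3200) -> Iterator[bytes]:
    """Split raw PCM into fixed-size chunks (3200 bytes = 100 ms at 16 kHz, s16le mono)."""
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


def silence(sample_rate: int, duration_ms: int) -> bytes:
    """Zero-valued mono pcm_s16le samples, usable to reset the idle timeout."""
    samples = sample_rate * duration_ms // 1000
    return b"\x00\x00" * samples  # 2 bytes per 16-bit sample


def stream_and_finalize(ws, audio: bytes) -> None:
    """Send audio as binary frames, then the bare-text finalize command.

    `ws` is any object with a send() method (an assumption for this sketch).
    """
    for chunk in pcm_chunks(audio):
        ws.send(chunk)      # bytes -> binary frame
    ws.send("finalize")     # str -> text frame, not JSON
```

During long pauses, sending silence(sample_rate, 100) periodically would be one way to keep the connection from reaching the idle timeout.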
Event mapping
Deepgram emits four server message types. Cartesia emits transcript chunks plus acknowledgments for the finalize and close commands.
| Deepgram type | Cartesia type | Notes |
|---|---|---|
| Results | transcript | The main transcript event. See payload diff below. |
| UtteranceEnd | — | No equivalent. Take a look at the guides page for details. |
| SpeechStarted | — | No equivalent. Take a look at the guides page for details. |
| — | flush_done | Acknowledgment for finalize. |
| — | done | Acknowledgment for close. Sent immediately before the WebSocket closes. |
| Metadata | — | Summary of the session. Sent before the server closes the socket. |
| — | error | Client or server errors. |
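A handler for the Cartesia side of this table might look like the sketch below. It assumes server messages arrive as JSON objects with a type field and that transcript events carry their text in a text field. Both field names are assumptions for illustration, so confirm them against the API reference. handle_server_message is a hypothetical helper.

```python
import json
from typing import List


def handle_server_message(raw: str, transcript_parts: List[str]) -> str:
    """Dispatch one server message and return its type.

    Assumes JSON messages with a "type" field; "text" is an assumed field name.
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "transcript":
        # Keep chunks exactly as received; whitespace inside them is significant.
        transcript_parts.append(msg.get("text", ""))
    elif kind == "error":
        raise RuntimeError(f"STT error: {msg}")
    # "flush_done" acknowledges finalize; "done" acknowledges close.
    return kind
```

Returning the type lets the caller decide what to do on flush_done (for example, treat the accumulated chunks as a complete turn) and on done (stop reading from the socket).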
Differences between Deepgram’s Results message and Cartesia’s transcript events:
- Ink 2 does not return duration or words yet.
- Ink 2 and Whisper currently only emit final transcripts (is_final: true).
Example Server Messages
Ink may break words. Nova’s transcripts are joined with spaces. Ink’s are not.
| Deepgram Nova | Cartesia Realtime STT (Manual) |
|---|---|
| SpeechStarted | — |
| is_final: false "Ink" | is_final: true "Ink " |
| is_final: false "Ink may break words." | is_final: true "may bre" |
| is_final: true "Ink may break words." | is_final: true "ak words." |
| UtteranceEnd | — |
| SpeechStarted | — |
| is_final: false "Nova's transcripts are joined with spaces." | is_final: true " Nova's transcripts are " |
| is_final: true "Nova's transcripts are joined with spaces." | is_final: true "joined with spaces." |
| UtteranceEnd | — |
| SpeechStarted | — |
| is_final: false "Ink's are not." | is_final: true " Ink" |
| is_final: true "Ink's are not." | is_final: true "'s are not." |
| UtteranceEnd | — |
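The practical consequence of the table is a change in how you stitch text together. The sketch below replays the messages from the table: Nova's final transcripts are joined with a space, while Ink's chunks are concatenated as-is because they already carry their own whitespace and may split mid-word. The function names are illustrative.

```python
from typing import List


def join_nova_finals(finals: List[str]) -> str:
    """Deepgram Nova: each is_final transcript is a whole segment; join with spaces."""
    return " ".join(finals)


def join_ink_chunks(chunks: List[str]) -> str:
    """Cartesia Ink: chunks include their own spacing; concatenate without a separator."""
    return "".join(chunks)


# Final transcripts from the Deepgram column of the table.
nova_finals = [
    "Ink may break words.",
    "Nova's transcripts are joined with spaces.",
    "Ink's are not.",
]

# Transcript chunks from the Cartesia column of the table.
ink_chunks = [
    "Ink ", "may bre", "ak words.",
    " Nova's transcripts are ", "joined with spaces.",
    " Ink", "'s are not.",
]
```

Both calls reconstruct the same sentence for this example. Joining the Ink chunks with spaces, or stripping them before joining, would corrupt words such as "break" and "Ink's".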
References
- API Reference: Cartesia Realtime STT (Manual)
- Full Code Example: Using the Cartesia SDK