This guide covers migrating from Deepgram Live Audio (Nova) when used without sending the Finalize command mid-session.

This guide contains both bare API descriptions and SDK code. To install the SDK:
pip install cartesia
If you’re already using the Cartesia SDK, upgrade to version 3.2.0 or later.
Ink 2 only supports English right now.
We expect to add more languages in the coming months.

Connection

Replace the Deepgram WebSocket URL and auth header with Cartesia’s /stt/turns/websocket endpoint and headers.
- wss://api.deepgram.com/v1/listen?model=nova-3&encoding=linear16&sample_rate=16000
+ wss://api.cartesia.ai/stt/turns/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000
- Authorization: Token <DEEPGRAM_API_KEY>
+ Authorization: Bearer <CARTESIA_API_KEY>
+ Cartesia-Version: 2026-03-01
In browsers, the WebSocket API does not let you set request headers. Instead, pass the API version in the cartesia_version query param, and authenticate with a short-lived access token in the access_token query param rather than an API key.
Connect to the auto-finalization WebSocket with the Cartesia SDK:
import os
from cartesia import AsyncCartesia

client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

async with client.stt.auto_finalize.websocket(
    model="ink-2", encoding="pcm_s16le", sample_rate=16000
) as connection:
    ...

Query parameters

| Deepgram Nova | Cartesia Realtime STT (Auto) | Notes |
| --- | --- | --- |
| model=nova-3 (required) | model=ink-2 (required) | See Models for all options. |
| version=latest | | Model version is controllable via the model param. |
| encoding=linear16 | encoding=pcm_s16le (required) | See encoding for all options. |
| sample_rate | sample_rate (required) | No change. |
| language | | ink-2 only supports en right now. More languages are coming soon! |
| | cartesia_version=2026-03-01 (required) | See API Conventions for details. |
| channels, multichannel | | Send a mono audio stream per WebSocket connection. |
| endpointing, interim_results, utterance_end_ms, vad_events | | Not required. |
| diarize | | Coming soon! |
| keyterm, keywords | | Coming soon! |
| mip_opt_out | | Controlled by your organization. |
| Deepgram | Cartesia |
| --- | --- |
| linear16 | pcm_s16le |
| linear32 | pcm_s32le |
| mulaw | pcm_mulaw |
| alaw | pcm_alaw |
| Not supported | pcm_f16le |
| Not supported | pcm_f32le |
| flac | Not supported |
| amr-nb | Not supported |
| amr-wb | Not supported |
| opus | Not supported |
| ogg-opus | Not supported |
| speex | Not supported |
| g729 | Not supported |
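As a rough illustration of the parameter and encoding mappings above, the sketch below translates a Deepgram encoding name into its Cartesia counterpart and assembles the connection URL. The names ENCODING_MAP and build_cartesia_url are hypothetical helpers written for this guide, not part of the Cartesia SDK. The endpoint, parameter names, and values are taken from the tables above.

```python
from typing import Optional
from urllib.parse import urlencode

# Deepgram -> Cartesia encoding names, copied from the mapping table above.
ENCODING_MAP = {
    "linear16": "pcm_s16le",
    "linear32": "pcm_s32le",
    "mulaw": "pcm_mulaw",
    "alaw": "pcm_alaw",
}

CARTESIA_STT_URL = "wss://api.cartesia.ai/stt/turns/websocket"


def build_cartesia_url(
    deepgram_encoding: str,
    sample_rate: int,
    cartesia_version: str = "2026-03-01",
    access_token: Optional[str] = None,
) -> str:
    """Hypothetical helper: turn Deepgram settings into a Cartesia URL."""
    if deepgram_encoding not in ENCODING_MAP:
        # flac, opus, and similar encodings have no Cartesia equivalent.
        raise ValueError(f"{deepgram_encoding!r} has no Cartesia equivalent")
    params = {
        "model": "ink-2",
        "encoding": ENCODING_MAP[deepgram_encoding],
        "sample_rate": sample_rate,
        "cartesia_version": cartesia_version,
    }
    if access_token is not None:
        # Browser clients cannot send headers, so the token goes in the query.
        params["access_token"] = access_token
    return f"{CARTESIA_STT_URL}?{urlencode(params)}"


print(build_cartesia_url("linear16", 16000))
```

Server-side clients can keep sending Authorization and Cartesia-Version as headers; the query-string form is mainly useful where headers are unavailable.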

Sending audio

Both APIs accept raw PCM audio as binary WebSocket frames in the same way.
Cartesia does not support these encodings: flac, amr-nb, amr-wb, opus, ogg-opus, speex, g729.
Deepgram’s Finalize command has no equivalent. Ink detects turn boundaries on its own and emits a turn.end when the user stops speaking, so there is nothing to flush.
- { "type": "Finalize" }
If you currently send the Finalize command mid-session with Deepgram Nova, consider using Cartesia Ink with manual finalization instead. See the guides page for details.
To close the session cleanly, send a JSON text frame:
- { "type": "CloseStream" }
+ { "type": "close" }
Cartesia has no equivalent of Deepgram’s KeepAlive message. The connection has a 3-minute idle timeout that resets every time you send an audio chunk — keep streaming audio (silent or otherwise) to hold it open.
# Equivalent to 
# deepgram_connection.send_media(audio_chunk)
await connection.send_raw(audio_chunk)

# Equivalent to
# deepgram_connection.send_close_stream()
await connection.send({"type": "close"})
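Since there is no KeepAlive message, one way to hold a connection open while the microphone is paused is to stream digital silence. The sketch below is a minimal illustration, assuming mono pcm_s16le audio (2 bytes per sample) and a connection object that exposes the send_raw coroutine shown above. The names silence_chunk and stream_silence are hypothetical helpers, not SDK functions.

```python
import asyncio

BYTES_PER_SAMPLE = 2  # pcm_s16le: 16-bit samples, mono


def silence_chunk(sample_rate: int = 16000, duration_ms: int = 100) -> bytes:
    """Return duration_ms of digital silence encoded as pcm_s16le."""
    num_samples = sample_rate * duration_ms // 1000
    return b"\x00" * (num_samples * BYTES_PER_SAMPLE)


async def stream_silence(connection, stop: asyncio.Event, interval_s: float = 0.1) -> None:
    """Hypothetical keep-alive loop: send silence until stop is set."""
    chunk = silence_chunk(duration_ms=int(interval_s * 1000))
    while not stop.is_set():
        # Each audio chunk resets the idle timeout.
        await connection.send_raw(chunk)
        await asyncio.sleep(interval_s)
```

Run stream_silence as a background task and set the stop event once real audio resumes. Because the idle timeout is three minutes, a much longer interval would also work; the 100 ms default here simply mimics a real-time stream.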

Event mapping

Deepgram emits four server message types, mixing transcript results with separate voice-activity signals. Cartesia folds the same information into a turn lifecycle: turn.start, turn.update, turn.eager_end, turn.resume, and turn.end. See Turn Detection for the full state machine.
| Deepgram type | Cartesia type | Notes |
| --- | --- | --- |
| SpeechStarted | turn.start | The user began speaking. Carries no transcript. |
| Results (is_final: false) | turn.update | Interim transcript for the utterance / turn. |
| Results (is_final: true) | turn.end | Final transcript for the utterance / turn. |
| UtteranceEnd | turn.end | The user stopped speaking. |
| | turn.eager_end | The model predicts the user might be done speaking. Okay to ignore. |
| | turn.resume | The user kept talking; ignore the last turn.eager_end. |
| Metadata | | No equivalent. |
| | connected | Fires once when the WebSocket is established. You do not need to wait for it before sending audio. |
| | error | Client or server errors. |
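When porting handler logic or test fixtures, it can help to express the mapping above as a function. The following is only a sketch: cartesia_equivalent is a hypothetical name, and it operates on already-parsed Deepgram messages as plain dicts rather than on any SDK type.

```python
from typing import Optional


def cartesia_equivalent(deepgram_message: dict) -> Optional[str]:
    """Return the Cartesia event type that plays the same role, if any."""
    msg_type = deepgram_message.get("type")
    if msg_type == "SpeechStarted":
        return "turn.start"
    if msg_type == "Results":
        # Final results map to turn.end, interim results to turn.update.
        return "turn.end" if deepgram_message.get("is_final") else "turn.update"
    if msg_type == "UtteranceEnd":
        return "turn.end"
    # Metadata (and anything unrecognized) has no Cartesia equivalent.
    return None
```

Note that the mapping is many-to-one: both a final Results message and UtteranceEnd correspond to a single turn.end, so code that reacted to each separately can usually be merged.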

The Deepgram Results message

Deepgram’s Results carry a lot of information. You can extract similar information from Cartesia’s turn.update and turn.end events.
| Deepgram Results | Cartesia | Notes |
| --- | --- | --- |
| is_final: false | turn.update | Interim transcript for the utterance / turn. Cumulative since the last final transcript. |
| is_final: true | turn.end | Final transcript for the utterance / turn. |
| speech_final: true | turn.end | The user stopped speaking. |
Deepgram sends UtteranceEnd and speech_final: true separately, but they have the same semantic meaning: “the user has finished speaking”. Cartesia simplifies this into a single high-accuracy signal: turn.end.
A Deepgram Results message:
{
  "type": "Results",
  "is_final": true,
  "speech_final": true,
  "channel": {
    "alternatives": [
      {
        "transcript": "Hello world!",
        "confidence": 0.99,
        "words": [ ... ]
      }
    ]
  },
  "metadata": { ... }
}
Becomes a Cartesia turn.update / turn.end event:
{
  "type": "turn.end",
  "transcript": "Hello world!",
  "request_id": "33cacee6-1936-4949-a05b-ecc9f2393248"
}
turn.start and turn.resume events do not carry a transcript.
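To make the difference in payload shape concrete, here is a small sketch that pulls the transcript out of each message format shown above. Both function names are hypothetical, and the messages are treated as plain parsed-JSON dicts rather than SDK objects.

```python
def deepgram_transcript(message: dict) -> str:
    """Deepgram nests the transcript under channel.alternatives[0]."""
    return message["channel"]["alternatives"][0]["transcript"]


def cartesia_transcript(message: dict) -> str:
    """Cartesia puts the transcript at the top level of the event.

    Events without a transcript (such as turn.start) yield an empty string.
    """
    return message.get("transcript", "")


deepgram_msg = {
    "type": "Results",
    "is_final": True,
    "speech_final": True,
    "channel": {"alternatives": [{"transcript": "Hello world!", "confidence": 0.99}]},
}
cartesia_msg = {"type": "turn.end", "transcript": "Hello world!"}

print(deepgram_transcript(deepgram_msg) == cartesia_transcript(cartesia_msg))
```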
import asyncio
from cartesia.types.stt import STTAutoFinalizeWebsocketResponse

final_transcript = ""

def on_message(message: STTAutoFinalizeWebsocketResponse) -> None:
    global final_transcript
    if message.type == "turn.start":
        print("User started speaking")
    elif message.type == "turn.update":
        print("Transcript so far: " + final_transcript + message.transcript)
    elif message.type == "turn.end":
        print("User stopped speaking")
        # Do not strip or add spaces!
        final_transcript += message.transcript
    elif message.type == "error":
        print(f"Error: {message.message}")

# Equivalent to
# deepgram_connection.on(EventType.MESSAGE, on_message)
connection.on("event", on_message)

# Equivalent to
# asyncio.create_task(deepgram_connection.start_listening())
recv_task = asyncio.create_task(connection.dispatch_events())
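The handler above ignores turn.eager_end and turn.resume, which is fine. If you do want to use them, for example to start preparing a reply early, one possible shape is sketched below. TurnTracker is a hypothetical helper that works on plain dict events; it assumes turn.eager_end carries a transcript, as in the example server messages in this guide.

```python
from typing import List, Optional


class TurnTracker:
    """Hypothetical helper that tracks speculative and confirmed turns."""

    def __init__(self) -> None:
        self.pending: Optional[str] = None  # transcript from the last turn.eager_end
        self.committed: List[str] = []      # transcripts confirmed by turn.end

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind == "turn.eager_end":
            # The model predicts the turn may be over: begin speculative work.
            self.pending = event["transcript"]
        elif kind == "turn.resume":
            # The user kept talking: discard the speculative work.
            self.pending = None
        elif kind == "turn.end":
            # The turn is over: the final transcript is authoritative.
            self.committed.append(event["transcript"])
            self.pending = None
```

In a real application, setting and clearing pending would be where you start and cancel the speculative task (for instance, an early LLM request).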

Example Server Messages

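The table below compares the messages each API emits for the spoken audio “Hello! Nova’s transcripts are joined with spaces. Ink’s are not.” Note the leading space in Ink’s second turn: Ink transcripts are concatenated as-is, whereas Nova’s final transcripts are joined with a space.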
| Deepgram Nova | Cartesia Realtime STT (Auto) |
| --- | --- |
| SpeechStarted | turn.start |
| is_final: false "Hello!" | turn.update "Hello!" |
| | turn.eager_end "Hello!" |
| | turn.resume |
| is_final: false "Hello! Nova's transcripts are joined with spaces." | turn.update "Hello! Nova's transcripts are joined with spaces." |
| | turn.eager_end "Hello! Nova's transcripts are joined with spaces." |
| is_final: true "Hello! Nova's transcripts are joined with spaces." | turn.end "Hello! Nova's transcripts are joined with spaces." |
| UtteranceEnd | |
| SpeechStarted | turn.start |
| is_final: false "Ink's are not." | turn.update " Ink's are not." |
| | turn.eager_end " Ink's are not." |
| is_final: true "Ink's are not." | turn.end " Ink's are not." |
| UtteranceEnd | |
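The joining difference in the example above can be checked with a few lines of Python. This sketch uses the final transcripts exactly as they appear in the example messages; the variable names are illustrative only.

```python
# Final transcripts as emitted in the example above.
nova_finals = [
    "Hello! Nova's transcripts are joined with spaces.",
    "Ink's are not.",
]
ink_turn_ends = [
    "Hello! Nova's transcripts are joined with spaces.",
    " Ink's are not.",  # note the leading space supplied by Ink
]

# Nova: insert a space between final segments.
nova_full = " ".join(nova_finals)
# Ink: concatenate as-is; do not strip or add spaces.
ink_full = "".join(ink_turn_ends)

print(nova_full == ink_full)
```

If you strip Ink transcripts before concatenating, words at turn boundaries run together; if you join them with spaces, you get doubled spaces.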

References

API Reference: Cartesia Realtime STT (Auto)
Full Code Example: Using the Cartesia SDK