This guide covers migrating from Deepgram Live Audio (Nova) when sending the Finalize command mid-session.

This guide contains both bare API descriptions and SDK code. To install the SDK:
pip install cartesia
If you’re already using the Cartesia SDK, upgrade to version 3.2.0 or later.

Connection

Replace the Deepgram WebSocket URL and auth header with Cartesia’s /stt/websocket endpoint, a Bearer auth header, and a Cartesia-Version header:
- wss://api.deepgram.com/v1/listen?model=nova-3&encoding=linear16&sample_rate=16000
+ wss://api.cartesia.ai/stt/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000
- Authorization: Token <DEEPGRAM_API_KEY>
+ Authorization: Bearer <CARTESIA_API_KEY>
+ Cartesia-Version: 2026-03-01
In browsers, WebSockets do not support request headers. Instead, pass the API version as the cartesia_version query param, and authenticate with a short-lived access token in the access_token query param rather than an API key.
Connect to the manual-finalization WebSocket with the Cartesia SDK:
import os
from cartesia import AsyncCartesia

client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

async with client.stt.manual_finalize.websocket(
    model="ink-2", encoding="pcm_s16le", sample_rate=16000
) as connection:
    ...
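
For browser clients, one option is to have a backend build the full WebSocket URL and hand it to the page. The sketch below only assembles the query string described above; `build_browser_stt_url` is a hypothetical helper name, and how the short-lived access token is obtained is outside its scope.

```python
from urllib.parse import urlencode

CARTESIA_STT_URL = "wss://api.cartesia.ai/stt/websocket"


def build_browser_stt_url(
    access_token: str,
    model: str = "ink-2",
    encoding: str = "pcm_s16le",
    sample_rate: int = 16000,
    cartesia_version: str = "2026-03-01",
) -> str:
    """Build a WebSocket URL that carries the API version and auth as query params.

    Hypothetical helper: the parameter names come from this guide, the
    function itself is not part of the Cartesia SDK.
    """
    params = {
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "cartesia_version": cartesia_version,
        "access_token": access_token,
    }
    # urlencode percent-escapes values, so tokens with special characters stay valid.
    return f"{CARTESIA_STT_URL}?{urlencode(params)}"
```

Pass a short-lived access token here, never a long-lived API key, since the URL is visible to the browser.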

Query parameters

| Deepgram Nova | Cartesia Realtime STT (Manual) | Notes |
| --- | --- | --- |
| model=nova-3 (required) | model=ink-2 (required) | See Models for all options. |
| version=latest | | Model version is controllable via the model param. |
| encoding=linear16 | encoding=pcm_s16le (required) | See encoding for all options. |
| sample_rate | sample_rate (required) | No change. |
| language | language | ink-2 only supports en right now. Use ink-whisper for other languages. |
| | cartesia_version=2026-03-01 (required) | See API Conventions for details. |
| channels, multichannel | | Send a mono audio stream per WebSocket connection. |
| endpointing, vad_events | | Consider using auto finalization instead. |
| diarize | | Coming soon! |
| keyterm, keywords | | Coming soon! |
| mip_opt_out | | Controlled by your organization. |
| Deepgram | Cartesia |
| --- | --- |
| linear16 | pcm_s16le |
| linear32 | pcm_s32le |
| mulaw | pcm_mulaw |
| alaw | pcm_alaw |
| Not supported | pcm_f16le |
| Not supported | pcm_f32le |
| flac | Not supported |
| amr-nb | Not supported |
| amr-wb | Not supported |
| opus | Not supported |
| ogg-opus | Not supported |
| speex | Not supported |
| g729 | Not supported |
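
The two mappings above can be applied mechanically to an existing Deepgram parameter set. The following is a minimal sketch, assuming query parameters are held as a plain dict of strings. `translate_params` is a hypothetical helper, and returning the list of dropped keys (rather than ignoring them silently) is a design choice made here so that features without an equivalent get reviewed by hand.

```python
# Deepgram encoding name -> Cartesia encoding name, per the table above.
ENCODING_MAP = {
    "linear16": "pcm_s16le",
    "linear32": "pcm_s32le",
    "mulaw": "pcm_mulaw",
    "alaw": "pcm_alaw",
}


def translate_params(deepgram_params: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Translate Deepgram query params to Cartesia ones (hypothetical helper).

    Returns the Cartesia params plus the Deepgram keys that were dropped
    because this guide lists no equivalent for them.
    """
    cartesia = {"model": "ink-2", "cartesia_version": "2026-03-01"}
    dropped: list[str] = []
    for key, value in deepgram_params.items():
        if key == "model":
            continue  # Nova model names do not carry over; ink-2 is set above.
        if key == "encoding":
            if value not in ENCODING_MAP:
                raise ValueError(f"Cartesia does not support encoding {value!r}")
            cartesia["encoding"] = ENCODING_MAP[value]
        elif key in ("sample_rate", "language"):
            cartesia[key] = value  # Same parameter name on both sides.
        else:
            dropped.append(key)  # e.g. endpointing, diarize, keyterm
    return cartesia, dropped
```

Compressed encodings such as opus raise an error instead of being dropped, because the audio itself would need to be decoded to PCM before it can be sent.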

Sending audio

Both APIs accept raw PCM audio as binary WebSocket frames in the same way.
Cartesia does not support these encodings: flac, amr-nb, amr-wb, opus, ogg-opus, speex, g729.
Cartesia’s control commands are bare text frames, not JSON. To force the model to flush any buffered audio and emit the transcript:
- { "type": "Finalize" }
+ finalize
Finalizing is optional for Deepgram Nova, but required for Cartesia Realtime STT (Manual). If you do not send the Finalize command mid-session with Deepgram Nova, consider using Cartesia Ink with automatic finalization instead. Take a look at the guides page for details.
To close the session cleanly:
- { "type": "CloseStream" }
+ close
Cartesia has no equivalent of Deepgram’s KeepAlive message. The connection has a 3-minute idle timeout that resets every time you send an audio chunk — keep streaming audio (silent or otherwise) to hold it open.
# Equivalent to 
# deepgram_connection.send_media(audio_chunk)
await connection.send_raw(audio_chunk)

# Equivalent to
# deepgram_connection.send_finalize()
await connection.send("finalize")

# Equivalent to
# deepgram_connection.send_close_stream()
await connection.send("close")
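
If existing code builds Deepgram control messages as JSON, a small adapter can translate them at the boundary. This is a sketch under stated assumptions: `translate_control` and `silence_chunk` are hypothetical helpers, the audio is mono pcm_s16le (where silence is all-zero bytes), and replacing KeepAlive with 100 ms of silence is one possible way to apply the idle-timeout advice above, not a documented equivalent.

```python
import json
from typing import Union

BYTES_PER_SAMPLE = 2  # pcm_s16le: 16-bit samples, mono assumed

# Deepgram JSON control message type -> Cartesia bare text frame.
COMMAND_MAP = {"Finalize": "finalize", "CloseStream": "close"}


def silence_chunk(duration_ms: int, sample_rate: int = 16000) -> bytes:
    """Return duration_ms of digital silence as pcm_s16le bytes."""
    num_samples = sample_rate * duration_ms // 1000
    return bytes(num_samples * BYTES_PER_SAMPLE)  # bytes(n) is n zero bytes


def translate_control(deepgram_json: str, sample_rate: int = 16000) -> Union[str, bytes]:
    """Map a Deepgram control message to what should be sent to Cartesia.

    A str result is a text frame; a bytes result is a binary audio frame.
    """
    message_type = json.loads(deepgram_json)["type"]
    if message_type in COMMAND_MAP:
        return COMMAND_MAP[message_type]
    if message_type == "KeepAlive":
        # No KeepAlive on Cartesia: send silent audio to reset the idle timer.
        return silence_chunk(100, sample_rate)
    raise ValueError(f"No Cartesia equivalent for {message_type!r}")
```

With the SDK calls shown above, a `str` result would go through `connection.send(...)` and a `bytes` result through `connection.send_raw(...)`.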

Event mapping

Deepgram emits four server message types. Cartesia emits transcript chunks plus acknowledgments for the finalize and close commands.
| Deepgram type | Cartesia type | Notes |
| --- | --- | --- |
| Results | transcript | The main transcript event. See payload diff below. |
| UtteranceEnd | | No equivalent. Take a look at the guides page for details. |
| SpeechStarted | | No equivalent. Take a look at the guides page for details. |
| | flush_done | Acknowledgment for finalize. |
| | done | Acknowledgment for close. Sent immediately before the WebSocket closes. |
| Metadata | | Summary of the session. Sent before the server closes the socket. |
| | error | Client or server errors. |
A Deepgram Results message:
{
  "type": "Results",
  "channel_index": [0, 1],
  "duration": 1.7,
  "start": 0.0,
  "is_final": true,
  "speech_final": true,
  "channel": {
    "alternatives": [
      {
        "transcript": "Hello world! This is the full transcript.",
        "confidence": 0.98,
        "words": [
          {
            "word": "hello",
            "start": 0,
            "end": 0.2,
            "confidence": 0.9,
            "punctuated_word": "Hello"
          },
          {
            "word": "world",
            "start": 0.2,
            "end": 0.5,
            "confidence": 0.9,
            "punctuated_word": "world!"
          },
          {
            "word": "this",
            "start": 0.7,
            "end": 0.9,
            "confidence": 0.8,
            "punctuated_word": "This"
          },
          ...
        ]
      }
    ]
  },
  "metadata": { ... }
}
Becomes one or more Cartesia transcript events:
{
  "type": "transcript",
  "is_final": true,
  "text": "Hello world!",
  "duration": 0.5,
  "words": [
    {
      "word": "Hello",
      "start": 0,
      "end": 0.2
    },
    {
      "word": " world!",
      "start": 0.2,
      "end": 0.5
    }
  ],
  "request_id": "2ff8af53-4d38-479d-8287-58940f01c701"
}
  • Ink 2 does not return duration or words yet
  • Ink 2 and Whisper currently only emit final transcripts (is_final: true)
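
Code that relied on Deepgram's words array should therefore treat word timings as optional. A minimal sketch, assuming the event has already been parsed into a dict shaped like the JSON above (`word_timings` is a hypothetical helper):

```python
from typing import Any


def word_timings(event: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Return (word, start, end) triples, or an empty list when words are absent."""
    # `or []` also covers an explicit null in the payload.
    return [(w["word"], w["start"], w["end"]) for w in event.get("words") or []]
```

The same applies to duration: reading it with `event.get("duration")` avoids a KeyError when the model omits it.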
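Cartesia’s final transcripts are deltas: each event carries only the text that is new since the previous one. Concatenate them in order, without stripping or adding whitespace, to build the full transcript: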
import asyncio
from cartesia.types.stt import STTManualFinalizeWebsocketResponse

final_transcript = ""

def on_message(message: STTManualFinalizeWebsocketResponse) -> None:
    global final_transcript
    if message.type == "transcript" and message.is_final:
        # Do not strip or add whitespace!
        final_transcript += message.text
    elif message.type == "flush_done":
        print("All audio up until 'finalize' was transcribed")
    elif message.type == "done":
        print("All audio up until 'close' was transcribed")
    elif message.type == "error":
        print(f"Error: {message.message}")

# Equivalent to
# deepgram_connection.on(EventType.MESSAGE, on_message)
connection.on("event", on_message)

# Equivalent to
# asyncio.create_task(deepgram_connection.start_listening())
recv_task = asyncio.create_task(connection.dispatch_events())

Example Server Messages

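Ink may split a word across transcript events. Join Nova’s final transcripts with spaces; concatenate Ink’s as-is, without adding any. For the same three spoken sentences, the two services emit: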
| Deepgram Nova | Cartesia Realtime STT (Manual) |
| --- | --- |
| SpeechStarted | |
| is_final: false "Ink" | is_final: true "Ink " |
| is_final: false "Ink may break words." | is_final: true "may bre" |
| is_final: true "Ink may break words." | is_final: true "ak words." |
| UtteranceEnd | |
| SpeechStarted | |
| is_final: false "Nova's transcripts are joined with spaces." | is_final: true " Nova's transcripts are " |
| is_final: true "Nova's transcripts are joined with spaces." | is_final: true "joined with spaces." |
| UtteranceEnd | |
| SpeechStarted | |
| is_final: false "Ink's are not." | is_final: true " Ink" |
| is_final: true "Ink's are not." | is_final: true "'s are not." |
| UtteranceEnd | |
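
For the bare WebSocket API, the Cartesia column above can be replayed through a small accumulator to check that plain concatenation reproduces the spoken text. This is a sketch, not SDK code: `TranscriptAccumulator` is a hypothetical class, and it assumes each server message is a JSON text frame with a `type` field and that error events carry a `message` field (mirroring the SDK snippet earlier in this guide).

```python
import json
from dataclasses import dataclass, field


@dataclass
class TranscriptAccumulator:
    """Collects Cartesia server messages from raw JSON text frames."""

    text: str = ""
    flush_count: int = 0  # number of finalize acknowledgments seen
    closed: bool = False  # True once the close acknowledgment arrives
    errors: list[str] = field(default_factory=list)

    def feed(self, raw_frame: str) -> None:
        event = json.loads(raw_frame)
        kind = event.get("type")
        if kind == "transcript" and event.get("is_final"):
            # Deltas are appended verbatim: no strip(), no added spaces.
            self.text += event["text"]
        elif kind == "flush_done":
            self.flush_count += 1
        elif kind == "done":
            self.closed = True
        elif kind == "error":
            self.errors.append(event.get("message", ""))


# Replay the deltas from the Cartesia column of the table above.
deltas = ["Ink ", "may bre", "ak words.", " Nova's transcripts are ",
          "joined with spaces.", " Ink", "'s are not."]
acc = TranscriptAccumulator()
for delta in deltas:
    acc.feed(json.dumps({"type": "transcript", "is_final": True, "text": delta}))
print(acc.text)
```

Joining the same deltas with spaces, as one would with Nova's final transcripts, would produce "may bre ak words." and double spaces between sentences, which is why the whitespace must be left untouched.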

References

API Reference

Cartesia Realtime STT (Manual)

Full Code Example

Using the Cartesia SDK