Migrating from Deepgram Nova with Manual Finalization

This guide covers migrating from Deepgram Live Audio (Nova) when sending the Finalize command mid-session.


This guide contains both bare API descriptions and SDK code. To install the SDK:

pip install cartesia

npm i @cartesia/cartesia-js

If you’re already using the Cartesia SDK, upgrade to version 3.2.0 or later.

Connection

Replace the Deepgram WebSocket URL and auth header with Cartesia’s /stt/websocket.

- wss://api.deepgram.com/v1/listen?model=nova-3&encoding=linear16&sample_rate=16000
+ wss://api.cartesia.ai/stt/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000

- Authorization: Token <DEEPGRAM_API_KEY>
+ Authorization: Bearer <CARTESIA_API_KEY>
+ Cartesia-Version: 2026-03-01
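For a raw WebSocket client (no SDK), the change comes down to a new URL and two headers. Below is a minimal Python sketch that uses only the standard library and the values above. `cartesia_connection` is a hypothetical helper name. How you pass the headers depends on your WebSocket library.

```python
from urllib.parse import urlencode

CARTESIA_STT_URL = "wss://api.cartesia.ai/stt/websocket"
CARTESIA_VERSION = "2026-03-01"


def cartesia_connection(api_key: str, sample_rate: int = 16000) -> tuple[str, dict[str, str]]:
    """Return the (url, headers) pair that replaces Deepgram's Nova connection."""
    query = urlencode({
        "model": "ink-2",            # was model=nova-3
        "encoding": "pcm_s16le",     # was encoding=linear16
        "sample_rate": sample_rate,  # unchanged
    })
    headers = {
        # Deepgram used "Token <key>"; Cartesia expects a Bearer token
        "Authorization": f"Bearer {api_key}",
        # Cartesia also requires an explicit API version
        "Cartesia-Version": CARTESIA_VERSION,
    }
    return f"{CARTESIA_STT_URL}?{query}", headers
```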

In browsers, WebSocket connections cannot carry custom request headers. Instead, pass the API version in the cartesia_version query param, and authenticate with a short-lived access token in the access_token query param rather than your API key.

Connect to the manual-finalization WebSocket with the Cartesia SDK:

import os
from cartesia import AsyncCartesia

client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

async with client.stt.manual_finalize.websocket(
    model="ink-2", encoding="pcm_s16le", sample_rate=16000
) as connection:
    ...

import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

with client.stt.manual_finalize.websocket(
    model="ink-2", encoding="pcm_s16le", sample_rate=16000
) as connection:
    ...

import Cartesia from "@cartesia/cartesia-js";

const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

const connection = client.stt.manualFinalize.websocket({
  model: "ink-2",
  encoding: "pcm_s16le",
  sample_rate: 16000,
});

// Server-side: Generate access-tokens using your API key
import Cartesia from '@cartesia/cartesia-js';

const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

export async function GET() {
  const { token } = await client.accessToken.create({
    grants: { stt: true, tts: false, agent: false },
    // How long the token lasts in seconds
    // Allowed values: 0–3600
    expires_in: 3600,
  });
  return Response.json({ token });
}


// Client-side
// 1. Fetch an access token from your server
// 2. Connect to Cartesia via WebSocket
import Cartesia from "@cartesia/cartesia-js";

async function getToken(): Promise<string> {
  const res = await fetch('/replace-with-your-server');
  const { token } = await res.json();
  return token;
}
const audioContext = new AudioContext();

const client = new Cartesia({ token: await getToken() });

const connection = client.stt.manualFinalize.websocket({
  model: "ink-2",
  encoding: "pcm_f32le",
  sample_rate: audioContext.sampleRate,
});
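The SDK builds the connection URL for you. If you use a raw WebSocket client that cannot send headers, the sketch below shows roughly what the header-free URL looks like. It assumes the token was fetched from a server endpoint like the one above and uses the query param names described in this section. `query_auth_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def query_auth_url(access_token: str, sample_rate: int, encoding: str = "pcm_f32le") -> str:
    """Build a Cartesia STT URL that carries version and auth in the query string."""
    query = urlencode({
        "model": "ink-2",
        "encoding": encoding,
        "sample_rate": sample_rate,
        "cartesia_version": "2026-03-01",  # replaces the Cartesia-Version header
        "access_token": access_token,      # short-lived token, never the API key
    })
    return f"wss://api.cartesia.ai/stt/websocket?{query}"
```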

Query parameters

Deepgram Nova	Cartesia Realtime STT (Manual)	Notes
`model=nova-3` required	`model=ink-2` required	See Models for all options.
`version=latest`	—	Model version is controllable via the `model` param.
`encoding=linear16`	`encoding=pcm_s16le` required	See encoding for all options.
`sample_rate`	`sample_rate` required	No change.
`language`	`language`	`ink-2` only supports `en` right now. Use `ink-whisper` for other languages.
—	`cartesia_version=2026-03-01` required	See API Conventions for details.
`channels`, `multichannel`	—	Send a mono audio stream per WebSocket connection.
`endpointing`, `vad_events`	—	Consider using auto finalization instead.
`keyterm`, `keywords`	`keyterm`	See Keyterm prompting.
`mip_opt_out`	—	Controlled by your organization.
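If your Deepgram code builds its query string from a list of parameters, you can apply the table above mechanically. This is a sketch under stated assumptions. `translate_query` is a hypothetical helper that raises on anything the table does not cover. It also normalizes regional tags such as `en-US` to `en` and strips an optional `:boost` suffix from Deepgram `keywords` entries. Both of those behaviors are assumptions, not documented Cartesia behavior.

```python
from urllib.parse import urlencode

# The four Deepgram encodings with a direct Cartesia equivalent
ENCODING_MAP = {
    "linear16": "pcm_s16le",
    "linear32": "pcm_s32le",
    "mulaw": "pcm_mulaw",
    "alaw": "pcm_alaw",
}

# Deepgram params with no Cartesia equivalent in this mode (see the table)
DROPPED = {"version", "channels", "multichannel", "endpointing",
           "vad_events", "mip_opt_out"}


def translate_query(dg_params: list[tuple[str, str]]) -> str:
    """Translate Deepgram Nova query params into a Cartesia query string."""
    out: list[tuple[str, str]] = [("cartesia_version", "2026-03-01")]
    for key, value in dg_params:
        if key == "model":
            out.append(("model", "ink-2"))  # nova-3 -> ink-2
        elif key == "encoding":
            if value not in ENCODING_MAP:
                raise ValueError(f"Cartesia does not support encoding {value!r}")
            out.append(("encoding", ENCODING_MAP[value]))
        elif key == "sample_rate":
            out.append(("sample_rate", value))  # unchanged
        elif key == "language":
            # Assumption: treat regional tags such as "en-US" as "en"
            if value.split("-")[0] != "en":
                raise ValueError("ink-2 only supports 'en'; use ink-whisper instead")
            out.append(("language", "en"))
        elif key == "keyterm":
            out.append(("keyterm", value))
        elif key == "keywords":
            # Assumption: drop Deepgram's optional ":boost" suffix
            out.append(("keyterm", value.rsplit(":", 1)[0]))
        elif key in DROPPED:
            continue
        else:
            raise ValueError(f"No Cartesia mapping for Deepgram param {key!r}")
    return urlencode(out)
```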

encoding

Deepgram	Cartesia
`linear16`	`pcm_s16le`
`linear32`	`pcm_s32le`
`mulaw`	`pcm_mulaw`
`alaw`	`pcm_alaw`
Not supported	`pcm_f16le`
Not supported	`pcm_f32le`
`flac`	Not supported
`amr-nb`	Not supported
`amr-wb`	Not supported
`opus`	Not supported
`ogg-opus`	Not supported
`speex`	Not supported
`g729`	Not supported
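The same table can be expressed as a lookup for code that picks an encoding at runtime. The sketch below also records bytes per sample for each Cartesia encoding. These are the standard sizes for these PCM formats, added here as context rather than taken from Cartesia's docs, and they are useful when sizing audio chunks. `DEEPGRAM_TO_CARTESIA`, `BYTES_PER_SAMPLE`, and `to_cartesia_encoding` are hypothetical names.

```python
from typing import Dict, Optional

# Deepgram encoding -> Cartesia encoding (None means Cartesia has no equivalent)
DEEPGRAM_TO_CARTESIA: Dict[str, Optional[str]] = {
    "linear16": "pcm_s16le",
    "linear32": "pcm_s32le",
    "mulaw": "pcm_mulaw",
    "alaw": "pcm_alaw",
    "flac": None, "amr-nb": None, "amr-wb": None, "opus": None,
    "ogg-opus": None, "speex": None, "g729": None,
}

# Bytes per (mono) sample for each Cartesia raw audio encoding
BYTES_PER_SAMPLE: Dict[str, int] = {
    "pcm_s16le": 2, "pcm_s32le": 4,
    "pcm_f16le": 2, "pcm_f32le": 4,
    "pcm_mulaw": 1, "pcm_alaw": 1,
}


def to_cartesia_encoding(deepgram_encoding: str) -> str:
    """Map a Deepgram encoding name to Cartesia's, or raise if unsupported."""
    cartesia = DEEPGRAM_TO_CARTESIA.get(deepgram_encoding)
    if cartesia is None:
        raise ValueError(
            f"{deepgram_encoding!r} is not supported by Cartesia; decode to raw PCM first"
        )
    return cartesia
```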

Sending audio

Both APIs accept raw PCM audio as binary WebSocket frames in the same way.
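Because audio travels as plain binary frames in both APIs, existing chunking code usually carries over unchanged. If you need to split a buffered recording into frames, here is one way to do it. The 100 ms frame length is an arbitrary assumption, not a documented Cartesia requirement, and `pcm_frames` is a hypothetical helper.

```python
from typing import Iterator


def pcm_frames(audio: bytes, sample_rate: int, bytes_per_sample: int = 2,
               frame_ms: int = 100) -> Iterator[bytes]:
    """Split a mono raw PCM buffer into frames of roughly frame_ms milliseconds.

    Frame size is a whole number of samples, so no sample is split across
    frames. The last frame may be shorter.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    if samples_per_frame <= 0:
        raise ValueError("frame_ms is too short for this sample rate")
    frame_bytes = samples_per_frame * bytes_per_sample
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]
```

Each yielded frame would then be sent as one binary message, for example with connection.send_raw(frame) as in the SDK snippets in this section.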

Cartesia does not support these encodings: flac, amr-nb, amr-wb, opus, ogg-opus, speex, g729.

Cartesia’s control commands are bare text frames, not JSON. To force the model to flush any buffered audio and emit the transcript:

- { "type": "Finalize" }
+ finalize

Finalizing is optional for Deepgram Nova but required for Cartesia Realtime STT (Manual). If you do not send the Finalize command mid-session with Deepgram Nova, consider using Cartesia Ink with automatic finalization instead; see the migration guides page for details.
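In practice, your application decides where a segment ends and sends finalize at that point. For example, you might trigger it when a push-to-talk button is released (a hypothetical trigger). The runnable sketch below shows that ordering. It uses a hypothetical `RecordingConnection` stub in place of the SDK connection, which only mirrors the send_raw and send calls used in this guide.

```python
import asyncio
from typing import Iterable, List, Union


class RecordingConnection:
    """Hypothetical stand-in for the SDK connection; records outgoing frames."""

    def __init__(self) -> None:
        self.sent: List[Union[bytes, str]] = []

    async def send_raw(self, chunk: bytes) -> None:
        self.sent.append(chunk)  # binary audio frame

    async def send(self, command: str) -> None:
        self.sent.append(command)  # bare text control command


async def transcribe_segment(connection, frames: Iterable[bytes]) -> None:
    """Stream one segment of audio, then ask Cartesia to flush it."""
    for frame in frames:
        await connection.send_raw(frame)
    # Required with Cartesia Realtime STT (Manual): flush buffered audio
    await connection.send("finalize")


conn = RecordingConnection()
silence = bytes(3200)  # 100 ms of pcm_s16le silence at 16 kHz
asyncio.run(transcribe_segment(conn, [silence, silence]))
```

With a real connection, the server acknowledges the command with a flush_done event once all audio sent before finalize has been transcribed.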

To close the session cleanly:

- { "type": "CloseStream" }
+ close
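If your Deepgram integration builds control messages in one place, the translation is a small lookup. Here is a sketch with a hypothetical `translate_control` helper. It returns None for KeepAlive, which has no Cartesia equivalent.

```python
import json
from typing import Optional


def translate_control(deepgram_message: str) -> Optional[str]:
    """Map a Deepgram JSON control message to Cartesia's bare text command."""
    msg_type = json.loads(deepgram_message)["type"]
    if msg_type == "Finalize":
        return "finalize"
    if msg_type == "CloseStream":
        return "close"
    if msg_type == "KeepAlive":
        return None  # no equivalent: keep streaming audio instead
    raise ValueError(f"Unknown Deepgram control message: {msg_type!r}")
```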

Cartesia has no equivalent of Deepgram’s KeepAlive message. The connection has a 3-minute idle timeout that resets every time you send an audio chunk, so keep streaming audio (silence or otherwise) to hold it open.
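The sketch below shows one way to replace KeepAlive. It assumes pcm_s16le audio, where silence is all-zero bytes (other encodings such as mu-law use different byte values), and uses an arbitrary 30-second safety margin. `silent_frame` and `keepalive_due` are hypothetical helpers.

```python
IDLE_TIMEOUT_S = 180  # the 3-minute idle timeout described above


def silent_frame(sample_rate: int, frame_ms: int = 100) -> bytes:
    """Return frame_ms of pcm_s16le silence (every 16-bit sample is zero)."""
    samples = sample_rate * frame_ms // 1000
    return bytes(samples * 2)  # 2 bytes per pcm_s16le sample


def keepalive_due(last_audio_at: float, now: float, margin_s: float = 30.0) -> bool:
    """True once the socket has been idle long enough that silence should be sent.

    margin_s is an arbitrary safety buffer, not a Cartesia requirement.
    """
    return now - last_audio_at >= IDLE_TIMEOUT_S - margin_s
```

In a send loop, you might check keepalive_due(last_audio_at, time.monotonic()) and, when it returns True, send silent_frame(16000) with send_raw.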

# Equivalent to 
# deepgram_connection.send_media(audio_chunk)
await connection.send_raw(audio_chunk)

# Equivalent to
# deepgram_connection.send_finalize()
await connection.send("finalize")

# Equivalent to
# deepgram_connection.send_close_stream()
await connection.send("close")

# Equivalent to 
# deepgram_connection.send_media(audio_chunk)
connection.send_raw(audio_chunk)

# Equivalent to
# deepgram_connection.send_finalize()
connection.send("finalize")

# Equivalent to
# deepgram_connection.send_close_stream()
connection.send("close")

// Equivalent to deepgramConnection.sendMedia(audioChunk)
// @param {ArrayBufferLike} audioChunk - Note: Blob is not accepted
connection.sendRaw(audioChunk);

// Equivalent to
// deepgramConnection.sendFinalize({ type: "Finalize" })
connection.send("finalize");

// Equivalent to
// deepgramConnection.sendCloseStream({ type: "CloseStream" })
connection.send("close");

Event mapping

Deepgram emits four server message types. Cartesia emits transcript events, acknowledgments for the finalize and close commands, and errors.

Deepgram `type`	Cartesia `type`	Notes
`Results`	`transcript`	The main transcript event. See payload diff below.
`UtteranceEnd`	—	No equivalent. Take a look at the migration guides page for details.
`SpeechStarted`	—	No equivalent. Take a look at the migration guides page for details.
—	`flush_done`	Acknowledgment for `finalize`.
—	`done`	Acknowledgment for `close`. Sent immediately before the WebSocket closes.
`Metadata`	—	Summary of the session. Sent before the server closes the socket.
—	`error`	Client or server errors.
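If you consume the raw WebSocket without the SDK, dispatching on type works the same way as with Deepgram; only the type names change. This minimal sketch assumes each text frame is one JSON object with a type field, as in the examples in this guide. It also assumes that error events carry a message field, which is how the SDK examples read them. `make_dispatcher` is hypothetical.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


def make_dispatcher(handlers: Dict[str, Handler]) -> Callable[[str], None]:
    """Route raw Cartesia text frames to handlers keyed by message type."""
    def dispatch(raw: str) -> None:
        message = json.loads(raw)
        handler = handlers.get(message.get("type"))
        # Types without a handler are ignored. Cartesia never sends
        # UtteranceEnd, SpeechStarted, or Metadata, so handlers for those
        # would simply never fire.
        if handler is not None:
            handler(message)
    return dispatch


received = []
dispatch = make_dispatcher({
    "transcript": lambda m: received.append(m["text"]),    # was "Results"
    "flush_done": lambda m: received.append("<flushed>"),  # ack for "finalize"
    "done": lambda m: received.append("<done>"),           # ack for "close"
    "error": lambda m: received.append(f"error: {m.get('message')}"),
})
dispatch('{"type": "transcript", "is_final": true, "text": "Hello world!"}')
dispatch('{"type": "flush_done"}')
```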

A Deepgram Results message:

{
  "type": "Results",
  "channel_index": [0, 1],
  "duration": 1.7,
  "start": 0.0,
  "is_final": true,
  "speech_final": true,
  "channel": {
    "alternatives": [
      {
        "transcript": "Hello world! This is the full transcript.",
        "confidence": 0.98,
        "words": [
          {
            "word": "hello",
            "start": 0,
            "end": 0.2,
            "confidence": 0.9,
            "punctuated_word": "Hello"
          },
          {
            "word": "world",
            "start": 0.2,
            "end": 0.5,
            "confidence": 0.9,
            "punctuated_word": "world!"
          },
          {
            "word": "this",
            "start": 0.7,
            "end": 0.9,
            "confidence": 0.8,
            "punctuated_word": "This"
          },
          ...
        ]
      }
    ]
  },
  "metadata": { ... }
}

Becomes one or more Cartesia transcript events:

{
  "type": "transcript",
  "is_final": true,
  "text": "Hello world!",
  "duration": 0.5,
  "words": [
    {
      "word": "Hello",
      "start": 0,
      "end": 0.2
    },
    {
      "word": " world!",
      "start": 0.2,
      "end": 0.5
    }
  ],
  "request_id": "2ff8af53-4d38-479d-8287-58940f01c701"
}
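If downstream code expects Deepgram's Results shape, a thin adapter can reshape each Cartesia event. This is a sketch, not a faithful conversion. Cartesia provides no confidence scores, and Ink 2 may omit duration and words. Deriving Deepgram's lowercase, unpunctuated word field from Cartesia's word is also an approximation. Finally, the transcript field still holds a delta, so code that joins Deepgram transcripts with spaces still has to change. `to_deepgram_results` is a hypothetical helper.

```python
import string
from typing import Any, Dict


def to_deepgram_results(event: Dict[str, Any]) -> Dict[str, Any]:
    """Reshape a Cartesia transcript event into a Deepgram-like Results dict."""
    words = []
    for w in event.get("words") or []:  # Ink 2 may omit words
        punctuated = w["word"].strip()  # Cartesia words can carry leading spaces
        words.append({
            # Approximation of Deepgram's normalized "word" field
            "word": punctuated.strip(string.punctuation).lower(),
            "punctuated_word": punctuated,
            "start": w["start"],
            "end": w["end"],
        })
    return {
        "type": "Results",
        "is_final": event["is_final"],
        "duration": event.get("duration"),  # None if absent
        "channel": {
            # Still a delta: do not join these with spaces downstream
            "alternatives": [{"transcript": event["text"], "words": words}],
        },
    }
```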

Ink 2 does not return duration or words yet.

Ink 2 and Ink Whisper currently emit only final transcripts (is_final: true).

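Cartesia’s final transcripts are deltas: append each one to the transcript so far exactly as received, without adding or stripping whitespace.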
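With manual finalization, one natural pattern is to accumulate deltas and treat each flush_done as the end of a segment, since it confirms that all audio sent before finalize has been transcribed. The sketch below uses plain dicts for messages and a hypothetical `TranscriptAccumulator` class. Segments are stored verbatim, so the full transcript is "".join(segments), not a space-joined string.

```python
from typing import List


class TranscriptAccumulator:
    """Collect Cartesia final-transcript deltas into finalized segments."""

    def __init__(self) -> None:
        self._pending = ""
        self.segments: List[str] = []

    def on_message(self, message: dict) -> None:
        if message["type"] == "transcript" and message.get("is_final"):
            self._pending += message["text"]  # concatenate verbatim
        elif message["type"] == "flush_done":
            # All audio sent before "finalize" has now been transcribed
            self.segments.append(self._pending)
            self._pending = ""
```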

import asyncio
from cartesia.types.stt import STTManualFinalizeWebsocketResponse

final_transcript = ""

def on_message(message: STTManualFinalizeWebsocketResponse) -> None:
    global final_transcript
    if message.type == "transcript" and message.is_final:
        # Do not strip or add whitespace!
        final_transcript += message.text
    elif message.type == "flush_done":
        print("All audio up until 'finalize' was transcribed")
    elif message.type == "done":
        print("All audio up until 'close' was transcribed")
    elif message.type == "error":
        print(f"Error: {message.message}")

# Equivalent to
# deepgram_connection.on(EventType.MESSAGE, on_message)
connection.on("event", on_message)

# Equivalent to
# asyncio.create_task(deepgram_connection.start_listening())
recv_task = asyncio.create_task(connection.dispatch_events())

import threading
from cartesia.types.stt import STTManualFinalizeWebsocketResponse

final_transcript = ""

def on_message(message: STTManualFinalizeWebsocketResponse) -> None:
    global final_transcript
    if message.type == "transcript" and message.is_final:
        # Do not strip or add whitespace!
        final_transcript += message.text
    elif message.type == "flush_done":
        print("All audio up until 'finalize' was transcribed")
    elif message.type == "done":
        print("All audio up until 'close' was transcribed")
    elif message.type == "error":
        print(f"Error: {message.message}")

# Equivalent to
# deepgram_connection.on(EventType.MESSAGE, on_message)
connection.on("event", on_message)

# Equivalent to
# threading.Thread(target=deepgram_connection.start_listening, daemon=True).start()
threading.Thread(target=connection.dispatch_events, daemon=True).start()

import Cartesia from '@cartesia/cartesia-js';

let finalTranscript = '';

// Equivalent to
// deepgramConnection.on("message", (message) => { ... });
connection.on("event", (message: Cartesia.STT.ManualFinalize.STTManualFinalizeWebsocketResponse) => {
  switch (message.type) {
    case "transcript":
      if (message.is_final) {
        // Do not trim or add whitespace!
        finalTranscript += message.text;
        console.log(`Transcript so far: ${finalTranscript}`);
      }
      break;
    case "flush_done":
      console.log("All audio up until 'finalize' was transcribed");
      break;
    case "done":
      console.log("All audio up until 'close' was transcribed");
      break;
  }
});

// Equivalent to
// deepgramConnection.on("error", (error) => { ... });
connection.on("error", (error) => {
  if (error.error) {
    // Server sent error (may be a bad request or internal server error)
    console.error(`Server sent an error: ${error.error.message}`);
  } else {
    // Client error
    console.error(`Client had an error: ${error.message}`);
  }
});

Example Server Messages

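The table below compares the server messages each API might emit while transcribing “Ink may break words. Nova’s transcripts are joined with spaces. Ink’s are not.” Ink may split a word across two transcript events. Nova’s final transcripts are joined with spaces; Ink’s are not, because each delta already carries the spaces it needs.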

Deepgram Nova	Cartesia Realtime STT (Manual)
SpeechStarted	—
is_final: false `"Ink"`	is_final: true `"Ink "`
is_final: false `"Ink may break words."`	is_final: true `"may bre"`
is_final: true `"Ink may break words."`	is_final: true `"ak words."`
UtteranceEnd	—
SpeechStarted	—
is_final: false `"Nova's transcripts are joined with spaces."`	is_final: true `" Nova's transcripts are "`
is_final: true `"Nova's transcripts are joined with spaces."`	is_final: true `"joined with spaces."`
UtteranceEnd	—
SpeechStarted	—
is_final: false `"Ink's are not."`	is_final: true `" Ink"`
is_final: true `"Ink's are not."`	is_final: true `"'s are not."`
UtteranceEnd	—
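You can check the difference in joining rules directly against the table. This sketch copies the final results from both columns (with straight apostrophes) and joins each the way its API expects.

```python
# Final results from the Deepgram Nova column of the table
deepgram_finals = [
    "Ink may break words.",
    "Nova's transcripts are joined with spaces.",
    "Ink's are not.",
]

# Final transcript deltas from the Cartesia column
cartesia_finals = [
    "Ink ", "may bre", "ak words.",
    " Nova's transcripts are ", "joined with spaces.",
    " Ink", "'s are not.",
]

deepgram_text = " ".join(deepgram_finals)  # Nova: join with spaces
cartesia_text = "".join(cartesia_finals)   # Ink: concatenate verbatim
assert deepgram_text == cartesia_text

# Joining Ink deltas with spaces corrupts the text ("Ink  may bre ak words. ...")
broken = " ".join(cartesia_finals)
```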
