Migrating from Deepgram Nova with Automatic Finalization

This guide covers migrating from Deepgram Live Audio (Nova) when used without sending the Finalize command mid-session.


This guide contains both bare API descriptions and SDK code. To install the SDK:

pip install cartesia

npm i @cartesia/cartesia-js

If you’re already using the Cartesia SDK, upgrade to version 3.2.0 or later.

Ink 2 only supports English right now.
We expect to add more languages in the coming months.

Connection

Replace the Deepgram WebSocket URL and auth header with Cartesia’s /stt/turns/websocket.

- wss://api.deepgram.com/v1/listen?model=nova-3&encoding=linear16&sample_rate=16000
+ wss://api.cartesia.ai/stt/turns/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000

- Authorization: Token <DEEPGRAM_API_KEY>
+ Authorization: Bearer <CARTESIA_API_KEY>
+ Cartesia-Version: 2026-03-01
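For a server-side client that talks to the endpoint without the SDK, the URL and headers above could be assembled roughly as follows. This is a minimal sketch: `build_connection` and the two constants are hypothetical names introduced here, and only the endpoint, query params, and header names come from this guide.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Values taken from the diff above
CARTESIA_STT_ENDPOINT = "wss://api.cartesia.ai/stt/turns/websocket"
CARTESIA_VERSION = "2026-03-01"


def build_connection(api_key: str, sample_rate: int = 16000) -> Tuple[str, Dict[str, str]]:
    """Return the WebSocket URL and request headers (hypothetical helper)."""
    query = urlencode({
        "model": "ink-2",
        "encoding": "pcm_s16le",
        "sample_rate": sample_rate,
    })
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Cartesia-Version": CARTESIA_VERSION,
    }
    return f"{CARTESIA_STT_ENDPOINT}?{query}", headers


url, headers = build_connection("<CARTESIA_API_KEY>")
```

Pass `url` and `headers` to whichever WebSocket client you already use; the parameter name for extra headers varies by library.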

In browsers, WebSockets do not support request headers. Instead, pass the API version in the cartesia_version query param, and authenticate with a short-lived access token in the access_token query param rather than an API key.

Connect to the auto-finalization WebSocket with the Cartesia SDK:

import os
from cartesia import AsyncCartesia

client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

async with client.stt.auto_finalize.websocket(
    model="ink-2", encoding="pcm_s16le", sample_rate=16000
) as connection:
    ...

import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

with client.stt.auto_finalize.websocket(
    model="ink-2", encoding="pcm_s16le", sample_rate=16000
) as connection:
    ...

import Cartesia from "@cartesia/cartesia-js";

const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

const connection = client.stt.autoFinalize.websocket({
  model: "ink-2",
  encoding: "pcm_s16le",
  sample_rate: 16000,
});

// Server-side: generate access tokens using your API key
import Cartesia from '@cartesia/cartesia-js';

const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

export async function GET() {
  const { token } = await client.accessToken.create({
    grants: { stt: true, tts: false, agent: false },
    // How long the token lasts in seconds
    // Allowed values: 0–3600
    expires_in: 3600,
  });
  return Response.json({ token });
}


// Client-side
// 1. Fetch an access token from your server
// 2. Connect to Cartesia via WebSocket
import Cartesia from "@cartesia/cartesia-js";

async function getToken(): Promise<string> {
  const res = await fetch('/replace-with-your-server');
  const { token } = await res.json();
  return token;
}
const audioContext = new AudioContext();

const client = new Cartesia({ token: await getToken() });

const connection = client.stt.autoFinalize.websocket({
  model: "ink-2",
  encoding: "pcm_f32le",
  sample_rate: audioContext.sampleRate,
});

Query parameters

Deepgram Nova	Cartesia Realtime STT (Auto)	Notes
`model=nova-3` required	`model=ink-2` required	See Models for all options.
`version=latest`	—	Model version is controllable via the `model` param.
`encoding=linear16`	`encoding=pcm_s16le` required	See encoding for all options.
`sample_rate`	`sample_rate` required	No change.
`language`	—	`ink-2` only supports `en` right now. More languages are coming soon!
—	`cartesia_version=2026-03-01` required	See API Conventions for details.
`channels`, `multichannel`	—	Send a mono audio stream per WebSocket connection.
`endpointing`, `interim_results`, `utterance_end_ms`, `vad_events`	—	Not required.
`keyterm`, `keywords`	`keyterm`	See Keyterm prompting.
`mip_opt_out`	—	Controlled by your organization.

encoding

Deepgram	Cartesia
`linear16`	`pcm_s16le`
`linear32`	`pcm_s32le`
`mulaw`	`pcm_mulaw`
`alaw`	`pcm_alaw`
Not supported	`pcm_f16le`
Not supported	`pcm_f32le`
`flac`	Not supported
`amr-nb`	Not supported
`amr-wb`	Not supported
`opus`	Not supported
`ogg-opus`	Not supported
`speex`	Not supported
`g729`	Not supported
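If your Deepgram URLs are built in code, the two tables above can be applied mechanically. The sketch below is one way to do that under stated assumptions: `translate_query` is a hypothetical helper, and turning a `keywords` value into a `keyterm` by dropping its `:intensifier` suffix is an assumption, so check Keyterm prompting before relying on it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Deepgram encoding -> Cartesia encoding, from the encoding table
ENCODING_MAP = {
    "linear16": "pcm_s16le",
    "linear32": "pcm_s32le",
    "mulaw": "pcm_mulaw",
    "alaw": "pcm_alaw",
}

# Deepgram params with no Cartesia counterpart, from the query parameter table
DROPPED = {
    "version", "language", "channels", "multichannel", "endpointing",
    "interim_results", "utterance_end_ms", "vad_events", "mip_opt_out",
}


def translate_query(deepgram_url: str, cartesia_version: str = "2026-03-01") -> str:
    """Return a Cartesia query string for a Deepgram listen URL (hypothetical helper)."""
    out = [("model", "ink-2")]
    for key, value in parse_qsl(urlsplit(deepgram_url).query):
        if key == "model" or key in DROPPED:
            continue
        if key == "encoding":
            if value not in ENCODING_MAP:
                raise ValueError(f"encoding {value!r} is not supported by Cartesia")
            out.append(("encoding", ENCODING_MAP[value]))
        elif key in ("sample_rate", "keyterm"):
            out.append((key, value))
        elif key == "keywords":
            # Assumption: keep the word, drop Deepgram's ":intensifier" suffix
            out.append(("keyterm", value.split(":")[0]))
        else:
            raise ValueError(f"no mapping known for Deepgram parameter {key!r}")
    out.append(("cartesia_version", cartesia_version))
    return urlencode(out)
```

Raising on unknown parameters and unsupported encodings is a deliberate choice here: a silent drop would hide audio that needs to be transcoded before it is sent.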

Sending audio

Both APIs accept raw PCM audio as binary WebSocket frames in the same way.

Cartesia does not support these encodings: flac, amr-nb, amr-wb, opus, ogg-opus, speex, g729.

Deepgram’s Finalize command has no equivalent. Ink detects turn boundaries on its own and emits a turn.end when the user stops speaking, so there is nothing to flush.

- { "type": "Finalize" }

If you currently send the Finalize command mid-session with Deepgram Nova, consider using Cartesia Ink with manual finalization instead. See the migration guides page for details.

To close the session cleanly, send a JSON text frame:

- { "type": "CloseStream" }
+ { "type": "close" }

Cartesia has no equivalent of Deepgram’s KeepAlive message. The connection has a 3-minute idle timeout that resets every time you send an audio chunk — keep streaming audio (silent or otherwise) to hold it open.

# Equivalent to 
# deepgram_connection.send_media(audio_chunk)
await connection.send_raw(audio_chunk)

# Equivalent to
# deepgram_connection.send_close_stream()
await connection.send({"type": "close"})

# Equivalent to 
# deepgram_connection.send_media(audio_chunk)
connection.send_raw(audio_chunk)

# Equivalent to
# deepgram_connection.send_close_stream()
connection.send({"type": "close"})

// Equivalent to deepgramConnection.sendMedia(audioChunk)
// @param {ArrayBufferLike} audioChunk - Note: Blob is not accepted
connection.sendRaw(audioChunk);

// Equivalent to
// deepgramConnection.sendCloseStream({ type: "CloseStream" })
connection.send({ type: "close" });
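Since there is no KeepAlive message, a session that can go quiet needs something to stream in the gaps. The sketch below shows one possible approach for `pcm_s16le` audio: forward chunks from a queue and substitute a short silent chunk whenever nothing arrives in time. `silence_chunk`, `forward_audio`, the queue, the `None` end-of-stream sentinel, and the 60-second threshold are all assumptions made for illustration; only `send_raw` and the close message come from this guide.

```python
import asyncio


def silence_chunk(sample_rate: int = 16000, duration_ms: int = 100) -> bytes:
    """Return duration_ms of silent mono pcm_s16le audio (two zero bytes per sample)."""
    return b"\x00\x00" * (sample_rate * duration_ms // 1000)


async def forward_audio(connection, chunks: asyncio.Queue, idle_after_s: float = 60.0) -> None:
    """Send queued audio; send silence when the queue stays empty for idle_after_s."""
    while True:
        try:
            chunk = await asyncio.wait_for(chunks.get(), timeout=idle_after_s)
        except asyncio.TimeoutError:
            # Nothing to send: stream silence so the idle timer resets
            chunk = silence_chunk()
        if chunk is None:
            # Hypothetical sentinel meaning "no more audio"
            break
        await connection.send_raw(chunk)
    await connection.send({"type": "close"})
```

Any threshold comfortably below the 3-minute idle timeout should work; match `sample_rate` to the value the connection was opened with.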

Event mapping

Deepgram emits four server message types, mixing transcript results with separate voice-activity signals. Cartesia folds the same information into a turn lifecycle: turn.start, turn.update, turn.eager_end, turn.resume, and turn.end. See Turn Detection for the full state machine.

Deepgram `type`	Cartesia `type`	Notes
`SpeechStarted`	`turn.start`	The user began speaking. Carries no transcript.
`Results` (`is_final: false`)	`turn.update`	Interim transcript for the utterance / turn.
`Results` (`is_final: true`)	`turn.end`	Final transcript for the utterance / turn.
`UtteranceEnd`	`turn.end`	The user stopped speaking.
—	`turn.eager_end`	The model predicts the user might be done speaking. Okay to ignore.
—	`turn.resume`	The user kept talking; ignore the last `turn.eager_end`.
`Metadata`	—	No equivalent.
—	`connected`	Fires once when the WebSocket is established. You do not need to wait for it before sending audio.
—	`error`	Client or server errors.
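The two events with no Deepgram counterpart, turn.eager_end and turn.resume, are the part of the lifecycle most likely to be new. One way to use them is to keep a draft transcript that turn.resume invalidates, as in the sketch below. `TurnTracker` is a hypothetical class, events are modelled as plain dicts, and the assumption that turn.eager_end exposes its text under a `transcript` field (as turn.end does) is not confirmed by this guide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnTracker:
    """Tracks the turn lifecycle from the table above (illustrative only)."""
    speaking: bool = False
    draft: Optional[str] = None  # text from the latest turn.eager_end, if still valid
    completed: List[str] = field(default_factory=list)

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind == "turn.start":
            self.speaking = True
            self.draft = None
        elif kind == "turn.eager_end":
            # The user might be done: a good moment to start speculative work
            self.draft = event["transcript"]
        elif kind == "turn.resume":
            # The user kept talking: discard whatever the draft triggered
            self.draft = None
        elif kind == "turn.end":
            self.speaking = False
            self.draft = None
            self.completed.append(event["transcript"])
```

If you do not act early on turn.eager_end, both extra branches can be dropped and the tracker reduces to turn.start and turn.end.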

The Deepgram `Results` message

Deepgram’s Results carry a lot of information. You can extract similar information from Cartesia’s turn.update and turn.end events.

Deepgram `Results`	Cartesia	Notes
`is_final: false`	`turn.update`	Interim transcript for the utterance / turn. Cumulative since the last final transcript.
`is_final: true`	`turn.end`	Final transcript for the utterance / turn.
`speech_final: true`	`turn.end`	The user stopped speaking.

Deepgram sends UtteranceEnd and speech_final: true separately, but they have the same semantic meaning: “the user has finished speaking”. Cartesia simplifies this into a single high-accuracy signal: turn.end.

A Deepgram Results message:

{
  "type": "Results",
  "is_final": true,
  "speech_final": true,
  "channel": {
    "alternatives": [
      {
        "transcript": "Hello world!",
        "confidence": 0.99,
        "words": [ ... ]
      }
    ]
  },
  "metadata": { ... }
}

Becomes a Cartesia turn.update / turn.end event:

{
  "type": "turn.end",
  "transcript": "Hello world!",
  "request_id": "33cacee6-1936-4949-a05b-ecc9f2393248"
}

turn.start and turn.resume events do not carry a transcript.
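If downstream code still expects Deepgram-shaped messages, a thin adapter can bridge the gap while you migrate handlers one at a time. The sketch below is a hypothetical shim, not part of either SDK: it builds a reduced Results dict from the fields shown in the Cartesia event above, and it omits `confidence`, `words`, and `metadata` because that event does not show them.

```python
from typing import Optional


def to_deepgram_shape(event: dict) -> Optional[dict]:
    """Map a Cartesia turn event (as a dict) to a reduced Deepgram-style message."""
    kind = event.get("type")
    if kind == "turn.start":
        return {"type": "SpeechStarted"}
    if kind in ("turn.update", "turn.end"):
        final = kind == "turn.end"
        return {
            "type": "Results",
            "is_final": final,
            # turn.end also covers "the user stopped speaking"
            "speech_final": final,
            "channel": {"alternatives": [{"transcript": event["transcript"]}]},
        }
    # turn.eager_end, turn.resume, connected, and error have no Deepgram counterpart here
    return None
```

The transcript is passed through untouched, so a handler that joins finals with spaces will need adjusting for Ink's leading spaces.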

import asyncio
from cartesia.types.stt import STTAutoFinalizeWebsocketResponse

final_transcript = ""

def on_message(message: STTAutoFinalizeWebsocketResponse) -> None:
    global final_transcript
    if message.type == "turn.start":
        print("User started speaking")
    elif message.type == "turn.update":
        print("Transcript so far: " + final_transcript + message.transcript)
    elif message.type == "turn.end":
        print("User stopped speaking")
        # Do not strip or add spaces!
        final_transcript += message.transcript
    elif message.type == "error":
        print(f"Error: {message.message}")

# Equivalent to
# deepgram_connection.on(EventType.MESSAGE, on_message)
connection.on("event", on_message)

# Equivalent to
# asyncio.create_task(deepgram_connection.start_listening())
recv_task = asyncio.create_task(connection.dispatch_events())

import threading
from cartesia.types.stt import STTAutoFinalizeWebsocketResponse

final_transcript = ""

def on_message(message: STTAutoFinalizeWebsocketResponse) -> None:
    global final_transcript
    if message.type == "turn.start":
        print("User started speaking")
    elif message.type == "turn.update":
        print("Transcript so far: " + final_transcript + message.transcript)
    elif message.type == "turn.end":
        print("User stopped speaking")
        # Do not strip or add spaces!
        final_transcript += message.transcript
    elif message.type == "error":
        print(f"Error: {message.message}")

# Equivalent to
# deepgram_connection.on(EventType.MESSAGE, on_message)
connection.on("event", on_message)

# Equivalent to
# threading.Thread(target=deepgram_connection.start_listening, daemon=True).start()
threading.Thread(target=connection.dispatch_events, daemon=True).start()

import Cartesia from '@cartesia/cartesia-js';

let finalTranscript = '';

// Equivalent to
// deepgramConnection.on("message", (message) => { ... });
connection.on("event", (message: Cartesia.STT.AutoFinalize.STTAutoFinalizeWebsocketResponse) => {
  switch (message.type) {
    case "turn.start":
      console.log("User started speaking");
      break;
    case "turn.update":
      console.log("Transcript so far: " + finalTranscript + message.transcript);
      break;
    case "turn.end":
      console.log("User stopped speaking");
      // Do not trim or add spaces!
      finalTranscript += message.transcript;
      break;
  }
});

// Equivalent to
// deepgramConnection.on("error", (error) => { ... });
connection.on("error", (error) => {
  if (error.error) {
    // Server sent error (may be a bad request or internal server error)
    console.error(`Server sent an error: ${error.error.message}`);
  } else {
    // Client error
    console.error(`Client had an error: ${error.message}`);
  }
});

Example Server Messages

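The table below traces one spoken utterance through both APIs: “Hello! Nova’s transcripts are joined with spaces. Ink’s are not.” Nova’s transcripts are joined with spaces. Ink’s are not: each later Ink turn already carries its own leading space.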

Deepgram Nova	Cartesia Realtime STT (Auto)
SpeechStarted	turn.start
is_final: false `"Hello!"`	turn.update `"Hello!"`
—	turn.eager_end `"Hello!"`
—	turn.resume
is_final: false `"Hello! Nova's transcripts are joined with spaces."`	turn.update `"Hello! Nova's transcripts are joined with spaces."`
—	turn.eager_end `"Hello! Nova's transcripts are joined with spaces."`
is_final: true `"Hello! Nova's transcripts are joined with spaces."`	turn.end `"Hello! Nova's transcripts are joined with spaces."`
UtteranceEnd	—
SpeechStarted	turn.start
is_final: false `"Ink's are not."`	turn.update `" Ink's are not."`
—	turn.eager_end `" Ink's are not."`
is_final: true `"Ink's are not."`	turn.end `" Ink's are not."`
UtteranceEnd	—
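The spacing difference in the table is easy to check by replaying its final transcripts. In the short sketch below, the two lists are copied from the `is_final: true` and `turn.end` rows, and the variable names are illustrative.

```python
# Final transcripts from the table above, in order
nova_finals = [
    "Hello! Nova's transcripts are joined with spaces.",
    "Ink's are not.",
]
ink_turn_ends = [
    "Hello! Nova's transcripts are joined with spaces.",
    " Ink's are not.",  # note the leading space
]

# Nova: join with a space. Ink: concatenate as-is, without stripping.
nova_text = " ".join(nova_finals)
ink_text = "".join(ink_turn_ends)

print(ink_text)
# Hello! Nova's transcripts are joined with spaces. Ink's are not.
```

Both strategies yield the same full transcript, which is why stripping Ink's transcripts or joining them with spaces would lose or duplicate whitespace.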


References

API Reference

Full Code Example
