Migrating From Deepgram Flux to Cartesia Ink 2

Cartesia’s Realtime Speech-to-Text API is similar to Deepgram’s Turn-based Audio (Flux) API. Both APIs emit turn-based events over a WebSocket, so porting an existing Flux integration is mostly a matter of renaming fields and updating a few connection parameters. If you would rather signal the end of each user turn yourself, see Realtime Speech-to-Text (External VAD) and the Deepgram Nova migration guide instead. This guide covers direct WebSocket usage only; SDK-specific examples are coming soon.

Connection

Replace the Deepgram WebSocket URL and auth header with Cartesia’s.

- wss://api.deepgram.com/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000
+ wss://api.cartesia.ai/stt/turns/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000

- Authorization: Token <DEEPGRAM_API_KEY>
+ X-API-Key: <CARTESIA_API_KEY>

Browser WebSocket clients cannot set request headers. In a browser, pass the API version as the cartesia_version query param, and authenticate with a short-lived access token in the access_token query param instead of an API key.
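As a rough illustration of the browser case, the sketch below assembles the connection URL from the query params named in this guide. The helper name `buildBrowserUrl` and its option names are hypothetical, and how you obtain the short-lived access token is outside the scope of this guide.

```javascript
// Hypothetical helper: builds an Ink 2 WebSocket URL that carries the API
// version and auth as query params, since browsers cannot set headers.
function buildBrowserUrl({ accessToken, sampleRate = 16000, encoding = "pcm_s16le" }) {
  if (!accessToken) {
    throw new Error("accessToken is required for browser connections");
  }
  const url = new URL("wss://api.cartesia.ai/stt/turns/websocket");
  url.searchParams.set("model", "ink-2");
  url.searchParams.set("encoding", encoding);
  url.searchParams.set("sample_rate", String(sampleRate));
  url.searchParams.set("cartesia_version", "2026-03-01");
  url.searchParams.set("access_token", accessToken);
  return url.toString();
}

// Usage in a browser (the token is assumed to come from your own backend):
// const ws = new WebSocket(buildBrowserUrl({ accessToken: token }));
```

URLSearchParams takes care of percent-encoding, so tokens containing reserved characters are passed through safely.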

Query parameters

Deepgram Flux	Cartesia Ink 2	Notes
`model=flux-general-en` (Required)	`model=ink-2` (Required)	See STT Models for all options.
`encoding=linear16` (Required)	`encoding=pcm_s16le` (Required)	`linear16` → `pcm_s16le`, `linear32` → `pcm_s32le`, `mulaw` → `pcm_mulaw`, `alaw` → `pcm_alaw`.
`sample_rate` (Required)	`sample_rate` (Required)	No change.
`language_hint`	—	Only English is supported right now. Multi-lingual support is coming soon!
—	`cartesia_version=2026-03-01`	See API Conventions for details.
`eager_eot_threshold`	—	Turn detection is controlled by the model. Configuration is coming soon!
`eot_threshold`	—	Turn detection is controlled by the model. Configuration is coming soon!
`eot_timeout_ms`	—	Turn detection is controlled by the model. Configuration is coming soon!
`keyterm`	—	Coming soon!
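The table above can be applied mechanically to an existing Flux URL. The following is a minimal sketch, assuming your Flux URL carries its options as query params; `translateFluxUrl` is a hypothetical helper, not part of either API. It maps the encoding, always selects `ink-2`, and reports which Flux-only params were dropped so you can review them.

```javascript
// Encoding names from the query parameter table.
const ENCODING_MAP = new Map([
  ["linear16", "pcm_s16le"],
  ["linear32", "pcm_s32le"],
  ["mulaw", "pcm_mulaw"],
  ["alaw", "pcm_alaw"],
]);

// Flux params with no Ink 2 equivalent at the time of writing.
const FLUX_ONLY_PARAMS = [
  "language_hint",
  "eager_eot_threshold",
  "eot_threshold",
  "eot_timeout_ms",
  "keyterm",
];

// Hypothetical helper: converts a Flux WebSocket URL into an Ink 2 URL.
function translateFluxUrl(fluxUrl) {
  const source = new URL(fluxUrl).searchParams;
  const encoding = ENCODING_MAP.get(source.get("encoding"));
  if (!encoding) {
    throw new Error(`No Ink 2 mapping for encoding: ${source.get("encoding")}`);
  }
  const sampleRate = source.get("sample_rate");
  if (!sampleRate) {
    throw new Error("sample_rate is required");
  }
  const target = new URL("wss://api.cartesia.ai/stt/turns/websocket");
  target.searchParams.set("model", "ink-2");
  target.searchParams.set("encoding", encoding);
  target.searchParams.set("sample_rate", sampleRate);
  target.searchParams.set("cartesia_version", "2026-03-01");
  const dropped = FLUX_ONLY_PARAMS.filter((name) => source.has(name));
  return { url: target.toString(), dropped };
}
```

Returning the dropped params rather than silently discarding them makes it easier to spot behavior you were tuning in Flux, such as end-of-turn thresholds, that the model now controls.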

Sending audio

Both APIs accept raw audio as binary WebSocket frames. No change to your audio pipeline is needed; just make sure the bytes match the encoding and sample_rate you declared. To close the session, send a JSON-encoded WebSocket text frame:

- { "type": "CloseStream" }
+ { "type": "close" }

Cartesia has no equivalent of Flux’s Configure control message, because end-of-turn detection is controlled by the model and is not configurable.
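To make the framing concrete, here is a small sketch of the send side. It assumes `ws` is any object with a WebSocket-style `send` method and that the audio is already a `Uint8Array` of raw PCM; the function names and the 3200-byte chunk size (100 ms of 16 kHz, 16-bit mono audio) are illustrative choices, not requirements stated by the API.

```javascript
// Sends raw PCM as a series of binary frames.
// chunkBytes is an assumed value: 16000 samples/s * 2 bytes * 0.1 s = 3200.
function streamPcm(ws, pcm, chunkBytes = 3200) {
  for (let offset = 0; offset < pcm.byteLength; offset += chunkBytes) {
    // subarray clamps the end index, so the final chunk may be shorter.
    ws.send(pcm.subarray(offset, offset + chunkBytes));
  }
}

// Ends the session with the JSON text frame Ink 2 expects
// (Flux used { "type": "CloseStream" }).
function closeSession(ws) {
  ws.send(JSON.stringify({ type: "close" }));
}
```

Because the binary frames are unchanged between the two APIs, only `closeSession` differs from a typical Flux integration.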

Event mapping

Flux wraps all turn events in a single TurnInfo message with an event discriminator. Cartesia emits one message type per event, identified by the top-level type field.

Deepgram Flux (`TurnInfo.event`)	Cartesia (`type`)	Carries `transcript`?
`StartOfTurn`	`turn.start`	No (Flux: yes)
`Update`	`turn.update`	Yes
`EagerEndOfTurn`	`turn.eager_end`	Yes
`TurnResumed`	`turn.resume`	No (Flux: yes)
`EndOfTurn`	`turn.end`	Yes
`Connected`	`connected`	—
`Error`	`error`	—
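If you would rather leave downstream handlers untouched during the migration, one option is a thin adapter that reshapes Ink 2 turn events into Flux-style TurnInfo objects using the mapping above. This is only a sketch: `toFluxTurnInfo` is a hypothetical name, and the adapter cannot restore the fields Ink 2 does not send, including the transcript on `turn.start` and `turn.resume`.

```javascript
// Event names from the mapping table (Cartesia type -> Flux TurnInfo.event).
const FLUX_EVENT_NAMES = new Map([
  ["turn.start", "StartOfTurn"],
  ["turn.update", "Update"],
  ["turn.eager_end", "EagerEndOfTurn"],
  ["turn.resume", "TurnResumed"],
  ["turn.end", "EndOfTurn"],
]);

// Hypothetical adapter: returns a Flux-shaped object for turn events,
// or null for anything else (connected, error, unknown types).
function toFluxTurnInfo(message) {
  const event = FLUX_EVENT_NAMES.get(message.type);
  if (!event) return null;
  const adapted = { type: "TurnInfo", event };
  // turn.start and turn.resume carry no transcript in Ink 2, so the field
  // is copied only when it is actually present.
  if (typeof message.transcript === "string") {
    adapted.transcript = message.transcript;
  }
  return adapted;
}
```

Handlers that read `data.transcript` on StartOfTurn or TurnResumed will see `undefined` through this adapter, which is a useful signal for finding code that depended on the Flux behavior.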

A Flux TurnInfo message:

{
  "type": "TurnInfo",
  "event": "EndOfTurn",
  "turn_index": 0,
  "transcript": "Hi I need to cancel my subscription please.",
  "words": [...],
  "end_of_turn_confidence": 0.7,
  "audio_window_start": 0.0,
  "audio_window_end": 1.7
}

Becomes an Ink 2 turn.end event:

{
  "type": "turn.end",
  "transcript": "Hi I need to cancel my subscription please.",
  "request_id": "2ff8af53-4d38-479d-8287-58940f01c701"
}

As with Deepgram Flux, the transcript is cumulative within a turn. Ink 2 has the added benefit that all emitted transcripts are final: words are not emitted until the model is confident, so later events only append to the transcript and never modify text sent by earlier events.
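Because transcripts are cumulative and append-only, the newly added text in each event can be recovered by slicing off what has already been seen. The sketch below illustrates this; `createTurnTracker` is a hypothetical helper, and the fallback branch for a non-matching prefix is a defensive assumption rather than behavior the API is documented to exhibit.

```javascript
// Returns a function that, given each Ink 2 message in order, yields only
// the text appended since the previous event of the same turn.
function createTurnTracker() {
  let seen = "";
  return function nextDelta(message) {
    if (message.type === "turn.start") {
      seen = "";
      return "";
    }
    if (typeof message.transcript !== "string") {
      return ""; // e.g. turn.resume, connected
    }
    const full = message.transcript;
    // Append-only transcripts should always extend the previous text;
    // if they somehow do not, fall back to returning the whole transcript.
    const delta = full.startsWith(seen) ? full.slice(seen.length) : full;
    // A finished turn resets the tracker for the next one.
    seen = message.type === "turn.end" ? "" : full;
    return delta;
  };
}
```

This is handy when streaming partial text to a UI or to a language model, since each piece can be appended without re-rendering the whole turn.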

Fields that don’t have an equivalent

Cartesia does not emit:

turn_index
audio_window_start
audio_window_end
words
end_of_turn_confidence
sequence_id
languages
languages_hinted

Timestamps, end-of-turn confidence, and multilingual support are coming soon!
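If existing code relies on `turn_index`, one possible workaround is to count turns on the client. The sketch below assumes every turn begins with a `turn.start` event and numbers turns from 0, as in the Flux example earlier; `createTurnIndexer` is a hypothetical helper, and the other missing fields cannot be reconstructed this way.

```javascript
// Returns a function that adds a locally computed turn_index to turn events.
// Assumption: each turn is opened by exactly one turn.start event.
function createTurnIndexer() {
  let index = -1;
  return function annotate(message) {
    const isTurnEvent =
      typeof message.type === "string" && message.type.startsWith("turn.");
    if (!isTurnEvent) return message; // connected, error, etc. pass through
    if (message.type === "turn.start") index += 1;
    // Copy rather than mutate so the original message stays untouched.
    return { ...message, turn_index: index };
  };
}
```

The counter is local to one connection, so create a new indexer each time you open a WebSocket.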

Event handler

The branching structure of your handler is unchanged; only the message shape differs.

  ws.onmessage = (message) => {
    const data = JSON.parse(message.data);
-   if (data.type !== "TurnInfo") return;
-   switch (data.event) {
-     case "StartOfTurn":    onTurnStart(); break;
-     case "Update":         onTranscriptUpdate(data.transcript); break;
-     case "EagerEndOfTurn": prepareReply(data.transcript); break;
-     case "TurnResumed":    cancelReply(); break;
-     case "EndOfTurn":      finalizeReply(data.transcript); break;
-   }
+   switch (data.type) {
+     case "turn.start":     onTurnStart(); break;
+     case "turn.update":    onTranscriptUpdate(data.transcript); break;
+     case "turn.eager_end": prepareReply(data.transcript); break;
+     case "turn.resume":    cancelReply(); break;
+     case "turn.end":       finalizeReply(data.transcript); break;
+   }
  };
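The handler above covers only turn events. A slightly fuller sketch might also route `connected` and `error` messages and ignore frames that are not valid JSON. Everything here is illustrative: `routeMessage` and the handler-table shape are hypothetical, and because this guide does not describe the fields of an `error` message, the whole message is passed to the handler unchanged.

```javascript
// Parses a text frame and invokes the matching handler, if any.
// Returns true when a handler ran, false otherwise.
function routeMessage(raw, handlers) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return false; // not JSON; ignore
  }
  if (data === null || typeof data !== "object") return false;
  // Own-property check avoids matching inherited names like "toString".
  if (!Object.prototype.hasOwnProperty.call(handlers, data.type)) return false;
  handlers[data.type](data);
  return true;
}

// Usage with the callbacks from the example above (assumed to exist):
// ws.onmessage = (message) => routeMessage(message.data, {
//   "connected":      () => startStreaming(),
//   "turn.start":     () => onTurnStart(),
//   "turn.update":    (m) => onTranscriptUpdate(m.transcript),
//   "turn.eager_end": (m) => prepareReply(m.transcript),
//   "turn.resume":    () => cancelReply(),
//   "turn.end":       (m) => finalizeReply(m.transcript),
//   "error":          (m) => console.error("STT error", m),
// });
```

A lookup table keeps the mapping from event type to callback in one place, which makes it easy to compare against the event mapping table.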
