Cartesia’s Realtime Speech-to-Text API is similar to Deepgram’s Turn-based Audio (Flux) API. Both APIs emit turn-based events over a WebSocket, so porting an existing Flux integration is mostly a matter of renaming fields and updating a few connection parameters. If you would rather signal the end of each user turn yourself, see Realtime Speech-to-Text (External VAD) and the Deepgram Nova migration guide instead. This guide covers direct WebSocket usage; SDK-specific examples are coming soon.

Connection

Replace the Deepgram WebSocket URL and auth header with Cartesia’s.
- wss://api.deepgram.com/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000
+ wss://api.cartesia.ai/stt/turns/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000
- Authorization: Token <DEEPGRAM_API_KEY>
+ X-API-Key: <CARTESIA_API_KEY>
Browser WebSockets cannot set custom request headers. In the browser, pass the API version in the cartesia_version query param, and authenticate with a short-lived access token in the access_token query param rather than with an API key.
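
You can build the connection details in one place so the same code works on a server (API key in a header) and in a browser (access token in the query string). This is a minimal sketch that uses the URL and parameter names shown in this guide. `buildCartesiaConnection` is a hypothetical helper, and obtaining the access token is out of scope. It always sends cartesia_version as a query param, which matches the query parameter table.

```javascript
// Hypothetical helper (not part of any Cartesia SDK) that builds the Ink 2
// WebSocket URL plus any headers. URL and parameter names come from this guide.
const CARTESIA_STT_URL = "wss://api.cartesia.ai/stt/turns/websocket";

function buildCartesiaConnection({ sampleRate, apiKey, accessToken, version = "2026-03-01" }) {
  const url = new URL(CARTESIA_STT_URL);
  url.searchParams.set("model", "ink-2");
  url.searchParams.set("encoding", "pcm_s16le");
  url.searchParams.set("sample_rate", String(sampleRate));
  url.searchParams.set("cartesia_version", version);

  if (accessToken) {
    // Browser: WebSockets cannot set request headers, so the short-lived
    // access token travels in the query string instead of an API key.
    url.searchParams.set("access_token", accessToken);
    return { url: url.toString(), headers: {} };
  }
  if (!apiKey) {
    throw new Error("Pass apiKey (server) or accessToken (browser)");
  }
  // Server: send the long-lived API key as a header, never in the URL.
  return { url: url.toString(), headers: { "X-API-Key": apiKey } };
}
```

On a server using the `ws` npm package, pass the headers through the constructor options, as in `new WebSocket(url, { headers })`. In a browser, call `new WebSocket(url)` with the token-bearing URL.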

Query parameters

| Deepgram Flux | Cartesia Ink 2 | Notes |
|---|---|---|
| model=flux-general-en (Required) | model=ink-2 (Required) | See STT Models for all options. |
| encoding=linear16 (Required) | encoding=pcm_s16le (Required) | linear16 → pcm_s16le, linear32 → pcm_s32le, mulaw → pcm_mulaw, alaw → pcm_alaw. |
| sample_rate (Required) | sample_rate (Required) | No change. |
| language_hint | (none) | Only English is supported right now. Multilingual support is coming soon! |
| (none) | cartesia_version=2026-03-01 | See API Conventions for details. |
| eager_eot_threshold | (none) | Turn detection is controlled by the model. Configuration is coming soon! |
| eot_threshold | (none) | Turn detection is controlled by the model. Configuration is coming soon! |
| eot_timeout_ms | (none) | Turn detection is controlled by the model. Configuration is coming soon! |
| keyterm | (none) | Coming soon! |
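
If your codebase builds Flux URLs in several places, you can apply the table mechanically. This is a minimal sketch that assumes the Flux query string is available as a string. `translateFluxQuery` is a hypothetical helper, and it maps every Flux model to ink-2 (check STT Models if another model fits better). It returns params with no Ink 2 equivalent in `dropped`, so you can confirm that nothing important is lost silently.

```javascript
// Hypothetical migration helper: rewrite Flux query params as Ink 2 params.
// Mappings mirror the query parameter table; nothing here calls an API.
const FLUX_TO_CARTESIA_ENCODING = new Map([
  ["linear16", "pcm_s16le"],
  ["linear32", "pcm_s32le"],
  ["mulaw", "pcm_mulaw"],
  ["alaw", "pcm_alaw"],
]);

// Flux params with no Ink 2 equivalent yet.
const FLUX_ONLY_PARAMS = new Set([
  "language_hint",
  "eager_eot_threshold",
  "eot_threshold",
  "eot_timeout_ms",
  "keyterm",
]);

function translateFluxQuery(fluxQuery, version = "2026-03-01") {
  const out = new URLSearchParams();
  const dropped = []; // known Flux-only params
  const unknown = []; // params this sketch does not recognize

  for (const [key, value] of new URLSearchParams(fluxQuery)) {
    if (key === "model") {
      // Assumption: every Flux model maps to ink-2; see STT Models for others.
      out.set("model", "ink-2");
    } else if (key === "encoding") {
      const encoding = FLUX_TO_CARTESIA_ENCODING.get(value);
      if (!encoding) throw new Error(`No Ink 2 encoding for "${value}"`);
      out.set("encoding", encoding);
    } else if (key === "sample_rate") {
      out.set("sample_rate", value); // same name, same meaning
    } else if (FLUX_ONLY_PARAMS.has(key)) {
      dropped.push(key);
    } else {
      unknown.push(key);
    }
  }
  out.set("cartesia_version", version);
  return { query: out.toString(), dropped, unknown };
}
```

For example, `translateFluxQuery("model=flux-general-en&encoding=linear16&sample_rate=16000&eot_threshold=0.7")` yields the query `model=ink-2&encoding=pcm_s16le&sample_rate=16000&cartesia_version=2026-03-01` with `eot_threshold` in `dropped`.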

Sending audio

Both APIs accept raw audio as binary WebSocket frames, so your audio pipeline does not change. Just make sure the bytes match the encoding and sample_rate you declared. To close the session, send a JSON-encoded WebSocket text frame:
- { "type": "CloseStream" }
+ { "type": "close" }
Cartesia has no equivalent of Flux’s Configure control message, because the model controls end-of-turn detection and there is nothing to configure.
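
This minimal sketch covers the send side. It assumes mono pcm_s16le audio already held in a Uint8Array and a socket object with a browser-style `send()` method. `streamPcmS16le` and the 20 ms chunk size are illustrative choices, not requirements stated by Cartesia.

```javascript
// Hypothetical helper: send mono pcm_s16le audio as binary frames, then
// end the session with Cartesia's JSON close message.
function streamPcmS16le(ws, pcmBytes, sampleRate, chunkMs = 20) {
  const bytesPerSample = 2; // pcm_s16le = 16-bit little-endian samples
  const samplesPerChunk = Math.floor((sampleRate * chunkMs) / 1000);
  const chunkBytes = samplesPerChunk * bytesPerSample;
  if (chunkBytes === 0) throw new RangeError("chunkMs too small for this sample rate");

  for (let offset = 0; offset < pcmBytes.byteLength; offset += chunkBytes) {
    // Binary frame: raw audio bytes, no JSON wrapper (same as Flux).
    ws.send(pcmBytes.subarray(offset, offset + chunkBytes));
  }
  // Text frame: replaces Flux's { "type": "CloseStream" }.
  ws.send(JSON.stringify({ type: "close" }));
}
```

This loop sends a prerecorded buffer all at once. With a live microphone, you would instead call `ws.send()` for each captured buffer and send the close message once capture stops.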

Event mapping

Flux wraps all turn events in a single TurnInfo message with an event discriminator. Cartesia emits one message type per event and names it in the top-level type field.
| Deepgram Flux (TurnInfo.event) | Cartesia (type) | Carries transcript? |
|---|---|---|
| StartOfTurn | turn.start | No (Flux: yes) |
| Update | turn.update | Yes |
| EagerEndOfTurn | turn.eager_end | Yes |
| TurnResumed | turn.resume | No (Flux: yes) |
| EndOfTurn | turn.end | Yes |
| Connected | connected | |
| Error | error | |
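
If you prefer to migrate incrementally, a thin adapter can reshape Ink 2 events into Flux-style TurnInfo objects so an existing handler keeps working. This sketch follows the mapping table. `toFluxTurnInfo` is hypothetical. Because turn.start and turn.resume carry no transcript, the adapter fills in an empty string. Flux-only fields such as turn_index are omitted.

```javascript
// Hypothetical adapter: reshape an Ink 2 event into a Flux-style TurnInfo
// message, using the event mapping table. Returns null for non-turn events.
const CARTESIA_TO_FLUX_EVENT = new Map([
  ["turn.start", "StartOfTurn"],
  ["turn.update", "Update"],
  ["turn.eager_end", "EagerEndOfTurn"],
  ["turn.resume", "TurnResumed"],
  ["turn.end", "EndOfTurn"],
]);

function toFluxTurnInfo(event) {
  const fluxEvent = CARTESIA_TO_FLUX_EVENT.get(event.type);
  if (!fluxEvent) return null; // connected, error, or anything unknown
  return {
    type: "TurnInfo",
    event: fluxEvent,
    // turn.start and turn.resume have no transcript in Ink 2.
    transcript: event.transcript ?? "",
  };
}
```

In a handler, this might look like `const turn = toFluxTurnInfo(JSON.parse(message.data)); if (turn) handleFluxTurn(turn);`, where `handleFluxTurn` stands for your existing (hypothetical) Flux handler.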
A Flux TurnInfo message:
{
  "type": "TurnInfo",
  "event": "EndOfTurn",
  "turn_index": 0,
  "transcript": "Hi I need to cancel my subscription please.",
  "words": [...],
  "end_of_turn_confidence": 0.7,
  "audio_window_start": 0.0,
  "audio_window_end": 1.7
}
Becomes an Ink 2 turn.end event:
{
  "type": "turn.end",
  "transcript": "Hi I need to cancel my subscription please.",
  "request_id": "2ff8af53-4d38-479d-8287-58940f01c701"
}
Like Deepgram Flux, the transcript is cumulative within a turn. Ink 2 has the added benefit that all emitted transcripts are final; words are not emitted until the model is confident. Later events will only append to the transcript without modifying text sent by earlier events.
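
Ink 2 transcripts only grow within a turn, so the newly recognized text is simply the suffix beyond what you last saw. This minimal sketch relies on that append-only guarantee. `createTranscriptTracker` is hypothetical. It is useful for streaming words to a UI or an LLM without resending the whole turn.

```javascript
// Hypothetical helper: turn cumulative Ink 2 transcripts into deltas.
// Relies on the guarantee that later events only append within a turn.
function createTranscriptTracker() {
  let seen = "";
  return function onEvent(event) {
    if (event.type === "turn.start") {
      seen = ""; // new turn, new cumulative transcript
      return "";
    }
    if (typeof event.transcript !== "string") return "";
    if (!event.transcript.startsWith(seen)) {
      // Should not happen with append-only transcripts; fail loudly if it does.
      throw new Error("Transcript was rewritten, not appended");
    }
    const delta = event.transcript.slice(seen.length);
    seen = event.transcript;
    if (event.type === "turn.end") seen = ""; // turn finished
    return delta;
  };
}
```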

Fields that don’t have an equivalent

Cartesia does not emit:
  • turn_index
  • audio_window_start
  • audio_window_end
  • words
  • end_of_turn_confidence
  • sequence_id
  • languages
  • languages_hinted
Timestamps, end-of-turn confidence, and multilingual support are coming soon!
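
If downstream code was keyed on turn_index, you can keep a local counter instead. This sketch assumes turns arrive in order and each begins with turn.start. `createTurnIndexer` is hypothetical and numbers turns from 0, as in the Flux example earlier on this page.

```javascript
// Hypothetical helper: rebuild a Flux-style turn_index locally, since
// Ink 2 does not send one. Assumes every turn opens with turn.start.
function createTurnIndexer() {
  let index = -1;
  return function withTurnIndex(event) {
    if (event.type === "turn.start") index += 1;
    // Non-turn events (connected, error) pass through without an index.
    const isTurnEvent = event.type.startsWith("turn.");
    return isTurnEvent ? { ...event, turn_index: index } : event;
  };
}
```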

Event handler

The branching structure of your handler stays the same; only the message shape changes.
  ws.onmessage = (message) => {
    const data = JSON.parse(message.data);
-   if (data.type !== "TurnInfo") return;
-   switch (data.event) {
-     case "StartOfTurn":    onTurnStart(); break;
-     case "Update":         onTranscriptUpdate(data.transcript); break;
-     case "EagerEndOfTurn": prepareReply(data.transcript); break;
-     case "TurnResumed":    cancelReply(); break;
-     case "EndOfTurn":      finalizeReply(data.transcript); break;
-   }
+   switch (data.type) {
+     case "turn.start":     onTurnStart(); break;
+     case "turn.update":    onTranscriptUpdate(data.transcript); break;
+     case "turn.eager_end": prepareReply(data.transcript); break;
+     case "turn.resume":    cancelReply(); break;
+     case "turn.end":       finalizeReply(data.transcript); break;
+   }
  };
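
The diff above covers only turn events. A fuller handler can also react to connected and error, and ignore unrecognized types so that new event types do not break it. This sketch assumes your application supplies the callbacks. `createCartesiaHandler` and the callback names are hypothetical. This guide does not describe the error payload fields, so the handler passes the whole message to `onError`.

```javascript
// Hypothetical factory: returns an onmessage handler for Ink 2 events.
// All callbacks are supplied by your application; missing ones are no-ops.
function createCartesiaHandler(callbacks = {}) {
  const call = (name, ...args) => callbacks[name]?.(...args);

  return (message) => {
    const data = JSON.parse(message.data);
    switch (data.type) {
      case "connected":      call("onConnected", data); break;
      case "turn.start":     call("onTurnStart"); break;
      case "turn.update":    call("onTranscriptUpdate", data.transcript); break;
      case "turn.eager_end": call("prepareReply", data.transcript); break;
      case "turn.resume":    call("cancelReply"); break;
      case "turn.end":       call("finalizeReply", data.transcript); break;
      case "error":          call("onError", data); break; // payload shape not covered here
      default:               break; // ignore unknown types for forward compatibility
    }
  };
}
```

To wire it up, assign the returned function directly, as in `ws.onmessage = createCartesiaHandler({ onTurnStart, finalizeReply, onError });`.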