Ink 2 only supports English right now.
We expect to add more languages in the coming months.
Connection
Replace the Deepgram WebSocket URL and auth header with Cartesia’s. Add the cartesia_version query param. Instead of an API key, authenticate with a short-lived access token passed in the access_token query param.
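A minimal Python sketch of the new connection URL follows. It uses only the standard library's urllib.parse.urlencode.

- The base URL is a hypothetical placeholder, because this section doesn't give the Ink 2 endpoint.
- build_ink2_url is an illustrative name.
- The query parameter names and values are the ones listed on this page.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the Ink 2 WebSocket endpoint from Cartesia's docs.
CARTESIA_STT_WS_URL = "wss://example.invalid/stt/websocket"


def build_ink2_url(access_token: str, sample_rate: int,
                   encoding: str = "pcm_s16le",
                   base_url: str = CARTESIA_STT_WS_URL) -> str:
    """Build an Ink 2 WebSocket URL; credentials go in the query string, not a header."""
    params = {
        "model": "ink-2",
        "encoding": encoding,
        "sample_rate": sample_rate,
        "cartesia_version": "2026-03-01",
        # Short-lived access token instead of an API key in an auth header.
        "access_token": access_token,
    }
    # urlencode percent-encodes each value, so tokens with special characters are safe.
    return f"{base_url}?{urlencode(params)}"


print(build_ink2_url("tok_123", 16000))
```

Pass the resulting URL to your WebSocket client as-is. Because the token travels in the query string, no auth header is sent.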
Query parameters
| Deepgram Flux | Cartesia Ink 2 | Notes |
|---|---|---|
| model=flux-general-en (Required) | model=ink-2 (Required) | See STT Models for all options. |
| encoding=linear16 (Required) | encoding=pcm_s16le (Required) | linear16 → pcm_s16le, linear32 → pcm_s32le, mulaw → pcm_mulaw, alaw → pcm_alaw. |
| sample_rate (Required) | sample_rate (Required) | No change. |
| language_hint | — | Only English is supported right now. Multilingual support is coming soon! |
| — | cartesia_version=2026-03-01 | See API Conventions for details. |
| eager_eot_threshold | — | Turn detection is controlled by the model. Configuration is coming soon! |
| eot_threshold | — | Turn detection is controlled by the model. Configuration is coming soon! |
| eot_timeout_ms | — | Turn detection is controlled by the model. Configuration is coming soon! |
| keyterm | — | Coming soon! |
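If your existing code keeps Flux query params in a dict, the table above can be applied mechanically. The sketch below is one way to do it; translate_flux_params and the dict names are illustrative.

- Encodings are renamed per the table.
- cartesia_version is added.
- Every other Flux param has no Ink 2 equivalent yet, so it is returned in a "dropped" list. You can then log it rather than silently lose it.

```python
# Deepgram Flux encoding name -> Cartesia Ink 2 encoding name (from the table above).
ENCODING_MAP = {
    "linear16": "pcm_s16le",
    "linear32": "pcm_s32le",
    "mulaw": "pcm_mulaw",
    "alaw": "pcm_alaw",
}

# The only Flux params that carry over; everything else has no Ink 2 equivalent yet.
TRANSLATED = {"model", "encoding", "sample_rate"}


def translate_flux_params(flux: dict) -> tuple:
    """Return (ink2_params, dropped_flux_keys) for a dict of Flux query params."""
    try:
        encoding = ENCODING_MAP[flux["encoding"]]
    except KeyError:
        raise ValueError(f"no Ink 2 encoding for {flux.get('encoding')!r}") from None
    ink2 = {
        "model": "ink-2",                    # replaces flux-general-en
        "encoding": encoding,
        "sample_rate": flux["sample_rate"],  # unchanged
        "cartesia_version": "2026-03-01",    # new; no Flux equivalent
    }
    # e.g. language_hint, eager_eot_threshold, eot_threshold, eot_timeout_ms, keyterm
    dropped = sorted(k for k in flux if k not in TRANSLATED)
    return ink2, dropped
```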
Sending audio
Both APIs accept raw audio as binary WebSocket frames. Your audio pipeline doesn’t need to change; just make sure the bytes match the encoding and sample_rate you declared.
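Suppose your pipeline produces integer samples. The sketch below packs them as pcm_s16le (16-bit signed, little-endian) with Python's struct module and splits them into chunks, one per binary frame. It rests on two assumptions:

- The audio is mono.
- The 20 ms frame size is an arbitrary illustrative choice, not a documented requirement.

pcm_s16le_frames is a hypothetical helper name.

```python
import struct


def pcm_s16le_frames(samples, sample_rate: int, frame_ms: int = 20):
    """Pack int16 samples as little-endian PCM and yield fixed-duration chunks."""
    # "<" = little-endian, "h" = signed 16-bit; raises struct.error if a sample is out of range.
    data = struct.pack(f"<{len(samples)}h", *samples)
    # Mono 16-bit audio: 2 bytes per sample.
    frame_bytes = sample_rate * frame_ms // 1000 * 2
    for start in range(0, len(data), frame_bytes):
        yield data[start:start + frame_bytes]
```

Each yielded bytes object would be sent as one binary WebSocket frame. For example, the websockets library sends a bytes message passed to send() as a binary frame.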
To close the session, send a JSON-encoded WebSocket text frame:
There’s no equivalent of the Flux Configure control message, since there’s no need to configure end-of-turn.
Event mapping
Flux wraps all turn events in a single TurnInfo message with an event discriminator. Cartesia emits a separate message type for each event, identified by the top-level type field.
| Deepgram Flux (TurnInfo.event) | Cartesia (type) | Carries transcript? |
|---|---|---|
| StartOfTurn | turn.start | No (Flux: yes) |
| Update | turn.update | Yes |
| EagerEndOfTurn | turn.eager_end | Yes |
| TurnResumed | turn.resume | No (Flux: yes) |
| EndOfTurn | turn.end | Yes |
| Connected | connected | — |
| Error | error | — |
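Code that switches on TurnInfo.event can switch on the top-level type instead. Below is a minimal sketch of that translation, under these assumptions:

- Each Cartesia message is a JSON object with a type field.
- The events the table marks as carrying a transcript put it in a field named transcript.

handle_message and the dict names are illustrative.

```python
import json

# Flux event name -> Cartesia top-level "type" value (from the table above).
FLUX_TO_CARTESIA = {
    "StartOfTurn": "turn.start",
    "Update": "turn.update",
    "EagerEndOfTurn": "turn.eager_end",
    "TurnResumed": "turn.resume",
    "EndOfTurn": "turn.end",
    "Connected": "connected",
    "Error": "error",
}

# Per the table, only these Cartesia events carry a transcript.
HAS_TRANSCRIPT = {"turn.update", "turn.eager_end", "turn.end"}


def handle_message(raw: str) -> tuple:
    """Dispatch on the top-level type field; return (type, transcript or None)."""
    msg = json.loads(raw)
    kind = msg["type"]
    transcript = msg.get("transcript") if kind in HAS_TRANSCRIPT else None
    return kind, transcript
```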
TurnInfo message:
turn.end event:
The transcript field is cumulative within a turn: each event carries the full text of the turn so far.
Ink 2 has the added benefit that every emitted transcript is final: words are not emitted until the model is confident in them. Later events only append to the transcript; they never modify text sent by earlier events.
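Because Ink 2 transcripts are cumulative and append-only, the new text in each event can be recovered with a prefix check. Below is a minimal sketch under two assumptions:

- A turn.start event begins a fresh turn.
- The (type, transcript) pairs come from your own message parsing.

new_text and TurnTranscript are hypothetical helper names.

```python
from typing import Optional


def new_text(previous: str, current: str) -> str:
    """Return the text appended since the previous event in the same turn."""
    # Ink 2 only appends, so the earlier transcript must be a prefix of the new one.
    if not current.startswith(previous):
        raise ValueError("transcript was not append-only")
    return current[len(previous):]


class TurnTranscript:
    """Tracks the cumulative transcript of the current turn."""

    def __init__(self) -> None:
        self.text = ""

    def on_event(self, kind: str, transcript: Optional[str]) -> str:
        """Feed one event; return only the newly appended text (may be empty)."""
        if kind == "turn.start":
            self.text = ""  # a new turn starts from an empty transcript
            return ""
        if transcript is None:  # e.g. turn.resume carries no transcript
            return ""
        delta = new_text(self.text, transcript)
        self.text = transcript
        return delta
```

Given the append-only guarantee described above, the ValueError branch acts only as a sanity check.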
Fields that don’t have an equivalent
Cartesia does not emit the following Flux fields:
- turn_index
- audio_window_start
- audio_window_end
- words
- end_of_turn_confidence
- sequence_id
- languages
- languages_hinted