WSS /stt/turns/websocket


Messages
model
type:string
required

ID of the model to use for transcription, e.g. ink-2. See Models for available models.

encoding
type:string
required

The encoding format of the audio data. This determines how the server interprets the raw binary audio data you send.

Supported encodings: pcm_s16le, pcm_s32le, pcm_f16le, pcm_f32le, pcm_mulaw, pcm_alaw.

For guidance on choosing an encoding, see Audio encodings.

sample_rate
type:string
required

The sample rate of the audio in Hz.

cartesia_version
type:string
required

API version. Provide it either as a URL query parameter (cartesia_version=2026-03-01) or as a request header (Cartesia-Version: 2026-03-01).

Browser WebSockets do not support request headers, so browser clients should pass the version as a query parameter in the URL.

X-API-Key
type:httpApiKey

API key passed in the X-API-Key request header.

access_token
type:httpApiKey

A short-lived access token passed in a query param to make API requests from a client. This is particularly useful in the browser, where WebSockets do not support headers. See Authenticate client apps to generate an access token.
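
As a rough illustration of how the query parameters above fit together, the sketch below assembles a connection URL for a browser-style client that cannot send headers. The path and parameter names come from this page; the host `api.cartesia.ai`, the helper name `build_stt_url`, and the token value are assumptions made for the example.

```python
from urllib.parse import urlencode

# Assumption: this page does not state the host; substitute the real one.
ASSUMED_HOST = "api.cartesia.ai"
ENDPOINT_PATH = "/stt/turns/websocket"

# Encodings listed under the `encoding` parameter.
SUPPORTED_ENCODINGS = {
    "pcm_s16le", "pcm_s32le", "pcm_f16le", "pcm_f32le", "pcm_mulaw", "pcm_alaw",
}


def build_stt_url(model, encoding, sample_rate, access_token=None,
                  cartesia_version="2026-03-01", host=ASSUMED_HOST):
    """Build a wss:// URL carrying the required query parameters.

    Hypothetical helper, not part of any SDK.
    """
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    params = {
        "model": model,
        "encoding": encoding,
        "sample_rate": str(sample_rate),  # documented as a string parameter
        "cartesia_version": cartesia_version,
    }
    # Only needed when authenticating without the X-API-Key header.
    if access_token is not None:
        params["access_token"] = access_token
    return f"wss://{host}{ENDPOINT_PATH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_stt_url("ink-2", "pcm_s16le", 16000, access_token="YOUR_TOKEN"))
```

A server-side client that can set headers would omit `access_token` here and send X-API-Key as a request header instead.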

Send Audio Data
type:string

Send WebSocket binary messages containing raw audio data as specified by the encoding and sample_rate query parameters.

Audio Requirements:

  • Send audio in small chunks (e.g., 100 ms each) for optimal latency
  • Audio format must match the encoding and sample_rate parameters
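
To make the chunking guidance concrete, here is a minimal sketch that computes how many bytes correspond to a 100 ms chunk for each supported encoding and splits a raw buffer accordingly. It assumes single-channel audio (this page does not state a channel count), and the helper names `chunk_size_bytes` and `iter_chunks` are hypothetical.

```python
from typing import Iterator

# Bytes per sample implied by each supported encoding name.
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def chunk_size_bytes(encoding: str, sample_rate: int,
                     chunk_ms: int = 100, channels: int = 1) -> int:
    """Return the byte length of `chunk_ms` milliseconds of raw audio.

    Assumption: mono audio unless `channels` says otherwise.
    """
    frames = sample_rate * chunk_ms // 1000
    return frames * BYTES_PER_SAMPLE[encoding] * channels


def iter_chunks(audio: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`; the last one may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


if __name__ == "__main__":
    size = chunk_size_bytes("pcm_s16le", 16000)
    pcm = bytes(7000)  # placeholder silence standing in for captured audio
    for chunk in iter_chunks(pcm, size):
        # Each chunk would be sent as one WebSocket binary message.
        print(len(chunk))
```

Each yielded chunk would go out as one binary message; a live client would normally pace sends at the capture rate rather than sending the whole buffer at once.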

Close Command
type:object

Send a JSON-encoded close command as a WebSocket text message to close the session cleanly. Any audio still buffered is processed by the model and emitted as events.

Connected
type:object

Fires once when the WebSocket connection is established.

You do not need to wait for this event before sending audio.

Turn Start
type:object

Marks the start of a user turn. Fires shortly after the user begins speaking.

This event can be used to interrupt your agent to avoid talking over the user.

Turn Update
type:object

Fires repeatedly as the model transcribes the current user turn.

Turn Eager End [PREVIEW]
type:object

Fires when the model predicts that the user might be done speaking.

Turn Resume [PREVIEW]
type:object

Fires after turn.eager_end if the user turn has not actually ended.

Turn End
type:object

Marks the end of a user turn.
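
One way a client might react to the turn events above is sketched below as a small state tracker. Only the name `turn.eager_end` appears on this page; the other event names (`turn.start`, `turn.update`, `turn.resume`, `turn.end`), the `type` field used to tell events apart, and the action labels are assumptions made for illustration. Starting a speculative reply on an eager end is a design choice of this sketch, not documented behavior.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnTracker:
    """Records the action an agent might take for each turn event."""
    actions: List[str] = field(default_factory=list)
    user_speaking: bool = False
    eager_pending: bool = False

    def handle(self, raw: str) -> None:
        event = json.loads(raw)
        kind = event.get("type")  # assumed discriminator field
        if kind == "turn.start":
            # The user began speaking: stop the agent from talking over them.
            self.user_speaking = True
            self.eager_pending = False
            self.actions.append("interrupt_agent")
        elif kind == "turn.eager_end":
            # The user might be done: begin preparing a reply early.
            self.eager_pending = True
            self.actions.append("start_speculative_reply")
        elif kind == "turn.resume":
            # The turn did not actually end: discard the early reply.
            if self.eager_pending:
                self.eager_pending = False
                self.actions.append("cancel_speculative_reply")
        elif kind == "turn.end":
            # The turn is over: the reply can be delivered.
            self.user_speaking = False
            self.eager_pending = False
            self.actions.append("commit_reply")
        # turn.update and any other events need no action in this sketch.


if __name__ == "__main__":
    tracker = TurnTracker()
    for name in ["turn.start", "turn.update", "turn.eager_end",
                 "turn.resume", "turn.eager_end", "turn.end"]:
        tracker.handle(json.dumps({"type": name}))
    print(tracker.actions)
```

In a real client, `handle` would be called with each text message received from the socket, and the recorded labels would be replaced by calls into the agent.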

Error Response
type:object

Error information for STT WebSocket connections.