This guide covers migrating from ElevenLabs Realtime Speech to Text when used with commit_strategy=manual.

This guide contains both bare API descriptions and SDK code. To install the SDK:
pip install cartesia
If you’re already using the Cartesia SDK, upgrade to version 3.2.0 or later.

Connection

Replace the ElevenLabs WebSocket URL with Cartesia’s /stt/websocket endpoint, and replace the xi-api-key auth header with the x-api-key and cartesia-version headers.
- wss://api.elevenlabs.io/v1/speech-to-text/realtime?model_id=scribe_v2_realtime&audio_format=pcm_16000&commit_strategy=manual
+ wss://api.cartesia.ai/stt/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000
- xi-api-key: <ELEVENLABS_API_KEY>
+ x-api-key: <CARTESIA_API_KEY>
+ cartesia-version: 2026-03-01
In browsers, WebSockets do not support custom request headers. Pass the API version as the cartesia_version query parameter, and authenticate with a short-lived access token in the access_token query parameter instead of an API key.

Connect to the manual-finalization WebSocket with the Cartesia SDK:
import os
from cartesia import AsyncCartesia

client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

async with client.stt.manual_finalize.websocket(
    model="ink-2", encoding="pcm_s16le", sample_rate=16000
) as connection:
    ...
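
For clients that cannot set headers, the query-parameter form of the connection described above can be assembled with the standard library. This is a minimal sketch: the helper name build_stt_url is hypothetical, and the parameter names are taken from this guide, not from an SDK call.

```python
from urllib.parse import urlencode

CARTESIA_STT_URL = "wss://api.cartesia.ai/stt/websocket"


def build_stt_url(
    access_token: str,
    model: str = "ink-2",
    encoding: str = "pcm_s16le",
    sample_rate: int = 16000,
    cartesia_version: str = "2026-03-01",
) -> str:
    """Build a header-free connection URL (hypothetical helper).

    The API version and a short-lived access token travel as query
    parameters, as this guide describes for browser clients.
    """
    params = {
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "cartesia_version": cartesia_version,
        "access_token": access_token,
    }
    # urlencode percent-escapes values and keeps dict insertion order.
    return f"{CARTESIA_STT_URL}?{urlencode(params)}"
```

Server-side clients that can send the x-api-key and cartesia-version headers do not need the last two parameters.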

Query parameters

| ElevenLabs Scribe (manual) | Cartesia Realtime STT (Manual) | Notes |
| --- | --- | --- |
| model_id=scribe_v2_realtime (required) | model=ink-2 (required) | See Models for all options. |
| audio_format=pcm_16000 | encoding=pcm_s16le + sample_rate=16000 (required) | ElevenLabs bundles format and rate; Cartesia splits them. See encoding. |
| commit_strategy=manual | | See auto finalization for automatic commits. |
| language_code | language | ink-2 only supports en right now. Use ink-whisper for other languages. |
| | cartesia_version=2026-03-01 (required) | See API Conventions for details. |
| include_timestamps | | Always included if supported by the model (ink-whisper only for now). |
| keyterms | | Coming soon! |
| enable_logging | | Controlled by your organization. |
ElevenLabs bundles the sample format and rate into a single audio_format token. Cartesia splits them into encoding and sample_rate.
| ElevenLabs audio_format | Cartesia encoding | Cartesia sample_rate |
| --- | --- | --- |
| pcm_8000 | pcm_s16le | 8000 |
| pcm_16000 | pcm_s16le | 16000 |
| pcm_22050 | pcm_s16le | 22050 |
| pcm_24000 | pcm_s16le | 24000 |
| pcm_44100 | pcm_s16le | 44100 |
| pcm_48000 | pcm_s16le | 48000 |
| ulaw_8000 | pcm_mulaw | 8000 |
Cartesia also accepts pcm_s32le, pcm_f16le, pcm_f32le, and pcm_alaw.
All Cartesia encodings support all sample rates.
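
If your existing configuration stores an ElevenLabs audio_format value, the mapping above can be expressed as a small lookup. This is a sketch only; the names AUDIO_FORMAT_MAP and to_cartesia_audio_params are hypothetical and not part of either SDK.

```python
# ElevenLabs audio_format -> (Cartesia encoding, Cartesia sample_rate),
# copied from the mapping in this guide.
AUDIO_FORMAT_MAP = {
    "pcm_8000": ("pcm_s16le", 8000),
    "pcm_16000": ("pcm_s16le", 16000),
    "pcm_22050": ("pcm_s16le", 22050),
    "pcm_24000": ("pcm_s16le", 24000),
    "pcm_44100": ("pcm_s16le", 44100),
    "pcm_48000": ("pcm_s16le", 48000),
    "ulaw_8000": ("pcm_mulaw", 8000),
}


def to_cartesia_audio_params(audio_format: str) -> dict:
    """Translate an ElevenLabs audio_format token into Cartesia query params."""
    try:
        encoding, sample_rate = AUDIO_FORMAT_MAP[audio_format]
    except KeyError:
        raise ValueError(
            f"Unsupported ElevenLabs audio_format: {audio_format!r}"
        ) from None
    return {"encoding": encoding, "sample_rate": sample_rate}
```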

Sending audio

ElevenLabs wraps each audio chunk in a JSON-formatted text frame and base64-encodes the audio bytes.
Cartesia accepts audio chunks as binary frames, so send the raw audio bytes directly:
- { "message_type": "input_audio_chunk", "audio_base_64": "<base64 PCM>", "commit": false, "sample_rate": 16000 }
+ <raw PCM bytes>
  • No need to supply previous text
  • Sample rate is determined upon connection by the sample_rate query parameter
Cartesia’s control commands are bare text frames, not JSON. To commit buffered audio and emit a transcript without ending the session, send a finalize frame:
finalize
| ElevenLabs | Cartesia |
| --- | --- |
| Streams partial_transcript events, then committed_transcript once you commit. | Streams final transcript deltas continuously as audio arrives. finalize flushes text that might otherwise be held back for a few seconds. |
It is important to send the finalize command at the right times in the audio stream. Consider using auto finalization if you don’t know when your user is done speaking.
To commit all remaining audio and close the session, send a close frame:
close
Cartesia will transcribe all buffered audio, then close the socket for you.
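
If part of your pipeline still produces ElevenLabs-style JSON frames, one way to bridge it is to translate each frame into the Cartesia frames described above. This is a minimal sketch: translate_client_frame is a hypothetical helper, and mapping "commit": true to a trailing finalize frame is an assumption based on this guide, not documented behavior of either API.

```python
import base64
import json
from typing import List, Union

# A Cartesia frame is either raw audio (binary frame) or a bare command (text frame).
Frame = Union[bytes, str]


def translate_client_frame(elevenlabs_frame: str) -> List[Frame]:
    """Turn one ElevenLabs input_audio_chunk frame into Cartesia frames."""
    message = json.loads(elevenlabs_frame)
    if message.get("message_type") != "input_audio_chunk":
        raise ValueError(f"Unexpected message_type: {message.get('message_type')!r}")

    frames: List[Frame] = []
    # Cartesia takes raw bytes, so undo the base64 wrapping.
    audio = base64.b64decode(message.get("audio_base_64", ""))
    if audio:
        frames.append(audio)
    # Assumption: a commit on this chunk maps to "finalize" after its audio.
    if message.get("commit"):
        frames.append("finalize")
    return frames
```

Send bytes entries as binary frames and str entries as text frames. The sample_rate field of the ElevenLabs frame is dropped because Cartesia fixes the rate at connection time.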

Sending audio with the SDK

# ElevenLabs
from base64 import b64encode
await elevenlabs_connection.send({
  "audio_base_64": b64encode(raw_audio).decode("ascii"),
})

# Cartesia
# raw_audio (bytes) - Raw audio data, about 100 ms at a time
await connection.send_raw(raw_audio)
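
The comment above suggests sending about 100 ms of audio at a time. A sketch of how a larger buffer could be split into chunks of that size follows; chunk_audio is a hypothetical helper, it assumes mono audio, and the bytes-per-sample table simply reflects the sample width implied by each encoding name.

```python
from typing import Iterator

# Bytes per sample for each Cartesia encoding named in this guide.
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def chunk_audio(
    raw_audio: bytes,
    encoding: str = "pcm_s16le",
    sample_rate: int = 16000,
    chunk_ms: int = 100,
) -> Iterator[bytes]:
    """Yield consecutive chunks of roughly chunk_ms milliseconds (mono assumed)."""
    samples_per_chunk = max(1, sample_rate * chunk_ms // 1000)
    # Multiply by the sample width so chunks never split a sample.
    chunk_size = samples_per_chunk * BYTES_PER_SAMPLE[encoding]
    for offset in range(0, len(raw_audio), chunk_size):
        yield raw_audio[offset:offset + chunk_size]
```

Each yielded chunk can then be passed to connection.send_raw in a loop. At 16 kHz with pcm_s16le, a 100 ms chunk is 3200 bytes.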

Decoding base64 encoded audio before sending

# ElevenLabs
await elevenlabs_connection.send({
  "audio_base_64": audio_base_64,
})

# Cartesia
from base64 import b64decode
await connection.send_raw(b64decode(audio_base_64))

Committing and closing

# ElevenLabs
await elevenlabs_connection.commit()

# Cartesia
await connection.send("finalize")

# ElevenLabs: close the socket immediately
await elevenlabs_connection.close()

# Cartesia: commit remaining audio
# and let the server close the socket once done
await connection.send("close")

# Cartesia: Close the socket early (optional)
await connection.close()

Event mapping

Scribe emits interim partial_transcript events, then a committed_transcript when you commit.
Cartesia emits transcript deltas plus acknowledgments for the finalize and close commands.
| ElevenLabs message_type | Cartesia type | Notes |
| --- | --- | --- |
| partial_transcript | transcript (is_final: false) | Never sent by Ink 2 or Whisper (reserved for future models). |
| committed_transcript | transcript (is_final: true) + flush_done | ElevenLabs sends each committed transcript in one message; Cartesia sends transcript messages containing deltas, then a flush_done message once all deltas for the segment have been sent. |
| committed_transcript_with_timestamps | transcript (is_final: true) + flush_done | Only ink-whisper supports timestamps right now. |
| | done | Sent after all audio received before the close command has been transcribed, immediately before the WebSocket closes. |
| error | error | Client or server errors. |
| auth_error | | Cartesia will reject the WebSocket upgrade with a 401 or 403 HTTP status. |
| quota_exceeded | error | Cartesia’s error response will contain "error_code": "quota_exceeded". |
| rate_limited | error | Cartesia’s error response will contain "error_code": "concurrency_limited". |
| session_time_limit_exceeded | | Cartesia will send a WebSocket close frame with code 1001. |
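
If existing error handling is keyed on ElevenLabs message types, the failure rows of the mapping above can be folded into one lookup. This is a sketch under assumptions: elevenlabs_equivalent is a hypothetical helper, and how you obtain the HTTP status or close code depends on your WebSocket client.

```python
from typing import Optional

# Cartesia error_code -> closest ElevenLabs message_type, per this guide.
ERROR_CODE_TO_ELEVENLABS = {
    "quota_exceeded": "quota_exceeded",
    "concurrency_limited": "rate_limited",
}


def elevenlabs_equivalent(
    *,
    error_code: Optional[str] = None,
    http_status: Optional[int] = None,
    close_code: Optional[int] = None,
) -> str:
    """Name the ElevenLabs message_type that best matches a Cartesia failure."""
    # A rejected WebSocket upgrade stands in for auth_error.
    if http_status in (401, 403):
        return "auth_error"
    # Close code 1001 stands in for session_time_limit_exceeded.
    if close_code == 1001:
        return "session_time_limit_exceeded"
    # Any other error event maps to the generic error type.
    return ERROR_CODE_TO_ELEVENLABS.get(error_code, "error")
```

Close code 1001 is the generic WebSocket "going away" code, so treat the session-limit interpretation as a best guess, not a guarantee.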

Committed transcripts

A Scribe committed_transcript carries the full text of the segment since the last commit:
{
  "message_type": "committed_transcript",
  "text": "Hello world! This is the full transcript."
}
This becomes one or more Cartesia transcript events, each carrying a delta:
{
  "type": "transcript",
  "is_final": true,
  "text": "Hello world!",
  "duration": 0.5,
  "words": [
    {
      "word": "Hello",
      "start": 0,
      "end": 0.2
    },
    {
      "word": " world!",
      "start": 0.2,
      "end": 0.5
    }
  ],
  "request_id": "2ff8af53-4d38-479d-8287-58940f01c701"
}
  • Ink 2 does not return duration or words yet
  • Ink 2 and Whisper currently only emit final transcripts (is_final: true)
Followed by a Cartesia flush_done event:
{
  "type": "flush_done",
  "is_final": false,
  "request_id": "2ff8af53-4d38-479d-8287-58940f01c701"
}
Ignore the is_final property on flush_done and done events.
Cartesia’s final transcripts are deltas; concatenate them without stripping or adding whitespace.
import asyncio
from cartesia.types.stt import STTManualFinalizeWebsocketResponse

partial_transcript = ""

def on_message(message: STTManualFinalizeWebsocketResponse) -> None:
    global partial_transcript
    if message.type == "transcript" and message.is_final:
        # Do not strip or add whitespace!
        partial_transcript += message.text
        print(f"partial_transcript: {partial_transcript}")
    elif message.type == "flush_done" or message.type == "done":
        print(f"committed_transcript: {partial_transcript}")
        partial_transcript = ""
    elif message.type == "error":
        error_code = message.error_code or "unknown_error"
        if error_code == "quota_exceeded":
            print("You are out of credits")
        elif error_code == "concurrency_limited":
            print("You have too many open STT connections")
        else:
            print(f"{error_code}: {message.message}")

connection.on("event", on_message)

# ElevenLabs dispatches to your callbacks automatically;
# with Cartesia you run the dispatch loop yourself
recv_task = asyncio.create_task(connection.dispatch_events())

Example Server Messages

Scribe sends full transcripts. Ink sends deltas and may break words.
| ElevenLabs Scribe (manual) | Cartesia Realtime STT (Manual) |
| --- | --- |
| partial_transcript "Scribe sends" | is_final: true "Scribe sends" |
| partial_transcript "Scribe sends full transcripts." | is_final: true " full transc" |
| commit (client) | finalize (client) |
| committed_transcript "Scribe sends full transcripts." | is_final: true "ripts." |
| committed_transcript_with_timestamps "Scribe sends full transcripts." | flush_done |
| partial_transcript "Ink sends deltas" | is_final: true " Ink sends" |
| partial_transcript "Ink sends deltas and may break words." | is_final: true " deltas and may break wor" |
| commit (client) | finalize (client) |
| committed_transcript "Ink sends deltas and may break words." | is_final: true "ds." |
| committed_transcript_with_timestamps "Ink sends deltas and may break words." | flush_done |
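
To see how the Cartesia column above reassembles into Scribe-style committed transcripts, the sequence can be replayed through a small accumulator. This sketch works on plain dictionaries shaped like the JSON events in this guide; collect_committed is a hypothetical helper, not an SDK function.

```python
from typing import Dict, Iterable, List


def collect_committed(messages: Iterable[Dict]) -> List[str]:
    """Concatenate final transcript deltas into one string per segment."""
    committed: List[str] = []
    buffer = ""
    for message in messages:
        kind = message.get("type")
        if kind == "transcript" and message.get("is_final"):
            # Deltas are joined verbatim: no stripping, no added whitespace.
            buffer += message.get("text", "")
        elif kind in ("flush_done", "done"):
            # is_final on flush_done/done is ignored; skip empty segments.
            if buffer:
                committed.append(buffer)
            buffer = ""
    return committed


# The Cartesia side of the example sequence above.
example = [
    {"type": "transcript", "is_final": True, "text": "Scribe sends"},
    {"type": "transcript", "is_final": True, "text": " full transc"},
    {"type": "transcript", "is_final": True, "text": "ripts."},
    {"type": "flush_done", "is_final": False},
    {"type": "transcript", "is_final": True, "text": " Ink sends"},
    {"type": "transcript", "is_final": True, "text": " deltas and may break wor"},
    {"type": "transcript", "is_final": True, "text": "ds."},
    {"type": "flush_done", "is_final": False},
]

segments = collect_committed(example)
# segments == ["Scribe sends full transcripts.",
#              " Ink sends deltas and may break words."]
```

The second segment keeps its leading space, so joining segments with an empty string reproduces the full text with correct spacing.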

References

API Reference

Cartesia Realtime STT (Manual)

Full Code Example

Using the Cartesia SDK