Migrating from ElevenLabs with Automatic Commits

This guide covers migrating from ElevenLabs Realtime Speech to Text when used with commit_strategy=vad.


This guide contains both bare API descriptions and SDK code. To install the SDK:

pip install cartesia

npm i @cartesia/cartesia-js

If you’re already using the Cartesia SDK, upgrade to version 3.2.0 or later.

Ink 2 only supports English right now.
We expect to add more languages in the coming months.

Connection

Replace the ElevenLabs WebSocket URL with Cartesia’s /stt/turns/websocket endpoint, and replace the xi-api-key header with the x-api-key and cartesia-version headers.

- wss://api.elevenlabs.io/v1/speech-to-text/realtime?model_id=scribe_v2_realtime&audio_format=pcm_16000&commit_strategy=vad
+ wss://api.cartesia.ai/stt/turns/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000

- xi-api-key: <ELEVENLABS_API_KEY>
+ x-api-key: <CARTESIA_API_KEY>
+ cartesia-version: 2026-03-01
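As a rough illustration of the change above, the following sketch assembles the Cartesia URL and headers in Python. The helper names (`server_connection`, `browser_url`) are hypothetical and not part of any SDK. The browser variant follows the query-parameter approach described in this section and assumes you already hold a short-lived access token.

```python
from urllib.parse import urlencode

CARTESIA_STT_URL = "wss://api.cartesia.ai/stt/turns/websocket"
CARTESIA_VERSION = "2026-03-01"


def server_connection(api_key: str, sample_rate: int = 16000,
                      encoding: str = "pcm_s16le", model: str = "ink-2"):
    """Return (url, headers) for a server-side client that can set headers."""
    query = urlencode({"model": model, "encoding": encoding, "sample_rate": sample_rate})
    headers = {"x-api-key": api_key, "cartesia-version": CARTESIA_VERSION}
    return f"{CARTESIA_STT_URL}?{query}", headers


def browser_url(access_token: str, sample_rate: int = 16000,
                encoding: str = "pcm_s16le", model: str = "ink-2") -> str:
    """Return a URL for clients that cannot set headers (hypothetical helper)."""
    query = urlencode({
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "cartesia_version": CARTESIA_VERSION,  # replaces the cartesia-version header
        "access_token": access_token,          # replaces the x-api-key header
    })
    return f"{CARTESIA_STT_URL}?{query}"


url, headers = server_connection("<CARTESIA_API_KEY>")
print(url)
print(headers)
```

Keeping the URL and headers in one helper makes it harder to migrate the endpoint while forgetting the version header.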

Browser WebSocket APIs cannot set request headers. In a browser, pass the API version as the cartesia_version query parameter and authenticate with a short-lived access token in the access_token query parameter instead of an API key.

Connect to the auto-finalization WebSocket with the Cartesia SDK:

import os
from cartesia import AsyncCartesia

client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

async with client.stt.auto_finalize.websocket(
    model="ink-2", encoding="pcm_s16le", sample_rate=16000
) as connection:
    ...

import Cartesia from "@cartesia/cartesia-js";

const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

const connection = client.stt.autoFinalize.websocket({
  model: "ink-2",
  encoding: "pcm_s16le",
  sample_rate: 16000,
});

// Server-side: generate access tokens using your API key
import Cartesia from '@cartesia/cartesia-js';

const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

export async function GET() {
  const { token } = await client.accessToken.create({
    grants: { stt: true, tts: false, agent: false },
    // How long the token lasts in seconds
    // Allowed values: 0–3600
    expires_in: 3600,
  });
  return Response.json({ token });
}


// Client-side
// 1. Fetch an access token from your server
// 2. Connect to Cartesia via WebSocket
import Cartesia from "@cartesia/cartesia-js";

async function getToken(): Promise<string> {
  const res = await fetch('/replace-with-your-server');
  const { token } = await res.json();
  return token;
}
const audioContext = new AudioContext();

const client = new Cartesia({ token: await getToken() });

const connection = client.stt.autoFinalize.websocket({
  model: "ink-2",
  encoding: "pcm_f32le",
  sample_rate: audioContext.sampleRate,
});

Query parameters

ElevenLabs Scribe (VAD)	Cartesia Realtime STT (Auto)	Notes
`model_id=scribe_v2_realtime` required	`model=ink-2` required	See Models for all options.
`audio_format=pcm_16000`	`encoding=pcm_s16le` + `sample_rate=16000` required	ElevenLabs bundles format and rate; Cartesia splits them. See encoding.
`commit_strategy=vad`	—	See manual finalization for manual commits.
`language_code`	—	`ink-2` only supports `en` right now. More languages are coming soon!
—	`cartesia_version=2026-03-01` required	See API Conventions for details.
`vad_silence_threshold_secs`, `vad_threshold`, `min_speech_duration_ms`, `min_silence_duration_ms`	—	Cartesia uses semantic turn detection. No VAD tuning required.
`include_timestamps`	—	Coming soon!
`keyterms`	`keyterm`	Note the singular `keyterm`. See Keyterm prompting.
`enable_logging`	—	Controlled by your organization.

encoding

ElevenLabs bundles the sample format and rate into a single audio_format token. Cartesia splits them into encoding and sample_rate.

ElevenLabs `audio_format`	Cartesia `encoding`	Cartesia `sample_rate`
`pcm_8000`	`pcm_s16le`	`8000`
`pcm_16000`	`pcm_s16le`	`16000`
`pcm_22050`	`pcm_s16le`	`22050`
`pcm_24000`	`pcm_s16le`	`24000`
`pcm_44100`	`pcm_s16le`	`44100`
`pcm_48000`	`pcm_s16le`	`48000`
`ulaw_8000`	`pcm_mulaw`	`8000`

Cartesia also accepts pcm_s32le, pcm_f16le, pcm_f32le, and pcm_alaw.
All Cartesia encodings support all sample rates.
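The two tables above can be combined into a small query translator. This is a minimal sketch, not an official tool. `translate_query` is a hypothetical name, and passing `keyterms` and `keyterm` as repeated query parameters is an assumption. Scribe parameters with no Cartesia counterpart are dropped.

```python
from urllib.parse import parse_qs, urlencode

# ElevenLabs audio_format -> (Cartesia encoding, Cartesia sample_rate), per the table above.
AUDIO_FORMATS = {
    f"pcm_{rate}": ("pcm_s16le", rate)
    for rate in (8000, 16000, 22050, 24000, 44100, 48000)
}
AUDIO_FORMATS["ulaw_8000"] = ("pcm_mulaw", 8000)


def translate_query(elevenlabs_query: str) -> str:
    """Translate a Scribe (VAD) query string into a Cartesia query string."""
    old = parse_qs(elevenlabs_query)
    # Raises KeyError if audio_format is missing or not in the table.
    encoding, sample_rate = AUDIO_FORMATS[old["audio_format"][0]]
    new = {
        "model": ["ink-2"],
        "encoding": [encoding],
        "sample_rate": [str(sample_rate)],
        "cartesia_version": ["2026-03-01"],
    }
    if "keyterms" in old:
        # Assumption: both services accept keyterms as repeated query parameters.
        new["keyterm"] = old["keyterms"]
    # commit_strategy, language_code, VAD tuning, include_timestamps and
    # enable_logging have no Cartesia counterpart and are intentionally dropped.
    return urlencode(new, doseq=True)


print(translate_query(
    "model_id=scribe_v2_realtime&audio_format=pcm_16000&commit_strategy=vad"
))
```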

Sending audio

ElevenLabs wraps each audio chunk in a JSON-formatted text frame and base64-encodes the audio bytes.
Cartesia accepts audio chunks as binary frames, so send the raw audio bytes directly:

- { "message_type": "input_audio_chunk", "audio_base_64": "<base64 PCM>", "commit": false, "sample_rate": 16000 }
+ <raw PCM bytes>

There is no need to supply previous text.
The sample rate is fixed at connection time by the sample_rate query parameter.

If you currently commit audio mid-session with ElevenLabs, consider using Cartesia with manual finalization instead. See the migration guides page for details.

To commit all audio and close the session, send a JSON-formatted text frame:

{ "type": "close" }

Cartesia will transcribe all buffered audio, then close the socket for you.
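For clients that talk to the WebSocket directly, the framing above might look like the sketch below. It splits raw audio into binary frames of about 100 ms, the chunk size suggested in the SDK examples in this guide, and builds the closing text frame. The `binary_frames` helper is hypothetical, and the byte math assumes mono audio.

```python
import json
from typing import Iterator

# Bytes per sample for each Cartesia encoding (mono audio assumed).
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2, "pcm_s32le": 4, "pcm_f16le": 2,
    "pcm_f32le": 4, "pcm_mulaw": 1, "pcm_alaw": 1,
}

# Text frame that asks Cartesia to transcribe buffered audio and close.
CLOSE_FRAME = json.dumps({"type": "close"})


def binary_frames(raw_audio: bytes, encoding: str = "pcm_s16le",
                  sample_rate: int = 16000, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw audio in chunks of roughly chunk_ms, each sent as one binary frame."""
    chunk_size = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE[encoding]
    for start in range(0, len(raw_audio), chunk_size):
        yield raw_audio[start:start + chunk_size]


one_second = bytes(32000)  # 1 s of silence: 16000 samples * 2 bytes
frames = list(binary_frames(one_second))
print(len(frames), len(frames[0]))
print(CLOSE_FRAME)
```

With a raw WebSocket client, each yielded chunk would be sent as a binary message and `CLOSE_FRAME` as a final text message.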

Sending audio with the SDK

# ElevenLabs
await elevenlabs_connection.send({
  "audio_base_64": b64encode(raw_audio),
})

# Cartesia
# raw_audio (bytes) - Raw audio data, about 100 ms at a time
await connection.send_raw(raw_audio)

// ElevenLabs
elevenLabsConnection.send({ audioBase64: rawAudio.toBase64() });

// Cartesia
// @param {ArrayBufferLike} rawAudio - raw audio data, about 100 ms at a time
connection.sendRaw(rawAudio);

Decoding base64 encoded audio before sending

# ElevenLabs
await elevenlabs_connection.send({
  "audio_base_64": audio_base_64,
})

# Cartesia
from base64 import b64decode
await connection.send_raw(b64decode(audio_base_64))

// ElevenLabs
elevenLabsConnection.send({ audioBase64 });

// Cartesia
connection.sendRaw(Uint8Array.fromBase64(audioBase64));

Committing and closing

# ElevenLabs
elevenlabs_connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, lambda: elevenlabs_connection.close())
await elevenlabs_connection.commit()

# Cartesia
await connection.send({"type": "close"})

# Cartesia: Close the socket early (optional)
await connection.close()

// ElevenLabs
elevenLabsConnection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, () => elevenLabsConnection.close());
elevenLabsConnection.commit();

// Cartesia
connection.send({ type: "close" });

// Cartesia: Close the socket early (optional)
connection.close();

Event mapping

Scribe emits a partial_transcript, then a committed_transcript when its VAD commits a segment. Cartesia folds the same information into a turn lifecycle: turn.start, turn.update, turn.eager_end, turn.resume, and turn.end. See Turn Detection for the full state machine.

ElevenLabs `message_type`	Cartesia `type`	Notes
`session_started`	`connected`	Connection confirmed. You do not need to wait for it before sending audio.
`partial_transcript`	`turn.update`	Partial transcript while the user is speaking.
`committed_transcript`	`turn.end`	User stopped speaking; contains the complete transcript for the user turn.
`committed_transcript_with_timestamps`	`turn.end`	Timestamps are not yet available.
—	`turn.start`	The user began speaking. Carries no transcript.
—	`turn.eager_end`	The model predicts the user might be done speaking. Okay to ignore.
—	`turn.resume`	The user kept talking; ignore the last `turn.eager_end`.
`error`	`error`	Client or server errors.
`auth_error`	—	Cartesia will reject the WebSocket upgrade with a 401 or 403 HTTP status.
`quota_exceeded`	`error`	Cartesia’s error response will contain `"error_code": "quota_exceeded"`.
`rate_limited`	`error`	Cartesia’s error response will contain `"error_code": "concurrency_limited"`.
`session_time_limit_exceeded`	—	Cartesia will send a WebSocket close frame with code `1001`.

Partial transcripts

An ElevenLabs partial_transcript:

{
  "message_type": "partial_transcript",
  "text": "Hello"
}

Becomes a Cartesia turn.update:

{
  "type": "turn.update",
  "transcript": "Hello",
  "request_id": "33cacee6-1936-4949-a05b-ecc9f2393248"
}

Committed transcripts

An ElevenLabs committed_transcript:

{
  "message_type": "committed_transcript",
  "text": "Hello world!"
}

Becomes a Cartesia turn.end:

{
  "type": "turn.end",
  "transcript": "Hello world!",
  "request_id": "33cacee6-1936-4949-a05b-ecc9f2393248"
}
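If you want existing Scribe handlers to keep working during the migration, one option is a thin adapter that renames Cartesia events according to the event mapping table. The sketch below is illustrative only. `to_elevenlabs_message` is a hypothetical helper, and for errors it returns only the mapped `message_type`, since the full error payload shapes are not shown in this guide.

```python
from typing import Optional

# Cartesia `type` -> ElevenLabs `message_type`, per the event mapping table.
TYPE_TO_MESSAGE_TYPE = {
    "connected": "session_started",
    "turn.update": "partial_transcript",
    "turn.end": "committed_transcript",
}

# Cartesia `error_code` -> closest ElevenLabs `message_type`.
ERROR_CODE_TO_MESSAGE_TYPE = {
    "quota_exceeded": "quota_exceeded",
    "concurrency_limited": "rate_limited",
}


def to_elevenlabs_message(event: dict) -> Optional[dict]:
    """Rename a Cartesia event to a Scribe-style message, or None if there is no equivalent."""
    kind = event.get("type")
    if kind == "error":
        code = event.get("error_code")
        return {"message_type": ERROR_CODE_TO_MESSAGE_TYPE.get(code, "error")}
    if kind not in TYPE_TO_MESSAGE_TYPE:
        # turn.start, turn.eager_end and turn.resume have no Scribe counterpart.
        return None
    message = {"message_type": TYPE_TO_MESSAGE_TYPE[kind]}
    if "transcript" in event:
        message["text"] = event["transcript"]
    return message


print(to_elevenlabs_message({
    "type": "turn.end",
    "transcript": "Hello world!",
    "request_id": "33cacee6-1936-4949-a05b-ecc9f2393248",
}))
```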

import asyncio
from cartesia.types.stt import STTAutoFinalizeWebsocketResponse

def on_message(message: STTAutoFinalizeWebsocketResponse) -> None:
    if message.type == "turn.start":
        print("User started speaking")
    elif message.type == "turn.update":
        print(f"partial_transcript: {message.transcript}")
    elif message.type == "turn.end":
        print(f"committed_transcript: {message.transcript}")
    elif message.type == "error":
        error_code = message.error_code or "unknown_error"
        if error_code == "quota_exceeded":
            print("You are out of credits")
        elif error_code == "concurrency_limited":
            print("You have too many open STT connections")
        else:
            print(f"{error_code}: {message.message}")

connection.on("event", on_message)

# ElevenLabs dispatches to your callbacks automatically;
# with Cartesia you run the dispatch loop yourself
recv_task = asyncio.create_task(connection.dispatch_events())

import Cartesia from '@cartesia/cartesia-js';

connection.on("event", (message: Cartesia.STT.AutoFinalize.STTAutoFinalizeWebsocketResponse) => {
  switch (message.type) {
    case "turn.start":
      console.log("User started speaking");
      break;
    case "turn.update":
      console.log(`partial_transcript: ${message.transcript}`);
      break;
    case "turn.end":
      console.log(`committed_transcript: ${message.transcript}`);
      break;
  }
});

connection.on("error", (error) => {
  if (error.error) {
    // Server sent error (may be a bad request or internal server error)
    const errorCode = error.error.error_code || "unknown_error";
    switch (errorCode) {
      case "quota_exceeded":
        console.error("You are out of credits");
        break;
      case "concurrency_limited":
        console.error("You have too many open STT connections");
        break;
      default:
        console.error(`${errorCode}: ${error.error.message}`);
        break;
    }
  } else {
    // Client error
    console.error(`Client had an error: ${error.message}`);
  }
});

connection.on("close", (code: number, reason: string) => {
  if (code === 1001) {
    console.log("WebSocket closed due to inactivity");
  } else {
    console.log(`WebSocket closed (${code}): ${reason}`);
  }
});

Example Server Messages

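Scribe’s committed transcripts carry no leading whitespace, so you join them with spaces. Ink’s transcripts for later turns already begin with a space, so concatenate them directly.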

ElevenLabs Scribe (VAD)	Cartesia Realtime STT (Auto)
—	turn.start
partial_transcript `"Scribe's transcripts"`	turn.update `"Scribe's transcripts"`
—	turn.eager_end `"Scribe's transcripts"`
—	turn.resume
partial_transcript `"Scribe's transcripts are joined with spaces."`	turn.update `"Scribe's transcripts are joined with spaces."`
—	turn.eager_end `"Scribe's transcripts are joined with spaces."`
committed_transcript `"Scribe's transcripts are joined with spaces."`	turn.end `"Scribe's transcripts are joined with spaces."`
committed_transcript_with_timestamps `"Scribe's transcripts are joined with spaces."`	—
—	turn.start
partial_transcript `"Ink's are not."`	turn.update `" Ink's are not."`
—	turn.eager_end `" Ink's are not."`
committed_transcript `"Ink's are not."`	turn.end `" Ink's are not."`
committed_transcript_with_timestamps `"Ink's are not."`	—
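The difference in whitespace matters when you assemble a running transcript. A minimal sketch, using the final transcripts from the table above (the function names are hypothetical):

```python
def join_scribe(committed: list[str]) -> str:
    """Scribe segments have no leading whitespace, so insert spaces between them."""
    return " ".join(committed)


def join_ink(turn_ends: list[str]) -> str:
    """Ink turns carry their own leading space, so concatenate them as-is."""
    return "".join(turn_ends)


scribe_segments = ["Scribe's transcripts are joined with spaces.", "Ink's are not."]
ink_turns = ["Scribe's transcripts are joined with spaces.", " Ink's are not."]

scribe_full = join_scribe(scribe_segments)
ink_full = join_ink(ink_turns)
print(ink_full)

# When showing a single turn on its own, strip the leading space first.
print(ink_turns[1].lstrip())
```

Reusing the Scribe join on Ink output would produce a double space between turns.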
