
# Migrating from ElevenLabs with Automatic Commits

This guide covers migrating from ElevenLabs Realtime Speech to Text when used with `commit_strategy=vad`.

<Card horizontal title="All migration guides" icon="arrow-left-from-arc" href="/use-the-api/stt/migrations" />

This guide contains both bare API descriptions and SDK code. To install the SDK:

<CodeGroup>
  ```bash Python theme={null}
  pip install cartesia
  ```

  ```bash TypeScript theme={null}
  npm i @cartesia/cartesia-js
  ```
</CodeGroup>

<Info>
  If you're already using the Cartesia SDK, upgrade to version `>=3.2.0`.
</Info>

<Note>
  Ink 2 only supports English right now.\
  We expect to add more languages in the coming months.
</Note>

## Connection

Replace the ElevenLabs WebSocket URL and auth header with Cartesia's `/stt/turns/websocket`.

```diff theme={null}
- wss://api.elevenlabs.io/v1/speech-to-text/realtime?model_id=scribe_v2_realtime&audio_format=pcm_16000&commit_strategy=vad
+ wss://api.cartesia.ai/stt/turns/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=16000
```

```diff theme={null}
- xi-api-key: <ELEVENLABS_API_KEY>
+ x-api-key: <CARTESIA_API_KEY>
+ cartesia-version: 2026-03-01
```

Browsers do not let you set request headers on a WebSocket connection. Pass the API version as the `cartesia_version` query param, and authenticate with a short-lived [access token](/get-started/authenticate-your-client-applications) in the `access_token` query param instead of an API key.
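As a rough illustration of the header-free form of the connection, the sketch below assembles the full WebSocket URL from the query parameters named in this guide. `build_stt_url` is a hypothetical helper, not part of the Cartesia SDK, and the token value in the usage line is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.cartesia.ai/stt/turns/websocket"


def build_stt_url(
    access_token: str,
    *,
    model: str = "ink-2",
    encoding: str = "pcm_s16le",
    sample_rate: int = 16000,
    cartesia_version: str = "2026-03-01",
) -> str:
    """Build a connection URL that carries version and auth as query params.

    Hypothetical helper for illustration; the parameter names come from
    this guide, the function itself does not exist in the SDK.
    """
    query = {
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "cartesia_version": cartesia_version,
        "access_token": access_token,
    }
    # urlencode percent-escapes values, so tokens with special characters are safe
    return f"{BASE_URL}?{urlencode(query)}"


url = build_stt_url("<ACCESS_TOKEN>")
```

Any client that cannot set headers can connect with a URL of this shape. Keep the token short-lived, since URLs are more likely than headers to end up in logs.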

Connect to the auto-finalization WebSocket with the Cartesia SDK:

<CodeGroup>
  ```python Python theme={null}
  import os
  from cartesia import AsyncCartesia

  client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

  async with client.stt.auto_finalize.websocket(
      model="ink-2", encoding="pcm_s16le", sample_rate=16000
  ) as connection:
      ...
  ```

  ```typescript TypeScript theme={null}
  import Cartesia from "@cartesia/cartesia-js";

  const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

  const connection = client.stt.autoFinalize.websocket({
    model: "ink-2",
    encoding: "pcm_s16le",
    sample_rate: 16000,
  });
  ```

  ```typescript TypeScript (Browser) theme={null}
  // Server-side: Generate access-tokens using your API key
  import Cartesia from '@cartesia/cartesia-js';

  const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

  export async function GET() {
    const { token } = await client.accessToken.create({
      grants: { stt: true, tts: false, agent: false },
      // How long the token lasts in seconds
      // Allowed values: 0–3600
      expires_in: 3600,
    });
    return Response.json({ token });
  }


  // Client-side
  // 1. Fetch an access token from your server
  // 2. Connect to Cartesia via WebSocket
  import Cartesia from "@cartesia/cartesia-js";

  async function getToken(): Promise<string> {
    const res = await fetch('/replace-with-your-server');
    const { token } = await res.json();
    return token;
  }
  const audioContext = new AudioContext();

  const client = new Cartesia({ token: await getToken() });

  const connection = client.stt.autoFinalize.websocket({
    model: "ink-2",
    encoding: "pcm_f32le",
    sample_rate: audioContext.sampleRate,
  });
  ```
</CodeGroup>

## Query parameters

| ElevenLabs Scribe (VAD)                                                                            | Cartesia Realtime STT (Auto)                                                             | Notes                                                                                                                  |
| -------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `model_id=scribe_v2_realtime` <Badge color="red" size="sm">required</Badge>                        | `model=ink-2` <Badge color="red" size="sm">required</Badge>                              | See [Models](/build-with-cartesia/stt/latest) for all options.                                                         |
| `audio_format=pcm_16000`                                                                           | `encoding=pcm_s16le` + `sample_rate=16000` <Badge color="red" size="sm">required</Badge> | ElevenLabs bundles format and rate; Cartesia splits them. See [encoding](#encoding).                                   |
| `commit_strategy=vad`                                                                              | —                                                                                        | See [manual finalization](/use-the-api/stt/migrate-from-elevenlabs-realtime-speech-to-text/manual) for manual commits. |
| `language_code`                                                                                    | —                                                                                        | `ink-2` only supports `en` right now. More languages are coming soon!                                                  |
| —                                                                                                  | `cartesia_version=2026-03-01` <Badge color="red" size="sm">required</Badge>              | See [API Conventions](/use-the-api/api-conventions#always-send-a-cartesia-version-header) for details.                 |
| `vad_silence_threshold_secs`, `vad_threshold`, `min_speech_duration_ms`, `min_silence_duration_ms` | —                                                                                        | Cartesia uses semantic turn detection. No VAD tuning required.                                                         |
| `include_timestamps`                                                                               | —                                                                                        | Coming soon!                                                                                                           |
| `keyterms`                                                                                         | —                                                                                        | Coming soon!                                                                                                           |
| `enable_logging`                                                                                   | —                                                                                        | Controlled by your organization.                                                                                       |

<Accordion title="encoding">
  ElevenLabs bundles the sample format and rate into a single `audio_format` token. Cartesia splits them into `encoding` and `sample_rate`.

  | ElevenLabs `audio_format` | Cartesia `encoding` | Cartesia `sample_rate` |
  | ------------------------- | ------------------- | ---------------------- |
  | `pcm_8000`                | `pcm_s16le`         | `8000`                 |
  | `pcm_16000`               | `pcm_s16le`         | `16000`                |
  | `pcm_22050`               | `pcm_s16le`         | `22050`                |
  | `pcm_24000`               | `pcm_s16le`         | `24000`                |
  | `pcm_44100`               | `pcm_s16le`         | `44100`                |
  | `pcm_48000`               | `pcm_s16le`         | `48000`                |
  | `ulaw_8000`               | `pcm_mulaw`         | `8000`                 |

  Cartesia also accepts `pcm_s32le`, `pcm_f16le`, `pcm_f32le`, and `pcm_alaw`.\
  All Cartesia encodings support all sample rates.
</Accordion>
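If your existing configuration stores the ElevenLabs `audio_format` string, the encoding table can be applied mechanically. The sketch below mirrors the table rows; `convert_audio_format` is a hypothetical helper name, and it deliberately rejects any value that is not listed in the table.

```python
from typing import Dict, Tuple

# One entry per row of the encoding table in this guide
AUDIO_FORMATS: Dict[str, Tuple[str, int]] = {
    "pcm_8000": ("pcm_s16le", 8000),
    "pcm_16000": ("pcm_s16le", 16000),
    "pcm_22050": ("pcm_s16le", 22050),
    "pcm_24000": ("pcm_s16le", 24000),
    "pcm_44100": ("pcm_s16le", 44100),
    "pcm_48000": ("pcm_s16le", 48000),
    "ulaw_8000": ("pcm_mulaw", 8000),
}


def convert_audio_format(audio_format: str) -> Tuple[str, int]:
    """Return (encoding, sample_rate) for an ElevenLabs audio_format value."""
    try:
        return AUDIO_FORMATS[audio_format]
    except KeyError:
        raise ValueError(f"Unsupported audio_format: {audio_format!r}") from None


encoding, sample_rate = convert_audio_format("pcm_16000")
```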

## Sending audio

ElevenLabs wraps each audio chunk in a JSON-formatted text frame and base64-encodes the audio bytes.\
Cartesia accepts audio chunks as binary frames, so send the raw audio bytes directly:

```diff theme={null}
- { "message_type": "input_audio_chunk", "audio_base_64": "<base64 PCM>", "commit": false, "sample_rate": 16000 }
+ <raw PCM bytes>
```

* No need to supply previous text
* Sample rate is determined upon connection by the `sample_rate` query parameter
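The SDK examples in this guide send about 100 ms of audio per frame. As a minimal sketch, assuming mono audio, the helpers below compute the byte size of such a chunk for each Cartesia encoding and split a buffer accordingly. `chunk_size_bytes` and `iter_chunks` are hypothetical names, and the bytes-per-sample figures follow from the encoding names rather than from Cartesia documentation.

```python
from typing import Iterator

# Bytes per sample for each encoding listed in this guide (mono audio assumed)
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def chunk_size_bytes(encoding: str, sample_rate: int, chunk_ms: int = 100) -> int:
    """Number of bytes covering chunk_ms milliseconds of mono audio."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE[encoding]


def iter_chunks(
    audio: bytes, encoding: str, sample_rate: int, chunk_ms: int = 100
) -> Iterator[bytes]:
    """Yield consecutive chunks; the last one may be shorter."""
    size = chunk_size_bytes(encoding, sample_rate, chunk_ms)
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


# 250 ms of silent 16 kHz pcm_s16le audio, split into 100 ms frames
chunks = list(iter_chunks(bytes(8000), "pcm_s16le", 16000))
```

Each yielded chunk can be sent as one binary frame. Because the chunk size is a whole number of samples, no sample is split across frames.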

<Warning>
  If you currently commit audio mid-session with ElevenLabs, consider using Cartesia with [manual finalization](./manual) instead.

  Take a look at the [migration guides](/use-the-api/stt/migrations) page for details.
</Warning>

To commit all audio and close the session, send a JSON-formatted text frame:

```json theme={null}
{ "type": "close" }
```

Cartesia will transcribe all buffered audio, then close the socket for you.

### Sending audio with the SDK

<CodeGroup>
  ```python Python theme={null}
  # ElevenLabs
  from base64 import b64encode
  await elevenlabs_connection.send({
    "audio_base_64": b64encode(raw_audio).decode("ascii"),
  })

  # Cartesia
  # raw_audio (bytes) - Raw audio data, about 100 ms at a time
  await connection.send_raw(raw_audio)
  ```

  ```typescript TypeScript theme={null}
  // ElevenLabs
  elevenLabsConnection.send({ audioBase64: rawAudio.toBase64() });

  // Cartesia
  // @param {ArrayBufferLike} rawAudio - raw audio data, about 100 ms at a time
  connection.sendRaw(rawAudio);
  ```
</CodeGroup>

### Decoding base64 encoded audio before sending

<CodeGroup>
  ```python Python theme={null}
  # ElevenLabs
  await elevenlabs_connection.send({
    "audio_base_64": audio_base_64,
  })

  # Cartesia
  from base64 import b64decode
  await connection.send_raw(b64decode(audio_base_64))
  ```

  ```typescript TypeScript theme={null}
  // ElevenLabs
  elevenLabsConnection.send({ audioBase64 });

  // Cartesia
  connection.sendRaw(Uint8Array.fromBase64(audioBase64));
  ```
</CodeGroup>

### Committing and closing

<CodeGroup>
  ```python Python theme={null}
  # ElevenLabs
  elevenlabs_connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, lambda: elevenlabs_connection.close())
  await elevenlabs_connection.commit()

  # Cartesia
  await connection.send({"type": "close"})

  # Cartesia: Close the socket early (optional)
  await connection.close()
  ```

  ```typescript TypeScript theme={null}
  // ElevenLabs
  elevenLabsConnection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, () => elevenLabsConnection.close());
  elevenLabsConnection.commit();

  // Cartesia
  connection.send({ type: "close" });

  // Cartesia: Close the socket early (optional)
  connection.close();
  ```
</CodeGroup>

## Event mapping

Scribe emits a `partial_transcript`, then a `committed_transcript` when its VAD commits a segment.

Cartesia folds the same information into a turn lifecycle: `turn.start`, `turn.update`, `turn.eager_end`, `turn.resume`, and `turn.end`. See [Turn Detection](/use-the-api/stt/turns) for the full state machine.

| ElevenLabs `message_type`              | Cartesia `type`  | Notes                                                                         |
| -------------------------------------- | ---------------- | ----------------------------------------------------------------------------- |
| `session_started`                      | `connected`      | Connection confirmed. You do not need to wait for it before sending audio.    |
| `partial_transcript`                   | `turn.update`    | Partial transcript while the user is speaking.                                |
| `committed_transcript`                 | `turn.end`       | User stopped speaking; contains the complete transcript for the user turn.    |
| `committed_transcript_with_timestamps` | `turn.end`       | Timestamps are not yet available.                                             |
| —                                      | `turn.start`     | The user began speaking. Carries no transcript.                               |
| —                                      | `turn.eager_end` | The model predicts the user might be done speaking. Okay to ignore.           |
| —                                      | `turn.resume`    | The user kept talking; ignore the last `turn.eager_end`.                      |
| `error`                                | `error`          | Client or server errors.                                                      |
| `auth_error`                           | —                | Cartesia will reject the WebSocket upgrade with a 401 or 403 HTTP status.     |
| `quota_exceeded`                       | `error`          | Cartesia's error response will contain `"error_code": "quota_exceeded"`.      |
| `rate_limited`                         | `error`          | Cartesia's error response will contain `"error_code": "concurrency_limited"`. |
| `session_time_limit_exceeded`          | —                | Cartesia will send a WebSocket close frame with code `1001`.                  |

### Partial transcripts

An ElevenLabs `partial_transcript`:

```json theme={null}
{
  "message_type": "partial_transcript",
  "text": "Hello"
}
```

Becomes a Cartesia `turn.update`:

```json theme={null}
{
  "type": "turn.update",
  "transcript": "Hello",
  "request_id": "33cacee6-1936-4949-a05b-ecc9f2393248"
}
```

### Committed transcripts

An ElevenLabs `committed_transcript`:

```json theme={null}
{
  "message_type": "committed_transcript",
  "text": "Hello world!"
}
```

Becomes a Cartesia `turn.end`:

```json theme={null}
{
  "type": "turn.end",
  "transcript": "Hello world!",
  "request_id": "33cacee6-1936-4949-a05b-ecc9f2393248"
}
```

<CodeGroup>
  ```python Python theme={null}
  import asyncio
  from cartesia.types.stt import STTAutoFinalizeWebsocketResponse

  def on_message(message: STTAutoFinalizeWebsocketResponse) -> None:
      if message.type == "turn.start":
          print("User started speaking")
      elif message.type == "turn.update":
          print(f"partial_transcript: {message.transcript}")
      elif message.type == "turn.end":
          print(f"committed_transcript: {message.transcript}")
      elif message.type == "error":
          error_code = message.error_code or "unknown_error"
          if error_code == "quota_exceeded":
              print("You are out of credits")
          elif error_code == "concurrency_limited":
              print("You have too many open STT connections")
          else:
              print(f"{error_code}: {message.message}")

  connection.on("event", on_message)

  # ElevenLabs dispatches to your callbacks automatically;
  # with Cartesia you run the dispatch loop yourself
  recv_task = asyncio.create_task(connection.dispatch_events())
  ```

  ```typescript TypeScript theme={null}
  import Cartesia from '@cartesia/cartesia-js';

  connection.on("event", (message: Cartesia.STT.AutoFinalize.STTAutoFinalizeWebsocketResponse) => {
    switch (message.type) {
      case "turn.start":
        console.log("User started speaking");
        break;
      case "turn.update":
        console.log(`partial_transcript: ${message.transcript}`);
        break;
      case "turn.end":
        console.log(`committed_transcript: ${message.transcript}`);
        break;
    }
  });

  connection.on("error", (error) => {
    if (error.error) {
      // Server sent error (may be a bad request or internal server error)
      const errorCode = error.error.error_code || "unknown_error";
      switch (errorCode) {
        case "quota_exceeded":
          console.error("You are out of credits");
          break;
        case "concurrency_limited":
          console.error("You have too many open STT connections");
          break;
        default:
          console.error(`${errorCode}: ${error.error.message}`);
          break;
      }
    } else {
      // Client error
      console.error(`Client had an error: ${error.message}`);
    }
  });

  connection.on("close", (code: number, reason: string) => {
    if (code === 1001) {
      console.log("WebSocket closed due to inactivity");
    } else {
      console.log(`WebSocket closed (${code}): ${reason}`);
    }
  });
  ```
</CodeGroup>
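The handlers above skip `turn.eager_end` and `turn.resume`, which is safe. If you want to start work early on a predicted end of turn, a small state holder is one way to do it. The sketch below is a hypothetical illustration that operates on plain dicts: it keeps the eager transcript as pending, discards it on `turn.resume`, and commits only on `turn.end`. It assumes `turn.eager_end` carries a `transcript` field like `turn.update` and `turn.end` do.

```python
from typing import List, Optional


class SpeculativeTurn:
    """Tracks a predicted end of turn until it is confirmed or withdrawn."""

    def __init__(self) -> None:
        self.pending: Optional[str] = None  # transcript from the last turn.eager_end
        self.committed: List[str] = []      # transcripts confirmed by turn.end

    def handle(self, event: dict) -> None:
        event_type = event.get("type")
        if event_type == "turn.eager_end":
            # The model predicts the user might be done: safe to start work early
            self.pending = event.get("transcript", "")
        elif event_type in ("turn.resume", "turn.start"):
            # The user kept talking (or a new turn began): drop the prediction
            self.pending = None
        elif event_type == "turn.end":
            # The turn is final: commit the complete transcript
            self.pending = None
            self.committed.append(event.get("transcript", ""))
```

Anything started while `pending` is set, such as a speculative LLM request, should be cancelled when `pending` is reset without a matching `turn.end`.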

## Example server messages

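> Scribe's transcripts are joined with spaces. Ink's are not.

In the example below, each Scribe transcript carries no leading whitespace, while the second Cartesia transcript begins with a space. Concatenate Cartesia transcripts directly instead of inserting a separator.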

| ElevenLabs Scribe (VAD)                                                                                               | Cartesia Realtime STT (Auto)                                                               |
| --------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| —                                                                                                                     | <Badge>turn.start</Badge>                                                                  |
| <Badge color="orange">partial\_transcript</Badge> `"Scribe's transcripts"`                                            | <Badge color="orange">turn.update</Badge> `"Scribe's transcripts"`                         |
| —                                                                                                                     | <Badge>turn.eager\_end</Badge> `"Scribe's transcripts"`                                    |
| —                                                                                                                     | <Badge>turn.resume</Badge>                                                                 |
| <Badge color="orange">partial\_transcript</Badge> `"Scribe's transcripts are joined with spaces."`                    | <Badge color="orange">turn.update</Badge> `"Scribe's transcripts are joined with spaces."` |
| —                                                                                                                     | <Badge>turn.eager\_end</Badge> `"Scribe's transcripts are joined with spaces."`            |
| <Badge color="green">committed\_transcript</Badge> `"Scribe's transcripts are joined with spaces."`                   | <Badge color="green">turn.end</Badge> `"Scribe's transcripts are joined with spaces."`     |
| <Badge color="green">committed\_transcript\_with\_timestamps</Badge> `"Scribe's transcripts are joined with spaces."` | —                                                                                          |
| —                                                                                                                     | <Badge>turn.start</Badge>                                                                  |
| <Badge color="orange">partial\_transcript</Badge> `"Ink's are not."`                                                  | <Badge color="orange">turn.update</Badge> `" Ink's are not."`                              |
| —                                                                                                                     | <Badge>turn.eager\_end</Badge> `" Ink's are not."`                                         |
| <Badge color="green">committed\_transcript</Badge> `"Ink's are not."`                                                 | <Badge color="green">turn.end</Badge> `" Ink's are not."`                                  |
| <Badge color="green">committed\_transcript\_with\_timestamps</Badge> `"Ink's are not."`                               | —                                                                                          |

## References

<CardGroup cols={2}>
  <Card icon="code" title="API Reference" href="/api-reference/stt/turns/websocket">
    Cartesia Realtime STT (Auto)
  </Card>

  <Card icon="brackets-curly" title="Full Code Example" href="/examples/stt-auto-finalize-websocket">
    Using the Cartesia SDK
  </Card>
</CardGroup>
