# Migrating from OpenAI Realtime Transcription with Turn Detection

This guide covers migrating from OpenAI Realtime Transcription when used with `turn_detection: server_vad`.

<Card horizontal title="All migration guides" icon="arrow-left-from-arc" href="/use-the-api/stt/migrations" />

This guide contains both bare API descriptions and SDK code. To install the SDK:

<CodeGroup>
  ```bash Python theme={null}
  pip install cartesia
  ```

  ```bash TypeScript theme={null}
  npm i @cartesia/cartesia-js
  ```
</CodeGroup>

<Info>
  If you're already using the Cartesia SDK, upgrade to version `>=3.2.0`.
</Info>

<Note>
  Ink 2 only supports English right now.\
  We expect to add more languages in the coming months.
</Note>

## Connection

Replace the OpenAI WebSocket URL and auth header with Cartesia's `/stt/turns/websocket`, including your desired model and input audio format as query parameters:

```diff theme={null}
- wss://api.openai.com/v1/realtime?intent=transcription
+ wss://api.cartesia.ai/stt/turns/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=24000
```

```diff theme={null}
- Authorization: Bearer <OPENAI_API_KEY>
+ Authorization: Bearer <CARTESIA_API_KEY>
+ Cartesia-Version: 2026-03-01
```

Browser WebSocket APIs cannot set request headers. Instead, pass the API version as the `cartesia_version` query parameter and authenticate with a short-lived [access token](/get-started/authenticate-your-client-applications) in the `access_token` query parameter rather than an API key.
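
As a rough sketch of how these query parameters fit together, the helper below assembles a connection URL for a client that cannot set headers. The function name `build_stt_url` is an illustrative assumption and is not part of the Cartesia SDK; the endpoint and parameter names are the ones used in this guide.

```python
from urllib.parse import urlencode

# Endpoint taken from the connection diff in this guide.
BASE_URL = "wss://api.cartesia.ai/stt/turns/websocket"


def build_stt_url(
    access_token: str,
    model: str = "ink-2",
    encoding: str = "pcm_s16le",
    sample_rate: int = 24000,
    cartesia_version: str = "2026-03-01",
) -> str:
    """Hypothetical helper: carry auth and API version in the query string."""
    params = {
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "cartesia_version": cartesia_version,
        "access_token": access_token,
    }
    # urlencode percent-escapes any characters that are unsafe in a URL.
    return f"{BASE_URL}?{urlencode(params)}"


# "<ACCESS_TOKEN>" is a placeholder for a token minted by your server.
print(build_stt_url("<ACCESS_TOKEN>"))
```

Keeping the token as a required argument makes it harder to accidentally ship a long-lived API key to a browser.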

Connect to the auto-finalization WebSocket with the Cartesia SDK:

<CodeGroup>
  ```python Python (Async) theme={null}
  import os
  from cartesia import AsyncCartesia

  client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

  async with client.stt.auto_finalize.websocket(
      model="ink-2", encoding="pcm_s16le", sample_rate=24000
  ) as connection:
      ...
  ```

  ```python Python theme={null}
  import os
  from cartesia import Cartesia

  client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

  with client.stt.auto_finalize.websocket(
      model="ink-2", encoding="pcm_s16le", sample_rate=24000
  ) as connection:
      ...
  ```

  ```typescript TypeScript theme={null}
  import Cartesia from "@cartesia/cartesia-js";

  const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

  const connection = client.stt.autoFinalize.websocket({
    model: "ink-2",
    encoding: "pcm_s16le",
    sample_rate: 24000,
  });
  ```

  ```typescript TypeScript (Browser) theme={null}
  // Server-side: generate access tokens using your API key
  import Cartesia from '@cartesia/cartesia-js';

  const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

  export async function GET() {
    const { token } = await client.accessToken.create({
      grants: { stt: true, tts: false, agent: false },
      // How long the token lasts in seconds
      // Allowed values: 0–3600
      expires_in: 3600,
    });
    return Response.json({ token });
  }


  // Client-side
  // 1. Fetch an access token from your server
  // 2. Connect to Cartesia via WebSocket
  import Cartesia from "@cartesia/cartesia-js";

  async function getToken(): Promise<string> {
    const res = await fetch('/replace-with-your-server');
    const { token } = await res.json();
    return token;
  }
  const audioContext = new AudioContext();

  const client = new Cartesia({ token: await getToken() });

  const connection = client.stt.autoFinalize.websocket({
    model: "ink-2",
    encoding: "pcm_f32le",
    sample_rate: audioContext.sampleRate,
  });
  ```
</CodeGroup>

## Session configuration

OpenAI configures the session in the `session.update` payload. Cartesia takes the equivalent settings as query parameters.

| OpenAI session config                                   | Cartesia Realtime STT (Auto)                                                             | Notes                                                                                                                    |
| ------------------------------------------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `?intent=transcription`                                 | —                                                                                        | Ink only supports transcription.                                                                                         |
| `audio.input.transcription.model` (`gpt-4o-transcribe`) | `model=ink-2` <Badge color="red" size="sm">required</Badge>                              | See [Models](/build-with-cartesia/stt/latest) for all options.                                                           |
| `audio.input.format` (`audio/pcm`, 24 kHz)              | `encoding=pcm_s16le` + `sample_rate=24000` <Badge color="red" size="sm">required</Badge> | Cartesia supports many more input audio formats. See [encoding](#encoding) for all options.                              |
| `audio.input.turn_detection` (`server_vad`)             | —                                                                                        | See [manual finalization](/use-the-api/stt/migrate-from-openai-realtime-transcription/manual) to disable turn detection. |
| `audio.input.transcription.language`                    | —                                                                                        | `ink-2` only supports `en` right now. More languages are coming soon!                                                    |
| `audio.input.transcription.delay`                       | —                                                                                        | Not configurable.                                                                                                        |
| `audio.input.noise_reduction`                           | —                                                                                        | Not required.                                                                                                            |
| —                                                       | `cartesia_version=2026-03-01` <Badge color="red" size="sm">required</Badge>              | See [API Conventions](/use-the-api/api-conventions#always-send-a-cartesia-version-header) for details.                   |

<Accordion title="encoding">
  OpenAI sets the input format under `audio.input.format`. Cartesia takes `encoding` and `sample_rate` as query parameters.

  | OpenAI `audio.input.format`              | Cartesia `encoding` | Cartesia `sample_rate` |
  | ---------------------------------------- | ------------------- | ---------------------- |
  | `{ "type": "audio/pcm", "rate": 24000 }` | `pcm_s16le`         | `24000`                |
  | `g711_ulaw`                              | `pcm_mulaw`         | `8000`                 |
  | `g711_alaw`                              | `pcm_alaw`          | `8000`                 |

  OpenAI's PCM format is 16-bit, 24 kHz, mono. Cartesia accepts that `sample_rate` directly, so you can stream the same audio without resampling. Cartesia also accepts `pcm_s32le`, `pcm_f16le`, and `pcm_f32le`.
</Accordion>
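
If your existing code already holds an OpenAI `audio.input.format` value, a small lookup can translate it into Cartesia query parameters. The sketch below simply encodes the rows of the encoding table; `cartesia_audio_params` is a hypothetical helper name, and the keys are written the way the table lists them.

```python
from typing import Union

# Rows copied from the encoding table in this guide.
_FORMAT_TABLE = {
    "audio/pcm": {"encoding": "pcm_s16le", "sample_rate": 24000},
    "g711_ulaw": {"encoding": "pcm_mulaw", "sample_rate": 8000},
    "g711_alaw": {"encoding": "pcm_alaw", "sample_rate": 8000},
}


def cartesia_audio_params(openai_format: Union[str, dict]) -> dict:
    """Hypothetical helper: map an OpenAI input format to Cartesia query params."""
    # The PCM row is an object with a "type" key; the G.711 rows are plain strings.
    key = openai_format["type"] if isinstance(openai_format, dict) else openai_format
    try:
        return dict(_FORMAT_TABLE[key])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(f"No Cartesia mapping listed for {openai_format!r}") from None


print(cartesia_audio_params({"type": "audio/pcm", "rate": 24000}))
```

Raising on unknown formats, rather than guessing a default, keeps a mismatched encoding from silently producing garbage transcripts.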

## Sending audio

OpenAI wraps each audio chunk in a JSON-formatted text frame and base64-encodes the audio bytes.\
Cartesia accepts audio chunks as binary frames, so send the raw audio bytes directly:

```diff theme={null}
- { "type": "input_audio_buffer.append", "audio": "<base64 PCM>" }
+ <raw PCM bytes>
```

There's no equivalent for OpenAI's `session.update` message; to change parameters, open a new WebSocket connection.

To commit all audio and close the session, send a JSON-formatted text frame:

```json theme={null}
{ "type": "close" }
```

Cartesia will transcribe all buffered audio, then close the socket for you.

<Warning>
  If you currently commit audio mid-session with OpenAI using `input_audio_buffer.commit`, consider using Cartesia with [manual finalization](./manual) instead.

  Take a look at the [migration guides](/use-the-api/stt/migrations) page for details.
</Warning>

### Sending audio with the SDK

<CodeGroup>
  ```python Python (Async) theme={null}
  # raw_audio (bytes) - Raw audio data, about 100 ms at a time
  await connection.send_raw(raw_audio)
  ```

  ```python Python theme={null}
  # raw_audio (bytes) - Raw audio data, about 100 ms at a time
  connection.send_raw(raw_audio)
  ```

  ```typescript TypeScript theme={null}
  // @param {ArrayBufferLike} rawAudio - raw audio data, about 100 ms at a time
  connection.sendRaw(rawAudio);
  ```
</CodeGroup>
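
The snippets above send roughly 100 ms of audio per call. As a minimal sketch of how that chunk size could be derived, the generator below slices a buffer on sample boundaries. It assumes mono audio, and `iter_chunks` is an illustrative name rather than an SDK function.

```python
from typing import Iterator

# Bytes per sample for the encodings named in this guide.
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def iter_chunks(
    audio: bytes, encoding: str, sample_rate: int, chunk_ms: int = 100
) -> Iterator[bytes]:
    """Yield consecutive slices of mono audio, each about chunk_ms long."""
    # Work in whole samples first so a chunk never splits a sample in half.
    samples_per_chunk = sample_rate * chunk_ms // 1000
    if samples_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    chunk_size = samples_per_chunk * BYTES_PER_SAMPLE[encoding]
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


# One second of 16-bit, 24 kHz silence splits into ten 4800-byte chunks.
chunks = list(iter_chunks(bytes(48000), "pcm_s16le", 24000))
print(len(chunks), len(chunks[0]))
```

Each yielded chunk can then be passed to `send_raw` as shown above; the final chunk may be shorter than the rest.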

### Decoding base64-encoded audio before sending

<CodeGroup>
  ```python Python (Async) theme={null}
  from base64 import b64decode

  await connection.send_raw(b64decode(audio_base_64))
  ```

  ```python Python theme={null}
  from base64 import b64decode

  connection.send_raw(b64decode(audio_base_64))
  ```

  ```typescript TypeScript theme={null}
  connection.sendRaw(Uint8Array.fromBase64(audioBase64));
  ```
</CodeGroup>
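
If an upstream component still emits OpenAI-style `input_audio_buffer.append` frames, a thin adapter can recover the raw bytes before forwarding them. This is a sketch under that assumption; `raw_audio_from_openai_message` is a hypothetical name, and the frame shape is the one shown in the diff earlier in this section.

```python
import json
from base64 import b64decode, b64encode
from typing import Optional


def raw_audio_from_openai_message(text_frame: str) -> Optional[bytes]:
    """Return the audio bytes of an `input_audio_buffer.append` frame, else None."""
    message = json.loads(text_frame)
    if message.get("type") != "input_audio_buffer.append":
        return None  # Not an audio frame, so there is nothing to forward.
    return b64decode(message["audio"])


# Round-trip check with four bytes of made-up PCM data.
frame = json.dumps(
    {
        "type": "input_audio_buffer.append",
        "audio": b64encode(b"\x00\x01\x02\x03").decode("ascii"),
    }
)
print(raw_audio_from_openai_message(frame))
```

Returning `None` for other message types lets the caller decide how to handle non-audio frames instead of failing on them.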

### Closing

<CodeGroup>
  ```python Python (Async) theme={null}
  # Commit buffered audio
  # and let the server close the socket once done
  await connection.send({"type": "close"})

  # Close the socket early (optional)
  connection.close()
  ```

  ```python Python theme={null}
  # Commit buffered audio
  # and let the server close the socket once done
  connection.send({"type": "close"})

  # Close the socket early (optional)
  connection.close()
  ```

  ```typescript TypeScript theme={null}
  // Commit buffered audio
  // and let the server close the socket once done
  connection.send({ type: "close" });

  // Close the socket early (optional)
  connection.close();
  ```
</CodeGroup>

## Event mapping

OpenAI signals turns with `input_audio_buffer.speech_started` / `speech_stopped` / `committed`, then bursts transcript deltas and a `completed` event per turn.

Cartesia folds the same information into a turn lifecycle: `turn.start`, `turn.update`, `turn.eager_end`, `turn.resume`, and `turn.end`. See [Turn Detection](/use-the-api/stt/turns) for the full state machine.

| OpenAI `type`                                           | Cartesia `type`  | Notes                                                                                              |
| ------------------------------------------------------- | ---------------- | -------------------------------------------------------------------------------------------------- |
| `session.created` / `session.updated`                   | `connected`      | Cartesia has no session-config round-trip. You do not need to wait before sending audio.           |
| `input_audio_buffer.speech_started`                     | `turn.start`     | The user began speaking. Carries no transcript.                                                    |
| `conversation.item.input_audio_transcription.delta`     | `turn.update`    | OpenAI bursts deltas after the turn commits; Cartesia's `turn.update` streams **during** the turn. |
| `input_audio_buffer.speech_stopped` / `committed`       | `turn.end`       | The user stopped speaking and the turn committed.                                                  |
| `conversation.item.input_audio_transcription.completed` | `turn.end`       | Final transcript for the turn.                                                                     |
| —                                                       | `turn.eager_end` | The model predicts the user might be done speaking. Okay to ignore.                                |
| —                                                       | `turn.resume`    | The user kept talking; ignore the last `turn.eager_end`.                                           |
| `error`                                                 | `error`          | Client or server errors.                                                                           |
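
When porting handlers, it can help to keep the table above as data. The sketch below is one hypothetical way to do that; the dictionary and function names are illustrative, and the mapping only restates the table rows.

```python
from typing import Dict, List

# Cartesia event type -> OpenAI event types it replaces, per the table above.
CARTESIA_TO_OPENAI: Dict[str, List[str]] = {
    "connected": ["session.created", "session.updated"],
    "turn.start": ["input_audio_buffer.speech_started"],
    "turn.update": ["conversation.item.input_audio_transcription.delta"],
    "turn.end": [
        "input_audio_buffer.speech_stopped",
        "input_audio_buffer.committed",
        "conversation.item.input_audio_transcription.completed",
    ],
    "turn.eager_end": [],  # no OpenAI counterpart
    "turn.resume": [],     # no OpenAI counterpart
    "error": ["error"],
}


def openai_equivalents(cartesia_type: str) -> List[str]:
    """Return the OpenAI event names a Cartesia event stands in for."""
    return CARTESIA_TO_OPENAI.get(cartesia_type, [])


print(openai_equivalents("turn.end"))
```

One caveat when reusing delta handlers: `turn.update` carries the cumulative transcript for the turn, whereas OpenAI deltas are incremental, so a handler that appends each delta should replace its buffer on `turn.update` instead.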

### Completed transcripts

An OpenAI `conversation.item.input_audio_transcription.completed` event:

```json theme={null}
{
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "item_003",
  "content_index": 0,
  "transcript": "Hello world!"
}
```

Becomes a Cartesia `turn.end` event:

```json theme={null}
{
  "type": "turn.end",
  "transcript": "Hello world!",
  "request_id": "33cacee6-1936-4949-a05b-ecc9f2393248"
}
```

<Note>`turn.start` and `turn.resume` events do not carry a transcript.</Note>

<CodeGroup>
  ```python Python (Async) theme={null}
  import asyncio
  from cartesia.types.stt import STTAutoFinalizeWebsocketResponse

  full_transcript = ""

  async def receive() -> None:
      global full_transcript
      async for event in connection:
          if event.type == "turn.start":
              print("speech_started")
          elif event.type == "turn.update":
              # cumulative within a turn
              print(f"Transcript so far: {event.transcript}")
          elif event.type == "turn.end":
              # Do not strip or add spaces!
              full_transcript += event.transcript
              print(f"speech_stopped: {event.transcript}")
          elif event.type == "error":
              print(f"error: {event.message}")

  # Run receive() concurrently with your audio sender:
  #   await asyncio.gather(send_audio(), receive())
  ```

  ```python Python theme={null}
  from cartesia.types.stt import STTAutoFinalizeWebsocketResponse

  full_transcript = ""

  for event in connection:
      if event.type == "turn.start":
          print("speech_started")
      elif event.type == "turn.update":
          # cumulative within a turn
          print(f"Transcript so far: {event.transcript}")
      elif event.type == "turn.end":
          # Do not strip or add spaces!
          full_transcript += event.transcript
          print(f"speech_stopped: {event.transcript}")
      elif event.type == "error":
          print(f"error: {event.message}")
  ```

  ```typescript TypeScript theme={null}
  import Cartesia from '@cartesia/cartesia-js';

  let fullTranscript = '';

  for await (const event of connection.stream()) {
    if (event.type === 'message') {
      const m = event.message;
      switch (m.type) {
        case 'turn.start':
          console.log('speech_started');
          break;
        case 'turn.update':
          // cumulative within a turn
          console.log(`Transcript so far: ${m.transcript}`);
          break;
        case 'turn.end':
          // Do not trim or add spaces!
          fullTranscript += m.transcript;
          console.log(`speech_stopped: ${m.transcript}`);
          break;
      }
    } else if (event.type === 'error') {
      console.error(`error: ${event.error.message}`);
    }
  }
  ```
</CodeGroup>

## Example server messages

> OpenAI batches each turn. Ink streams within the turn.

| OpenAI gpt-4o-transcribe (server VAD)                                                                      | Cartesia Realtime STT (Auto)                                               |
| ---------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
| <Badge>session.updated</Badge>                                                                             | <Badge>connected</Badge>                                                   |
| <Badge>speech\_started</Badge>                                                                             | <Badge>turn.start</Badge>                                                  |
| —                                                                                                          | <Badge color="orange">turn.update</Badge> `"OpenAI batches"`               |
| —                                                                                                          | <Badge color="orange">turn.update</Badge> `"OpenAI batches each turn."`    |
| —                                                                                                          | <Badge>turn.eager\_end</Badge> `"OpenAI batches each turn."`               |
| <Badge>speech\_stopped</Badge> + <Badge>committed</Badge>                                                  | —                                                                          |
| <Badge color="orange">…transcription.delta</Badge> `"OpenAI batches each turn."` *(burst after commit)*    | —                                                                          |
| <Badge color="green">…transcription.completed</Badge> `"OpenAI batches each turn."`                        | <Badge color="green">turn.end</Badge> `"OpenAI batches each turn."`        |
| <Badge>speech\_started</Badge>                                                                             | <Badge>turn.start</Badge>                                                  |
| —                                                                                                          | <Badge color="orange">turn.update</Badge> `"Ink streams"`                  |
| —                                                                                                          | <Badge>turn.eager\_end</Badge> `"Ink streams"`                             |
| —                                                                                                          | <Badge>turn.resume</Badge>                                                 |
| —                                                                                                          | <Badge color="orange">turn.update</Badge> `"Ink streams within the turn."` |
| —                                                                                                          | <Badge>turn.eager\_end</Badge> `"Ink streams within the turn."`            |
| <Badge>speech\_stopped</Badge> + <Badge>committed</Badge>                                                  | —                                                                          |
| <Badge color="orange">…transcription.delta</Badge> `"Ink streams within the turn."` *(burst after commit)* | —                                                                          |
| <Badge color="green">…transcription.completed</Badge> `"Ink streams within the turn."`                     | <Badge color="green">turn.end</Badge> `"Ink streams within the turn."`     |
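
To make the Cartesia column concrete, here is a minimal sketch that replays that event sequence through a small state tracker. Events are modeled as plain dictionaries for illustration (the SDK yields typed objects), and `TurnTracker` is a hypothetical class, not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnTracker:
    """Hypothetical reducer over Cartesia turn events."""

    partial: str = ""                      # latest cumulative transcript in the open turn
    eager_candidate: Optional[str] = None  # transcript from the last turn.eager_end
    finished: List[str] = field(default_factory=list)

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind == "turn.start":
            self.partial, self.eager_candidate = "", None
        elif kind == "turn.update":
            self.partial = event["transcript"]  # cumulative, so replace
        elif kind == "turn.eager_end":
            self.eager_candidate = event["transcript"]
        elif kind == "turn.resume":
            self.eager_candidate = None  # the user kept talking; drop the guess
        elif kind == "turn.end":
            self.finished.append(event["transcript"])
            self.partial, self.eager_candidate = "", None


# The Cartesia column of the table above, in order.
events = [
    {"type": "connected"},
    {"type": "turn.start"},
    {"type": "turn.update", "transcript": "OpenAI batches"},
    {"type": "turn.update", "transcript": "OpenAI batches each turn."},
    {"type": "turn.eager_end", "transcript": "OpenAI batches each turn."},
    {"type": "turn.end", "transcript": "OpenAI batches each turn."},
    {"type": "turn.start"},
    {"type": "turn.update", "transcript": "Ink streams"},
    {"type": "turn.eager_end", "transcript": "Ink streams"},
    {"type": "turn.resume"},
    {"type": "turn.update", "transcript": "Ink streams within the turn."},
    {"type": "turn.eager_end", "transcript": "Ink streams within the turn."},
    {"type": "turn.end", "transcript": "Ink streams within the turn."},
]

tracker = TurnTracker()
for event in events:
    tracker.handle(event)
print(tracker.finished)
```

Tracking `eager_candidate` separately is optional. It only matters if you want to start work speculatively on `turn.eager_end` and cancel it on `turn.resume`.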

## References

<CardGroup cols={2}>
  <Card icon="code" title="API Reference" href="/api-reference/stt/turns/websocket">
    Cartesia Realtime STT (Auto)
  </Card>

  <Card icon="brackets-curly" title="Full Code Example" href="/examples/stt-auto-finalize-websocket">
    Using the Cartesia SDK
  </Card>
</CardGroup>
