# Migrating from OpenAI Realtime Transcription without Turn Detection

This guide covers migrating from OpenAI Realtime Transcription when used with `turn_detection: null`.

<Card horizontal title="All migration guides" icon="arrow-left-from-arc" href="/use-the-api/stt/migrations" />

This guide contains both bare API descriptions and SDK code. To install the SDK:

<CodeGroup>
  ```bash Python theme={null}
  pip install cartesia
  ```

  ```bash TypeScript theme={null}
  npm i @cartesia/cartesia-js
  ```
</CodeGroup>

<Info>
  If you're already using the Cartesia SDK, upgrade to version `>=3.2.0`.
</Info>

## Connection

Replace the OpenAI WebSocket URL and auth header with Cartesia's `/stt/websocket`, including your desired model and input audio format as query parameters:

```diff theme={null}
- wss://api.openai.com/v1/realtime?intent=transcription
+ wss://api.cartesia.ai/stt/websocket?model=ink-2&encoding=pcm_s16le&sample_rate=24000
```

```diff theme={null}
- Authorization: Bearer <OPENAI_API_KEY>
+ Authorization: Bearer <CARTESIA_API_KEY>
+ Cartesia-Version: 2026-03-01
```

Browser WebSocket APIs cannot set request headers. In browsers, pass the API version as the `cartesia_version` query parameter, and authenticate with a short-lived [access token](/get-started/authenticate-your-client-applications) in the `access_token` query parameter instead of an API key.
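
The query-parameter form of the connection can be sketched as plain URL construction. This is a minimal illustration, not SDK code: `build_stt_url` and the `"example-token"` value are hypothetical, while the parameter names and values come from this guide.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.cartesia.ai/stt/websocket"


def build_stt_url(
    access_token: str,
    model: str = "ink-2",
    encoding: str = "pcm_s16le",
    sample_rate: int = 24000,
    cartesia_version: str = "2026-03-01",
) -> str:
    """Build a header-free WebSocket URL (hypothetical helper, not part of the SDK)."""
    params = {
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        # These two replace the Cartesia-Version and Authorization headers.
        "cartesia_version": cartesia_version,
        "access_token": access_token,
    }
    # urlencode percent-escapes any characters that are not URL-safe.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_stt_url("example-token")
print(url)
```

The same query string works from any WebSocket client that cannot set headers. The SDK builds the URL for you, so this is only needed when connecting without it.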

Connect to the manual-finalization WebSocket with the Cartesia SDK:

<CodeGroup>
  ```python Python (Async) theme={null}
  import os
  from cartesia import AsyncCartesia

  client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

  async with client.stt.manual_finalize.websocket(
      model="ink-2", encoding="pcm_s16le", sample_rate=24000
  ) as connection:
      ...
  ```

  ```python Python theme={null}
  import os
  from cartesia import Cartesia

  client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

  with client.stt.manual_finalize.websocket(
      model="ink-2", encoding="pcm_s16le", sample_rate=24000
  ) as connection:
      ...
  ```

  ```typescript TypeScript theme={null}
  import Cartesia from "@cartesia/cartesia-js";

  const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

  const connection = client.stt.manualFinalize.websocket({
    model: "ink-2",
    encoding: "pcm_s16le",
    sample_rate: 24000,
  });
  ```

  ```typescript TypeScript (Browser) theme={null}
  // Server-side: generate access tokens using your API key
  import Cartesia from '@cartesia/cartesia-js';

  const client = new Cartesia({ apiKey: process.env.CARTESIA_API_KEY });

  export async function GET() {
    const { token } = await client.accessToken.create({
      grants: { stt: true, tts: false, agent: false },
      // How long the token lasts in seconds
      // Allowed values: 0–3600
      expires_in: 3600,
    });
    return Response.json({ token });
  }


  // Client-side
  // 1. Fetch an access token from your server
  // 2. Connect to Cartesia via WebSocket
  import Cartesia from "@cartesia/cartesia-js";

  async function getToken(): Promise<string> {
    const res = await fetch('/replace-with-your-server');
    const { token } = await res.json();
    return token;
  }
  const audioContext = new AudioContext();

  const client = new Cartesia({ token: await getToken() });

  const connection = client.stt.manualFinalize.websocket({
    model: "ink-2",
    encoding: "pcm_f32le",
    sample_rate: audioContext.sampleRate,
  });
  ```
</CodeGroup>

## Session configuration

OpenAI configures the session in the `session.update` payload. Cartesia takes the equivalent settings as query parameters.

| OpenAI session config                      | Cartesia Realtime STT (Manual)                                                           | Notes                                                                                                                             |
| ------------------------------------------ | ---------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `?intent=transcription`                    | —                                                                                        | Ink only supports transcription.                                                                                                  |
| `audio.input.transcription.model`          | `model=ink-2` <Badge color="red" size="sm">required</Badge>                              | `gpt-realtime-whisper` and `gpt-4o-transcribe` both map to `ink-2`.                                                               |
| `audio.input.format` (`audio/pcm`, 24 kHz) | `encoding=pcm_s16le` + `sample_rate=24000` <Badge color="red" size="sm">required</Badge> | Cartesia supports many more input audio formats. See [encoding](#encoding) for all options.                                       |
| `audio.input.transcription.language`       | `language`                                                                               | `ink-2` only supports `en` right now. Use `ink-whisper` for [other languages](/build-with-cartesia/stt/older-models#ink-whisper). |
| `audio.input.turn_detection` (`null`)      | —                                                                                        | See [auto finalization](/use-the-api/stt/migrate-from-openai-realtime-transcription/auto) for server-side turn detection.         |
| `audio.input.transcription.delay`          | —                                                                                        | Not configurable.                                                                                                                 |
| `audio.input.noise_reduction`              | —                                                                                        | Not required.                                                                                                                     |
| —                                          | `cartesia_version=2026-03-01` <Badge color="red" size="sm">required</Badge>              | See [API Conventions](/use-the-api/api-conventions#always-send-a-cartesia-version-header) for details.                            |

<Accordion title="encoding">
  OpenAI sets the input format under `audio.input.format`. Cartesia takes `encoding` and `sample_rate` as query parameters.

  | OpenAI `audio.input.format`              | Cartesia `encoding` | Cartesia `sample_rate` |
  | ---------------------------------------- | ------------------- | ---------------------- |
  | `{ "type": "audio/pcm", "rate": 24000 }` | `pcm_s16le`         | `24000`                |
  | `g711_ulaw`                              | `pcm_mulaw`         | `8000`                 |
  | `g711_alaw`                              | `pcm_alaw`          | `8000`                 |

  OpenAI's PCM format is 16-bit, 24 kHz, mono. Cartesia accepts that `sample_rate` directly, so you can stream the same audio without resampling. Cartesia also accepts `pcm_s32le`, `pcm_f16le`, and `pcm_f32le`.
</Accordion>
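
If your existing code stores the OpenAI format value, the encoding table can be applied programmatically. The sketch below assumes the format arrives either as the PCM object or as one of the G.711 names listed in the table; `cartesia_audio_params` is a hypothetical helper.

```python
from typing import Dict, Union

# G.711 rows of the encoding table: OpenAI name -> (Cartesia encoding, sample rate in Hz)
G711_FORMATS = {
    "g711_ulaw": ("pcm_mulaw", 8000),
    "g711_alaw": ("pcm_alaw", 8000),
}


def cartesia_audio_params(openai_format: Union[str, Dict[str, object]]) -> Dict[str, object]:
    """Translate an OpenAI `audio.input.format` value into Cartesia query parameters."""
    if isinstance(openai_format, dict):
        if openai_format.get("type") != "audio/pcm":
            raise ValueError(f"unmapped OpenAI format: {openai_format!r}")
        # OpenAI PCM is 16-bit mono, which corresponds to pcm_s16le.
        # Assumption: fall back to 24000 Hz (the rate in the table) if `rate` is absent.
        return {"encoding": "pcm_s16le", "sample_rate": openai_format.get("rate", 24000)}
    if openai_format not in G711_FORMATS:
        raise ValueError(f"unmapped OpenAI format: {openai_format!r}")
    encoding, sample_rate = G711_FORMATS[openai_format]
    return {"encoding": encoding, "sample_rate": sample_rate}


pcm_params = cartesia_audio_params({"type": "audio/pcm", "rate": 24000})
ulaw_params = cartesia_audio_params("g711_ulaw")
print(pcm_params, ulaw_params)
```

Raising on unknown formats is a deliberate choice here: a silently wrong `encoding` would produce garbage transcripts rather than an obvious error.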

## Sending audio

OpenAI wraps each audio chunk in a JSON-formatted text frame and base64-encodes the audio bytes.\
Cartesia accepts audio chunks as binary frames, so send the raw audio bytes directly:

```diff theme={null}
- { "type": "input_audio_buffer.append", "audio": "<base64 PCM>" }
+ <raw PCM bytes>
```

There's no equivalent for OpenAI's `session.update` message; to change parameters, open a new WebSocket connection.

Cartesia's control commands are bare text frames, not JSON.

To commit buffered audio and emit a transcript, send a `finalize` frame in place of `input_audio_buffer.commit`:

```text theme={null}
finalize
```

<Warning>
  Timing matters: send `finalize` when your user is done speaking, because it commits the audio buffered so far and emits its transcript.

  Consider using [auto finalization](./auto) if you don't know when your user is done speaking.
</Warning>

To transcribe all remaining audio and close the session, send a `close` frame:

```text theme={null}
close
```

### Sending audio with the SDK

<CodeGroup>
  ```python Python (Async) theme={null}
  # raw_audio (bytes) - Raw audio data, about 100 ms at a time
  await connection.send_raw(raw_audio)
  ```

  ```python Python theme={null}
  # raw_audio (bytes) - Raw audio data, about 100 ms at a time
  connection.send_raw(raw_audio)
  ```

  ```typescript TypeScript theme={null}
  // @param {ArrayBufferLike} rawAudio - raw audio data, about 100 ms at a time
  connection.sendRaw(rawAudio);
  ```
</CodeGroup>
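
The snippets above send about 100 ms of audio per call. As a rough sketch of how to size those chunks, the helper below slices a buffer by duration. It assumes mono audio, and `chunk_audio` and `BYTES_PER_SAMPLE` are hypothetical names rather than SDK members.

```python
from typing import Iterator

# Bytes per sample for the encodings named in this guide (mono audio assumed).
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def chunk_audio(
    audio: bytes,
    encoding: str = "pcm_s16le",
    sample_rate: int = 24000,
    chunk_ms: int = 100,
) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each about `chunk_ms` long."""
    # Whole samples per chunk first, then bytes, so a chunk never splits a sample.
    chunk_size = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE[encoding]
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


one_second_of_silence = bytes(24000 * 2)  # 24 kHz, 16-bit mono
chunks = list(chunk_audio(one_second_of_silence))
print(len(chunks), len(chunks[0]))
```

Each yielded chunk can be passed to `connection.send_raw(...)` as shown above. The final chunk may be shorter than the rest.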

### Decoding base64 encoded audio before sending

<CodeGroup>
  ```python Python (Async) theme={null}
  from base64 import b64decode

  await connection.send_raw(b64decode(audio_base_64))
  ```

  ```python Python theme={null}
  from base64 import b64decode

  connection.send_raw(b64decode(audio_base_64))
  ```

  ```typescript TypeScript theme={null}
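  // Uint8Array.fromBase64 needs a recent runtime; in older Node.js versions, use Buffer.from(audioBase64, "base64")
  connection.sendRaw(Uint8Array.fromBase64(audioBase64));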
  ```
</CodeGroup>

### Finalizing and closing

<CodeGroup>
  ```python Python (Async) theme={null}
  # Commit input audio
  await connection.send("finalize")

  # Transcribe remaining audio, then close the socket
  await connection.send("close")
  ```

  ```python Python theme={null}
  # Commit input audio
  connection.send("finalize")

  # Transcribe remaining audio, then close the socket
  connection.send("close")
  ```

  ```typescript TypeScript theme={null}
  // Commit input audio
  connection.send("finalize");

  // Transcribe remaining audio, then close the socket
  connection.send("close");
  ```
</CodeGroup>
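
If your application already produces OpenAI client messages, one way to migrate incrementally is a small adapter that translates each message into the matching Cartesia frame. This is a sketch under that assumption: `to_cartesia_frame` is a hypothetical helper, and it covers only the two client messages this guide maps.

```python
import json
from base64 import b64decode, b64encode
from typing import Optional, Union


def to_cartesia_frame(openai_message: str) -> Optional[Union[bytes, str]]:
    """Translate one OpenAI client message into a Cartesia frame.

    Returns bytes for a binary audio frame, str for a text command frame,
    or None when the message has no Cartesia equivalent (e.g. session.update).
    """
    event = json.loads(openai_message)
    if event.get("type") == "input_audio_buffer.append":
        # Cartesia takes the raw audio bytes, not base64 inside JSON.
        return b64decode(event["audio"])
    if event.get("type") == "input_audio_buffer.commit":
        # Control commands are bare text frames.
        return "finalize"
    return None


append_message = json.dumps({
    "type": "input_audio_buffer.append",
    "audio": b64encode(b"\x00\x01\x02\x03").decode("ascii"),
})
commit_message = json.dumps({"type": "input_audio_buffer.commit"})

audio_frame = to_cartesia_frame(append_message)
command_frame = to_cartesia_frame(commit_message)
print(audio_frame, command_frame)
```

With the SDK, a `bytes` result would go to `connection.send_raw(...)` and a `str` result to `connection.send(...)`, matching the snippets above.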

## Event mapping

OpenAI streams `conversation.item.input_audio_transcription.delta` events and a `completed` event per committed turn.\
Cartesia emits `transcript` deltas plus acknowledgments for the `finalize` and `close` commands.

| OpenAI `type`                                           | Cartesia `type`                 | Notes                                                                     |
| ------------------------------------------------------- | ------------------------------- | ------------------------------------------------------------------------- |
| `session.created` / `session.updated`                   | —                               | Cartesia has no session-config round-trip. Just start sending audio.      |
| `conversation.item.input_audio_transcription.delta`     | `transcript`                    | Ink 2 and Whisper only send `is_final: true`. See the row below.          |
| `conversation.item.input_audio_transcription.completed` | `transcript` (`is_final: true`) | OpenAI sends the full committed transcript; Cartesia streams **deltas**.  |
| `input_audio_buffer.committed`                          | `flush_done`                    | Acknowledgment that the buffer was processed after a commit / `finalize`. |
| —                                                       | `done`                          | Acknowledgment for `close`. Sent immediately before the WebSocket closes. |
| `error`                                                 | `error`                         | Client or server errors.                                                  |

### Completed transcripts

An OpenAI `conversation.item.input_audio_transcription.completed` event carries the **full turn**:

```json theme={null}
{
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "item_003",
  "content_index": 0,
  "transcript": "Hello world! This is the full transcript."
}
```

Becomes one or more Cartesia `transcript` events, each carrying a **delta**:

```json theme={null}
{
  "type": "transcript",
  "is_final": true,
  "text": "Hello world!",
  "duration": 0.5,
  "words": [
    {
      "word": "Hello",
      "start": 0,
      "end": 0.2
    },
    {
      "word": " world!",
      "start": 0.2,
      "end": 0.5
    }
  ],
  "request_id": "2ff8af53-4d38-479d-8287-58940f01c701"
}
```

> * Ink 2 does not return `duration` or `words` yet
> * Ink 2 and Whisper currently only emit final transcripts (`is_final: true`)
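
Because `duration` and `words` may be absent, a parser for the bare API should treat them as optional. The following is a minimal sketch that works on the raw JSON text frame; the `Transcript` dataclass and `parse_transcript` are hypothetical and not the SDK's types.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcript:
    text: str
    is_final: bool
    duration: Optional[float] = None  # not returned by Ink 2 yet
    words: List[dict] = field(default_factory=list)  # not returned by Ink 2 yet


def parse_transcript(frame: str) -> Optional[Transcript]:
    """Parse a JSON text frame; return None for non-transcript events."""
    event = json.loads(frame)
    if event.get("type") != "transcript":
        return None
    return Transcript(
        text=event["text"],  # keep leading and trailing whitespace intact
        is_final=event["is_final"],
        duration=event.get("duration"),
        words=event.get("words") or [],
    )


with_timing = parse_transcript(
    '{"type": "transcript", "is_final": true, "text": "Hello world!", "duration": 0.5,'
    ' "words": [{"word": "Hello", "start": 0, "end": 0.2},'
    ' {"word": " world!", "start": 0.2, "end": 0.5}]}'
)
without_timing = parse_transcript('{"type": "transcript", "is_final": true, "text": " Ink sends"}')
print(with_timing, without_timing)
```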

<Tip>Cartesia's final transcripts are **deltas**; concatenate them without stripping or adding whitespace.</Tip>

<CodeGroup>
  ```python Python (Async) theme={null}
  import asyncio

  committed_transcript = ""

  async def receive() -> None:
      global committed_transcript
      async for event in connection:
          if event.type == "transcript":
              if event.is_final:
                  # Do not strip or add whitespace!
                  committed_transcript += event.text
          elif event.type == "flush_done" or event.type == "done":
              print(f"Transcript: {committed_transcript}")
              committed_transcript = ""
          elif event.type == "error":
              print(f"error: {event.message}")

  # Run receive() concurrently with your audio sender:
  #   await asyncio.gather(send_audio(), receive())
  ```

  ```python Python theme={null}
  committed_transcript = ""

  for event in connection:
      if event.type == "transcript":
          if event.is_final:
              # Do not strip or add whitespace!
              committed_transcript += event.text
      elif event.type == "flush_done" or event.type == "done":
          print(f"Transcript: {committed_transcript}")
          committed_transcript = ""
      elif event.type == "error":
          print(f"error: {event.message}")
  ```

  ```typescript TypeScript theme={null}
  import Cartesia from '@cartesia/cartesia-js';

  let committedTranscript = '';

  for await (const event of connection.stream()) {
    if (event.type === 'message') {
      const m = event.message;
      switch (m.type) {
        case 'transcript':
          if (m.is_final) {
            // Do not trim or add whitespace!
            committedTranscript += m.text;
          }
          break;
        case 'flush_done':
        case 'done':
          console.log(`Transcript: ${committedTranscript}`);
          committedTranscript = '';
          break;
      }
    } else if (event.type === 'error') {
      console.error(`error: ${event.error.message}`);
    }
  }
  ```
</CodeGroup>

## Example Server Messages

> GPT sends full transcripts. Ink sends deltas and may break words.

| OpenAI gpt-realtime-whisper                                                                     | Cartesia Realtime STT (Manual)                                             |
| ----------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
| <Badge color="orange">…transcription.delta</Badge> `"GPT sends"`                                | <Badge color="green">is\_final: true</Badge> `"GPT sends"`                 |
| <Badge color="orange">…transcription.delta</Badge> `" full transcripts."`                       | <Badge color="green">is\_final: true</Badge> `" full transc"`              |
| <Badge>commit</Badge> *(client)*                                                                | <Badge>finalize</Badge> *(client)*                                         |
| <Badge>input\_audio\_buffer.committed</Badge>                                                   | <Badge color="green">is\_final: true</Badge> `"ripts."`                    |
| <Badge color="green">…transcription.completed</Badge> `"GPT sends full transcripts."`           | <Badge>flush\_done</Badge>                                                 |
| <Badge color="orange">…transcription.delta</Badge> `"Ink sends deltas"`                         | <Badge color="green">is\_final: true</Badge> `" Ink sends"`                |
| <Badge color="orange">…transcription.delta</Badge> `" and may break words."`                    | <Badge color="green">is\_final: true</Badge> `" deltas and may break wor"` |
| <Badge>commit</Badge> *(client)*                                                                | <Badge>finalize</Badge> *(client)*                                         |
| <Badge>input\_audio\_buffer.committed</Badge>                                                   | <Badge color="green">is\_final: true</Badge> `"ds."`                       |
| <Badge color="green">…transcription.completed</Badge> `"Ink sends deltas and may break words."` | <Badge>flush\_done</Badge>                                                 |
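
To check the delta handling without a live connection, the Cartesia column of the table can be replayed through the same accumulation logic. This sketch represents events as plain dictionaries rather than SDK objects, and `committed_turns` is a hypothetical helper.

```python
from typing import Dict, Iterable, Iterator


def committed_turns(events: Iterable[Dict[str, object]]) -> Iterator[str]:
    """Concatenate final transcript deltas and yield one string per flush_done/done."""
    buffer = ""
    for event in events:
        if event["type"] == "transcript" and event.get("is_final"):
            # Append verbatim: deltas may split a word or start with a space.
            buffer += str(event["text"])
        elif event["type"] in ("flush_done", "done"):
            yield buffer
            buffer = ""


# Server messages from the Cartesia column of the table above.
replayed_events = [
    {"type": "transcript", "is_final": True, "text": "GPT sends"},
    {"type": "transcript", "is_final": True, "text": " full transc"},
    {"type": "transcript", "is_final": True, "text": "ripts."},
    {"type": "flush_done"},
    {"type": "transcript", "is_final": True, "text": " Ink sends"},
    {"type": "transcript", "is_final": True, "text": " deltas and may break wor"},
    {"type": "transcript", "is_final": True, "text": "ds."},
    {"type": "flush_done"},
]

turns = list(committed_turns(replayed_events))
print(turns)
```

The second turn keeps its leading space because the deltas are joined verbatim. Trimming individual deltas instead would have produced `transcripts` as `transc ripts` or dropped the space between turns.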

## References

<CardGroup cols={2}>
  <Card icon="code" title="API Reference" href="/api-reference/stt/websocket">
    Cartesia Realtime STT (Manual)
  </Card>

  <Card icon="brackets-curly" title="Full Code Example" href="/examples/stt-manual-finalize-websocket">
    Using the Cartesia SDK
  </Card>
</CardGroup>