# Compare Endpoints

> There are three ways to integrate with Cartesia's speech-to-text API.

## Overview

### Realtime STT (Auto)

Most new voice agents should start with [Realtime STT (Auto) `/stt/turns/websocket`](/api-reference/stt/turns/websocket) to take advantage of built-in turn detection.

> A **user turn** is one stretch of user speech that your app treats as a single response point.\
> We refer to our `/stt/turns/websocket` endpoint as "Realtime STT (**Auto**)" since user turns are **automatically finalized** by our model.

### Realtime STT (Manual)

Cartesia also supports [Realtime STT (Manual) `/stt/websocket`](/api-reference/stt/websocket) for stacks that already manage VAD themselves and want tight control over when transcripts are emitted.
Send `"finalize"` whenever the user stops speaking.

> **Voice activity detection (VAD)** detects speech versus non-speech in audio.\
> We refer to our `/stt/websocket` endpoint as "Realtime STT (**Manual**)" since user turns are **manually finalized** by your own VAD.
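
As a rough illustration of manual finalization, the sketch below turns per-frame VAD decisions into `"finalize"` messages. It is a minimal sketch under assumptions, not Cartesia's reference implementation: `finalize_on_speech_end`, the `send` callable, and the `hangover_frames` threshold are hypothetical names, and the exact wire format of the finalize message should be taken from the `/stt/websocket` API reference.

```python
from typing import Callable, Iterable


def finalize_on_speech_end(
    vad_frames: Iterable[bool],
    send: Callable[[str], None],
    hangover_frames: int = 3,
) -> int:
    """Call send("finalize") each time speech is followed by enough silence.

    vad_frames: per-frame output of your own VAD (True means speech).
    send: hypothetical callable that writes a text message to the open WebSocket.
    hangover_frames: assumed number of consecutive silent frames that ends a turn.
    Returns the number of finalize messages sent.
    """
    in_speech = False
    silent_run = 0
    sent = 0
    for is_speech in vad_frames:
        if is_speech:
            # Speech (re)started: any short pause so far does not end the turn.
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run >= hangover_frames:
                # The user has stopped speaking: finalize this turn once.
                send("finalize")
                sent += 1
                in_speech = False
                silent_run = 0
    return sent
```

The hangover threshold trades latency against false turn endings: a smaller value finalizes sooner but is more likely to cut the user off during a brief pause.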

### Batch STT

Use [Batch STT `/stt`](/api-reference/stt/transcribe) to transcribe pre-recorded audio in a single request.

> Batch STT accepts the entire recording in a single request, whereas the realtime endpoints accept at most one second of audio per second of wall-clock time, i.e. audio needs to be sent "in real time".
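
If you replay a recorded file through a realtime endpoint (for example, in a test), the real-time constraint means you have to pace the chunks yourself. The following is a minimal sketch of that pacing, assuming 16 kHz, 16-bit mono PCM and 100 ms chunks. These values and the helper names `chunk_size_bytes` and `paced_chunks` are illustrative assumptions, not requirements stated on this page.

```python
import time
from typing import Callable, Iterator


def chunk_size_bytes(sample_rate_hz: int, bytes_per_sample: int, chunk_ms: int) -> int:
    """Bytes of mono PCM audio that cover chunk_ms milliseconds."""
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000


def paced_chunks(
    audio: bytes,
    sample_rate_hz: int = 16000,  # assumed input format
    bytes_per_sample: int = 2,    # 16-bit samples (assumption)
    chunk_ms: int = 100,          # assumed chunk duration
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield chunks of audio no faster than real time."""
    size = chunk_size_bytes(sample_rate_hz, bytes_per_sample, chunk_ms)
    if size <= 0:
        raise ValueError("chunk_ms is too small for this audio format")
    bytes_per_second = sample_rate_hz * bytes_per_sample
    for start in range(0, len(audio), size):
        chunk = audio[start:start + size]
        yield chunk
        # Before producing the next chunk, wait for as long as this one lasts as audio.
        sleep(len(chunk) / bytes_per_second)
```

Each yielded chunk would be written to the WebSocket by the caller. Because the sleep does not subtract the time spent sending, this sketch runs slightly slower than real time, which stays within the limit. Audio captured live from a microphone is already paced by the capture device and needs no extra delay.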

## Comparison

|                                           | `/stt/turns/websocket` (auto)                       | `/stt/websocket` (manual)                      | `/stt` (batch)                          |
| ----------------------------------------- | --------------------------------------------------- | ---------------------------------------------- | --------------------------------------- |
| Transport                                 | WebSocket                                           | WebSocket                                      | HTTP file upload                        |
| Best for                                  | Natural back-and-forth voice agents                 | Explicit turn control                          | Pre-recorded files and offline jobs     |
| Supported models                          | `ink-2` only                                        | All                                            | `ink-whisper` only; `ink-2` coming soon |
| Who handles VAD?                          | Cartesia                                            | Your app                                       | N/A                                     |
| Who decides when a user turn is complete? | Cartesia                                            | Your app                                       | N/A                                     |
| Do you send `finalize`?                   | No                                                  | Yes. This is **crucial** to ensure low latency | No                                      |
| Audio input                               | Chunked stream                                      | Chunked stream                                 | Complete file                           |
| What comes back?                          | Turn events with **complete user turn transcripts** | **Transcript deltas** as they become available | One complete transcript                 |

<Note>
  Ink 2 only supports English right now.\
  We expect to add more languages in the coming months.
</Note>

## How to decide

If you are building a voice agent, start with [Realtime STT (Auto) `/stt/turns/websocket`](/api-reference/stt/turns/websocket).

If your app already knows exactly when to start and stop transcription, or you want tight control over when transcripts are emitted, use [Realtime STT (Manual) `/stt/websocket`](/api-reference/stt/websocket) and send `"finalize"` whenever the user stops speaking.

If you are transcribing audio that is already fully recorded, use [Batch STT `/stt`](/api-reference/stt/transcribe).
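
The three rules above can be condensed into a small helper. This is only a sketch of the decision logic: the function name `choose_stt_endpoint` and its two boolean parameters are hypothetical, while the returned paths are the endpoints named on this page.

```python
def choose_stt_endpoint(audio_fully_recorded: bool, app_controls_turns: bool) -> str:
    """Return the STT endpoint path suggested by the guidance on this page.

    audio_fully_recorded: the whole recording exists before transcription starts.
    app_controls_turns: your app runs its own VAD and decides when a turn ends.
    """
    if audio_fully_recorded:
        # Pre-recorded audio: one request, one complete transcript.
        return "/stt"
    if app_controls_turns:
        # Your app sends "finalize" whenever the user stops speaking.
        return "/stt/websocket"
    # Default for voice agents: Cartesia finalizes user turns automatically.
    return "/stt/turns/websocket"
```

For example, `choose_stt_endpoint(audio_fully_recorded=False, app_controls_turns=False)` returns `"/stt/turns/websocket"`, the recommended starting point for a new voice agent.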

## Where to go next

<CardGroup cols={3}>
  <Card title="Understand turn detection" icon="comments" href="/use-the-api/stt/turns/turns">
    See how user turn events work in voice agents
  </Card>

  <Card title="Avoid these pitfalls" icon="bug" href="/use-the-api/stt/troubleshooting">
    Troubleshoot transcription errors, high latency, and server errors
  </Card>

  <Card title="Check out some code examples" icon="brackets-curly" href="/examples/stt-auto-finalize-websocket">
    Simple implementations using each API endpoint
  </Card>
</CardGroup>
