# Realtime Text-to-Speech Quickstart

> Stream text to Cartesia over a WebSocket and receive audio in real time.

The Cartesia WebSocket API lets you stream text in chunks and receive audio chunks at the same time. This is best for realtime use cases such as voice agents, where text is generated incrementally, for example by an LLM.

## Prerequisites

* A Cartesia API key. [Create one here](https://play.cartesia.ai/keys), then add it to your `.bashrc` or `.zshrc`:

  ```sh theme={null}
  export CARTESIA_API_KEY=<your api key here>
  ```

  <Info>
    When accessing the Cartesia API from a browser, use ephemeral access tokens for authentication to keep your API key safe. See [Authenticate Your Client Applications](/get-started/authenticate-your-client-applications).
  </Info>

* `ffplay` (part of FFmpeg), used to play the audio output.

  Download the FFmpeg package for your operating system from the [FFmpeg download page](https://ffmpeg.org/download.html).

* A language runtime and package manager:
  * **Python**: [Python 3](https://www.python.org/downloads/) and [pip](https://pip.pypa.io/en/stable/installation/)
  * **JavaScript**: [Node.js](https://nodejs.org/) (includes npm)
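
Before running the quickstart, it can help to confirm that the API key and `ffplay` are actually visible to your shell. The following is a minimal sketch, not part of the Cartesia client library: `check_prerequisites` is a hypothetical helper name, and it only checks that the environment variable is set and that an `ffplay` executable is on your `PATH`.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    # `env` and `which` are injectable so the check can be exercised without
    # touching the real environment (hypothetical helper, for illustration only).
    env = os.environ if env is None else env
    problems = []
    if not env.get("CARTESIA_API_KEY"):
        problems.append("CARTESIA_API_KEY is not set")
    if which("ffplay") is None:
        problems.append("ffplay was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

If the script prints nothing, both prerequisites were found.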

## Stream text and play audio

<Tabs>
  <Tab title="Python">
    <Steps>
      <Step title="Install the client library">
        ```sh theme={null}
        pip install 'cartesia[websockets]'
        ```

        See the [Cartesia Python client library](/tools/client-libraries) for more details.
      </Step>

      <Step title="Stream text over a WebSocket">
        ```python realtime-tts.py theme={null}
        from cartesia import Cartesia
        import subprocess
        import os

        client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

        print("Starting ffplay to play streaming audio output...")
        player = subprocess.Popen(
            ["ffplay", "-f", "f32le", "-ar", "44100", "-probesize", "32", "-analyzeduration", "0", "-nodisp", "-autoexit", "-loglevel", "quiet", "-"],
            stdin=subprocess.PIPE,
            bufsize=0,
        )

        print("Connecting to Cartesia via websockets...")
        with client.tts.websocket_connect() as connection:
            ctx = connection.context(
                model_id="sonic-3.5",
                voice={"mode": "id", "id": "f786b574-daa5-4673-aa0c-cbe3e8534c02"},
                output_format={
                    "container": "raw",
                    "encoding": "pcm_f32le",
                    "sample_rate": 44100,
                },
            )

            print("Sending chunked text input...")
            transcript_chunks = ["Hi there! ", "Welcome to ", "Cartesia Sonic."]
            for part in transcript_chunks:
                ctx.push(part)

            ctx.no_more_inputs()

            for response in ctx.receive():
                if response.type == "chunk" and response.audio:
                    print(f"Received audio chunk ({len(response.audio)} bytes)")
                    # Here we pipe audio to ffplay. In a production app you might play audio in
                    # a client, or forward it to another service, e.g., a telephony provider.
                    player.stdin.write(response.audio)
                elif response.type == "done":
                    break

        player.stdin.close()
        player.wait()
        ```
      </Step>

      <Step title="Run the quickstart">
        ```sh theme={null}
        python3 realtime-tts.py
        ```

        This streams the text input to Cartesia and plays the streaming audio output using `ffplay`. (Make sure your device volume is turned up!)
      </Step>
    </Steps>
  </Tab>

  <Tab title="TypeScript">
    <Steps>
      <Step title="Install the client library">
        ```sh theme={null}
        npm init -y
        npm pkg set type=module
        npm install @cartesia/cartesia-js ws
        npm install --save-dev tsx typescript @types/node
        ```

        <Info>
          You only need the `ws` package in Node.js runtimes. In a browser, don't install `ws`; the native WebSocket API is sufficient.
        </Info>

        See the [Cartesia JavaScript client library](/tools/client-libraries) for more details.
      </Step>

      <Step title="Stream text over a WebSocket">
        Create a file named `realtime-tts.ts` with the following code. The code is explained in the "How it works" section below.

        ```typescript realtime-tts.ts theme={null}
        import Cartesia from "@cartesia/cartesia-js";
        import { spawn } from "child_process";

        const apiKey = process.env["CARTESIA_API_KEY"];
        if (!apiKey) {
          throw new Error("Missing CARTESIA_API_KEY");
        }

        const client = new Cartesia({ apiKey });

        console.log("Starting ffplay to play streaming audio output...");
        const { stdin } = spawn("ffplay", ["-f", "f32le", "-ar", "44100", "-probesize", "32", "-analyzeduration", "0", "-nodisp", "-autoexit", "-loglevel", "quiet", "-"], {
          stdio: ["pipe", "ignore", "ignore"],
        });
        if (!stdin) {
          throw new Error("ffplay stdin not available");
        }

        console.log("Connecting to Cartesia via websockets...");
        const ws = await client.tts.websocket();

        const ctx = ws.context({
          model_id: "sonic-3.5",
          voice: { mode: "id", id: "f786b574-daa5-4673-aa0c-cbe3e8534c02" },
          output_format: { container: "raw", encoding: "pcm_f32le", sample_rate: 44100 },
        });

        console.log("Sending chunked text input...");
        const transcriptChunks = ["Hi there! ", "Welcome to ", "Cartesia Sonic."];
        for (const part of transcriptChunks) {
          await ctx.push({ transcript: part });
        }

        await ctx.no_more_inputs();

        for await (const event of ctx.receive()) {
          if (event.type === "chunk" && event.audio) {
            console.log(`Received audio chunk (${event.audio.length} bytes)`);
            // Here we pipe audio to ffplay. In a production app you might play audio in
            // a client, or forward it to another service, e.g., a telephony provider.
            stdin.write(event.audio);
          } else if (event.type === "done") {
            break;
          }
        }
        stdin.end();
        ws.close();
        ```
      </Step>

      <Step title="Run the quickstart">
        ```sh theme={null}
        npx tsx realtime-tts.ts
        ```

        This streams the text input to Cartesia and plays the streaming audio output using `ffplay`. (Make sure your device volume is turned up!)
      </Step>
    </Steps>
  </Tab>
</Tabs>

## How it works

A single WebSocket connection can manage multiple *contexts*, each of which is a full-duplex, continuous stream: you push text chunks in and receive generated audio chunks out in real time.

This works well when generating text from an LLM in real time: Cartesia's TTS system maintains context history and appends each new chunk to it. This keeps generated speech continuous and consistent in tone and prosody, and it minimizes latency because you don't have to wait for the full transcript to be ready.

To summarize, here's what the quickstart code does after establishing a WebSocket connection:

1. **Create a context** with `context()`.
2. **Push text** incrementally with `push()`. Each call sends the chunk with `continue: true`, telling the model more text will follow. See [continuations](/build-with-cartesia/capability-guides/stream-inputs-using-continuations) for details.
3. **Signal completion** with `no_more_inputs()`, which sends `continue: false` to tell the model no more text is coming.
4. **Receive audio** chunks as they are generated.
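
To make the `continue` flag in steps 2 and 3 concrete, here is a rough sketch of the sequence of JSON messages a client might send for one context. It does not use the Cartesia client library. `build_messages` is a hypothetical helper, and the `context_id` and `transcript` field names, as well as the empty final transcript, are assumptions about the wire format; check the WebSocket API reference for the exact shape.

```python
import json


def build_messages(chunks, context_id, settings):
    """Build one JSON message per text chunk, plus a final end-of-input message."""
    # Every text chunk is marked continue=True: more text will follow.
    messages = [
        {**settings, "context_id": context_id, "transcript": chunk, "continue": True}
        for chunk in chunks
    ]
    # Assumption: an empty transcript with continue=False signals completion.
    messages.append(
        {**settings, "context_id": context_id, "transcript": "", "continue": False}
    )
    return [json.dumps(message) for message in messages]


if __name__ == "__main__":
    settings = {"model_id": "sonic-3.5"}  # voice and output_format omitted for brevity
    for raw in build_messages(["Hi there! ", "Welcome to ", "Cartesia Sonic."], "demo-context", settings):
        print(raw)
```

The client library's `push()` and `no_more_inputs()` calls handle this bookkeeping for you; the sketch only illustrates what the flag communicates.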

The pattern resembles realtime LLM WebSocket APIs: send text fragments as they arrive and receive generated output incrementally, in this case as audio chunks.
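
As an illustration of forwarding fragments as they arrive, the sketch below pushes text from any iterator into a context and then signals completion. It relies only on the `push()` and `no_more_inputs()` methods used in the quickstart. `forward_text`, `fake_llm_tokens`, and `RecordingContext` are hypothetical names introduced here; the recording class stands in for a real context so the example runs without a network connection.

```python
from typing import Iterable, Iterator


def fake_llm_tokens() -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    yield from ["Hi ", "there! ", "Welcome ", "to ", "Cartesia ", "Sonic."]


class RecordingContext:
    """Stand-in for a Cartesia context that records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls = []

    def push(self, text: str) -> None:
        self.calls.append(("push", text))

    def no_more_inputs(self) -> None:
        self.calls.append(("no_more_inputs",))


def forward_text(ctx, fragments: Iterable[str]) -> int:
    """Push each non-empty fragment as it arrives, then signal completion."""
    pushed = 0
    for fragment in fragments:
        if fragment:  # skip empty fragments, which some token streams emit
            ctx.push(fragment)
            pushed += 1
    ctx.no_more_inputs()
    return pushed


if __name__ == "__main__":
    ctx = RecordingContext()
    count = forward_text(ctx, fake_llm_tokens())
    print(f"Pushed {count} fragments")
```

With a real context, you would pass the object returned by `connection.context(...)` in place of `RecordingContext`. In a latency-sensitive application you would probably also want the receive loop running while text is still being pushed, rather than after the last fragment.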

## What's next

<CardGroup cols={3}>
  <Card title="Pick a voice in Playground" icon="microphone" href="https://play.cartesia.ai/voices">
    Choose a voice, or clone your own, then copy the voice ID back into this quickstart.
  </Card>

  <Card title="Tune WebSocket request params" icon="sliders" href="/api-reference/tts/websocket">
    Change `voice`, `model_id`, or `output_format`, then rerun and compare output quality and behavior.
  </Card>

  <Card title="Stream inputs using continuations" icon="waveform" href="/build-with-cartesia/capability-guides/stream-inputs-using-continuations">
    Send incremental text while preserving flow across chunks for smoother long-form or LLM-driven speech.
  </Card>
</CardGroup>
