# Concurrency and WebSocket Limits

> Learn about concurrency limits and timeouts with the Cartesia API.

Your account is subject to two types of rate limits: WebSocket limits and generation concurrency limits.

## Concurrency limits by subscription plan

Your subscription plan determines how many requests can be processed simultaneously. Sonic Text-to-Speech (TTS) and Ink Speech-to-Text (STT) have separate concurrency limits with different values: on every plan except Enterprise, the STT limit is four times the TTS limit.

| Plan       | TTS Concurrent Requests | STT Concurrent Requests |
| ---------- | ----------------------- | ----------------------- |
| Free       | 2                       | 8                       |
| Pro        | 3                       | 12                      |
| Startup    | 5                       | 20                      |
| Scale      | 15                      | 60                      |
| Enterprise | Custom                  | Custom                  |

<Note>
  Sonic (Text-to-Speech) and Ink (Speech-to-Text) services have separate concurrent request limits. For example, if you're on the Scale plan, you can have up to 15 concurrent TTS requests AND 60 concurrent STT requests running simultaneously.

  Line voice agents have separate [agent-call concurrency limits](/line/infrastructure/scaling).
</Note>

## Text-to-Speech (TTS) Concurrency

We measure TTS generation concurrency in terms of the number of unique contexts active at a given time.

* For HTTP endpoints, each request is treated as a separate context and counts toward your concurrency limit.
* For WebSockets, each unique <code>context\_id</code> defines a context. Sending additional requests with the same <code>context\_id</code> does not increase your concurrency usage, because requests to the same context are processed sequentially.
* STT **does not** count toward your TTS concurrency limit.
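To make these counting rules concrete, here is a small Python sketch that estimates TTS concurrency usage from a list of in-flight requests. `ActiveRequest` and `tts_concurrency_usage` are hypothetical names, not part of any Cartesia SDK, and the sketch assumes that `context_id` values are unique across all of your WebSocket connections.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ActiveRequest:
    """Hypothetical record of one in-flight TTS request."""
    transport: str  # "http" or "websocket"
    context_id: Optional[str] = None  # only used for WebSocket requests


def tts_concurrency_usage(requests: Iterable[ActiveRequest]) -> int:
    """Estimate how many TTS concurrency slots a set of in-flight requests uses."""
    http_requests = 0
    websocket_contexts: set[str] = set()
    for req in requests:
        if req.transport == "http":
            # Every HTTP request is its own context.
            http_requests += 1
        elif req.transport == "websocket":
            # Requests sharing a context_id run sequentially, so they share one slot.
            websocket_contexts.add(req.context_id)
        else:
            raise ValueError(f"unknown transport: {req.transport!r}")
    return http_requests + len(websocket_contexts)


active = [
    ActiveRequest("http"),
    ActiveRequest("http"),
    ActiveRequest("websocket", "ctx-a"),
    ActiveRequest("websocket", "ctx-a"),  # same context: no extra slot
    ActiveRequest("websocket", "ctx-b"),
]
print(tts_concurrency_usage(active))  # 4
```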

If you exceed your TTS concurrency limit, you will receive a `429 Too Many Requests` error. You can check your concurrency limit and upgrade it on the playground at [play.cartesia.ai](https://play.cartesia.ai).
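A common way to handle occasional 429 responses is to retry with exponential backoff and jitter. This sketch is generic: `TooManyRequests` is a hypothetical exception standing in for however your client reports a 429, and the delay values are illustrative, not Cartesia recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical exception raised by your client on a 429 response."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on TooManyRequests with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the 429
            # The delay ceiling doubles on each attempt (0.5s, 1s, 2s, ...) up to
            # max_delay; sleeping a random fraction of it spreads out retries.
            ceiling = min(max_delay, base_delay * 2**attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

In practice you would wrap a single request, for example `with_backoff(lambda: send_tts_request(text))`, where `send_tts_request` is a hypothetical function that raises `TooManyRequests` when the response status is 429.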

### Interpreting TTS concurrency limits

How you interpret your TTS concurrency limit depends on how you're using the Sonic model family.

<AccordionGroup>
  <Accordion title="Conversational use cases">
    For real-time conversational use cases, such as powering voice agents, we've found that the number of parallel conversations you can support is effectively 4X your concurrency limit. This is only a rule of thumb; the actual ratio depends on the types of conversations you're supporting. You can reach out to us to discuss your specific use case.

    For example, if you have a TTS concurrency limit of 15, you can typically support 60 parallel conversations.
  </Accordion>

  <Accordion title="Non-conversational use cases">
    For non-conversational use cases, such as generating speech in batch jobs, there is a more direct relationship between your concurrency limit and the number of parallel generations you can support.

    For example, if you have a TTS concurrency limit of 15, you can typically support 15 parallel TTS generations. You can use a connection pool to ensure you don't exceed your concurrency limit.
  </Accordion>
</AccordionGroup>
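For batch workloads like those in the non-conversational case above, one way to keep parallel generations within your limit is an `asyncio.Semaphore` sized to your TTS concurrency limit, which serves the same purpose as the pool mentioned there. In this sketch, `generate` is a hypothetical async function wrapping whichever TTS call you use, and `generate_all` is an illustrative name.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def generate_all(
    texts: Iterable[str],
    generate: Callable[[str], Awaitable[bytes]],
    concurrency_limit: int,
) -> List[bytes]:
    """Run one TTS generation per text, with at most `concurrency_limit` in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def bounded(text: str) -> bytes:
        # Each generation waits here until one of the slots is free.
        async with semaphore:
            return await generate(text)

    # asyncio.gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(bounded(t) for t in texts)))
```

With a TTS concurrency limit of 15, you would call `generate_all(texts, generate, concurrency_limit=15)`. The limit applies to your whole account, so if several processes generate speech at once, they need to share it between them.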

### TTS WebSocket limits

We limit the number of parallel TTS WebSocket connections to 10X your concurrency limit. For example, if you have a concurrency limit of 15, you can have up to 150 parallel TTS WebSocket connections.

If you exceed your WebSocket limit, you will receive a `429 Too Many Requests` error when you try to open a new WebSocket connection.

Usually, when users run into TTS WebSocket limits (even at scale), it's because they're not properly closing idle connections. Beyond closing idle connections, you can also create a connection pool to ensure you don't exceed your WebSocket limit.
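A connection pool can be as simple as a capped set of reusable connections. This sketch assumes a hypothetical `open_connection` coroutine function that returns an object with an async `close()` method; `WebSocketPool` is an illustrative class, not part of a Cartesia SDK. Setting `max_size` to 10X your concurrency limit keeps the total number of open connections (checked out plus idle) at or below the WebSocket limit, and `close_idle()` shuts down connections you no longer need.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


class WebSocketPool:
    """Minimal async pool that reuses connections and caps how many are open."""

    def __init__(self, open_connection: Callable[[], Awaitable[Any]], max_size: int):
        self._open = open_connection
        # One slot per connection that is checked out (or being opened).
        self._slots = asyncio.Semaphore(max_size)
        self._idle: List[Any] = []

    async def acquire(self) -> Any:
        await self._slots.acquire()
        if self._idle:
            return self._idle.pop()  # reuse an idle connection
        try:
            return await self._open()
        except BaseException:
            self._slots.release()  # don't leak the slot if opening fails
            raise

    def release(self, conn: Any) -> None:
        """Return a connection to the pool so another caller can reuse it."""
        self._idle.append(conn)
        self._slots.release()

    async def close_idle(self) -> None:
        """Close every idle connection so it stops counting toward the limit."""
        while self._idle:
            await self._idle.pop().close()
```

With a concurrency limit of 15, you would create `WebSocketPool(open_connection, max_size=150)`. A connection that is never passed back to `release()` stays open and keeps its slot, so pair every `acquire()` with a `release()`, for example in a `try`/`finally` block.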

### TTS WebSocket timeouts

We close idle TTS WebSocket connections after 5 minutes. If a connection is likely to sit idle for a long period, we recommend closing it and opening a new WebSocket connection when you need one again.
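One way to follow this recommendation is to track when a connection was last used and replace it before the server's idle timeout fires. In this sketch, `ReopeningConnection` and the `open_connection` factory are hypothetical, every `get()` call is treated as activity (so a long stream between calls could trigger an unnecessary reopen), and the 30-second safety margin is an arbitrary illustrative value.

```python
import time
from typing import Any, Awaitable, Callable, Optional


class ReopeningConnection:
    """Holds one connection and replaces it before the server's idle timeout.

    `open_connection` is a hypothetical coroutine function that returns an
    object with an async `close()` method. Every `get()` counts as activity.
    """

    def __init__(
        self,
        open_connection: Callable[[], Awaitable[Any]],
        server_idle_timeout: float = 300.0,  # TTS WebSockets: 5 minutes
        safety_margin: float = 30.0,  # illustrative, not a Cartesia value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._open = open_connection
        self._max_idle = server_idle_timeout - safety_margin
        self._clock = clock
        self._conn: Optional[Any] = None
        self._last_used = 0.0

    async def get(self) -> Any:
        """Return a connection, reopening it if it has been idle too long."""
        idle_for = self._clock() - self._last_used
        if self._conn is not None and idle_for > self._max_idle:
            # Close our end and start fresh rather than reuse a connection
            # the server may already be closing.
            await self._conn.close()
            self._conn = None
        if self._conn is None:
            self._conn = await self._open()
        self._last_used = self._clock()
        return self._conn
```

For STT WebSockets, which are closed after 3 minutes of inactivity, you could pass `server_idle_timeout=180.0` instead.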

## Speech-to-Text (STT) Concurrency

Each active transcription stream counts as one concurrent request, regardless of whether you're using HTTP or WebSocket connections.

* Each concurrent HTTP or WebSocket connection counts toward your STT concurrency limit.
* Idle STT WebSockets still count toward your STT concurrency limit.
* TTS **does not** count toward your STT concurrency limit.
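Because idle STT WebSockets still hold a concurrency slot, it helps to scope each stream so that it is closed as soon as you are done with it, even if an error occurs. This sketch uses the standard library's `contextlib.asynccontextmanager`; `open_stream` is a hypothetical factory for an STT connection with an async `close()` method, and `stt_stream` is an illustrative name.

```python
import contextlib
from typing import Any, AsyncIterator, Awaitable, Callable


@contextlib.asynccontextmanager
async def stt_stream(open_stream: Callable[[], Awaitable[Any]]) -> AsyncIterator[Any]:
    """Open a transcription stream and always close it when the block exits."""
    stream = await open_stream()
    try:
        yield stream
    finally:
        # Runs on normal exit and on exceptions, freeing the concurrency slot.
        await stream.close()
```

Usage would look like `async with stt_stream(open_stream) as stream:` followed by your code that sends audio and reads transcripts; the stream is closed when the block ends.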

If you exceed your STT concurrency limit, you will receive a `429 Too Many Requests` error.

### STT WebSocket timeouts

We close idle STT WebSocket connections after 3 minutes. If a connection is likely to sit idle for a long period, we recommend closing it and opening a new WebSocket connection when you need one again.