> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cartesia.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Concurrency and WebSocket Limits

> Learn about concurrency limits and timeouts with the Cartesia API.

Your account is subject to two types of rate limits: WebSocket limits and generation concurrency limits.

## Concurrency limits by subscription plan

Your subscription plan determines how many requests can be processed simultaneously. Sonic Text-to-Speech (TTS) and Ink Speech-to-Text (STT) each have separate concurrency limits with the same values per plan.

| Plan       | TTS Concurrent Requests | STT Concurrent Requests |
| ---------- | ----------------------- | ----------------------- |
| Free       | 2                       | 8                       |
| Pro        | 3                       | 12                      |
| Startup    | 5                       | 20                      |
| Scale      | 15                      | 60                      |
| Enterprise | Custom                  | Custom                  |

<Note>
  Sonic (Text-to-Speech) and Ink (Speech-to-Text) services have separate concurrent request limits. For example, if you're on the Scale plan, you can have up to 15 concurrent TTS requests AND 60 concurrent STT requests running simultaneously.
</Note>

## Text-to-Speech (TTS) Concurrency

We measure TTS generation concurrency in terms of the number of unique contexts active at a given time.

* For HTTP endpoints, each request is treated as a separate context and counts toward your concurrency limit.
* For WebSockets, a unique <code>context\_id</code> defines a context—sending additional requests with the same <code>context\_id</code> does not increase your concurrency usage. This is because requests to the same context are processed sequentially.
* STT **does not** count towards your TTS concurrency limit

If you exceed your TTS concurrency limit, you will receive a `429 Too Many Requests` error. You can check your concurrency limit and upgrade it on the playground at [play.cartesia.ai](https://play.cartesia.ai).

### Interpreting TTS concurrency limits

How you interpret your TTS concurrency limit depends on how you're using the Sonic model family.

<AccordionGroup>
  <Accordion title="Conversational use cases">
    For real-time conversational use cases, such as powering voice agents, we've found that the number of parallel conversations you can support is effectively 4X your concurrency limit. This is just a rule of thumb, and depends on the types of conversations you're supporting. You can reach out to us to discuss your specific use case.

    For example, if you have a TTS concurrency limit of 15, you can typically support 60 parallel conversations.
  </Accordion>

  <Accordion title="Non-conversational use cases">
    For non-conversational use cases, such as generating speech in batch jobs, there is a more direct relationship between your concurrency limit and the number of parallel generations you can support.

    For example, if you have a TTS concurrency limit of 15, you can typically support 15 parallel TTS generations. You can use a connection pool to ensure you don't exceed your concurrency limit.
  </Accordion>
</AccordionGroup>

### TTS WebSocket limits

We limit the number of parallel TTS WebSocket connections to 10X your concurrency limit. For example, if you have a concurrency limit of 15, you can have up to 150 parallel TTS WebSocket connections.

If you exceed your WebSocket limit, you will receive a `429 Too Many Requests` error on trying to open a new WebSocket connection.

Usually, when users run into TTS WebSocket limits (even at scale), it's because they're not properly closing idle connections. Beyond closing idle connections, you can also create a connection pool to ensure you don't exceed your WebSocket limit.

### TTS WebSocket timeouts

We close idle TTS WebSocket connections after 5 minutes. We recommend closing and re-opening a new websocket connection when connections stay idle for long periods of time.

## Speech-to-Text (STT) Concurrency

Each active transcription stream counts as one concurrent request, regardless of whether you're using HTTP or WebSocket connections.

* Each concurrent HTTP or WebSocket connection counts toward your STT concurrency limit
* Idle STT WebSockets still count towards your STT concurrency limit
* TTS **does not** count towards your STT concurrency limit

If you exceed your STT concurrency limit, you will receive a `429 Too Many Requests` error.

### STT WebSocket timeouts

We close idle STT WebSocket connections after 3 minutes. We recommend closing and re-opening a new websocket connection when connections stay idle for long periods of time.
