Concurrency Limits and Timeouts
Learn about concurrency limits and timeouts with the Cartesia API.
Your account is subject to two types of rate limits: WebSocket limits and generation concurrency limits.
WebSocket limits
You don’t need to worry about WebSocket limits if you’re only using the HTTP API.
We limit the number of parallel WebSocket connections to 10X your concurrency limit. For example, if you have a concurrency limit of 15, you can have up to 150 parallel WebSocket connections.
If you exceed your WebSocket limit, you will receive a 429 Too Many Requests
error on trying to open a new WebSocket connection.
Usually, when users run into WebSocket limits (even at scale), it’s because they’re not properly closing idle connections. Beyond closing idle connections, you can also create a connection pool to ensure you don’t exceed your WebSocket limit.
Concurrency limits
We measure generation concurrency in terms of the number of unique contexts active at a given time. For regular HTTP endpoints, each HTTP request corresponds to a context. For WebSockets, each context_id
corresponds to a unique context; additional messages you send with the same context_id
will not count against your concurrency limit.
If you exceed your concurrency limit, you will receive a 429 Too Many Requests
error. You can check your concurrency limit and upgrade it on the playground at play.cartesia.ai.
Interpreting concurrency limits
How you interpret your concurrency limit depends on how you’re using the Sonic model family.
Conversational use cases
For real-time conversational use cases, such as powering voice agents, we’ve found that the number of parallel conversations you can support is effectively 4X your concurrency limit. This is just a rule of thumb, and depends on the types of conversations you’re supporting. You can reach out to us to discuss your specific use case.
For example, if you have a concurrency limit of 15, you can typically support 60 parallel conversations.
Non-conversational use cases
For non-conversational use cases, such as generating speech in batch jobs, there is a more direct relationship between your concurrency limit and the number of parallel generations you can support.
For example, if you have a concurrency limit of 15, you can typically support 15 parallel generations. You can use a connection pool to ensure you don’t exceed your concurrency limit.
WebSocket timeouts
We close idle WebSocket connections after 5 minutes. If you need to keep a connection open for longer, you can periodically do a generation to keep the connection from idling.