Cartesia’s models are portable enough to run on widely available GPU hardware.
The table below lists the recommended concurrency per model worker for Sonic (TTS) and Ink (STT) on each GPU.

| GPU | Sonic Concurrency | Ink Concurrency |
|---|---|---|
| A10G | 4 | |
| L40S | 4 | |
| A100 | 4 | |
| H100 | 8 | 16 |
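
As a rough capacity-planning sketch, the recommended concurrency translates directly into a worker count: divide your expected peak number of concurrent streams by the per-worker concurrency and round up. The helper `workers_needed` and the peak of 50 streams are hypothetical; the concurrency values are copied from the table above, and blank cells are treated as "no recommendation listed".

```python
import math

# Per-worker concurrency copied from the table above.
# None marks a cell the table leaves blank (no recommendation listed).
RECOMMENDED_CONCURRENCY = {
    "A10G": {"sonic": 4, "ink": None},
    "L40S": {"sonic": 4, "ink": None},
    "A100": {"sonic": 4, "ink": None},
    "H100": {"sonic": 8, "ink": 16},
}


def workers_needed(gpu: str, model: str, peak_streams: int) -> int:
    """Hypothetical helper: workers required to serve peak_streams concurrent streams."""
    concurrency = RECOMMENDED_CONCURRENCY[gpu][model]
    if concurrency is None:
        raise ValueError(f"no recommended {model} concurrency listed for {gpu}")
    # Round up: a partially loaded worker still has to be provisioned.
    return math.ceil(peak_streams / concurrency)


if __name__ == "__main__":
    # Compare Sonic worker counts across GPUs for a hypothetical peak of 50 streams.
    for gpu in RECOMMENDED_CONCURRENCY:
        print(gpu, workers_needed(gpu, "sonic", 50))
```

With 50 concurrent Sonic streams this gives 13 workers on A10G, L40S, or A100 and 7 on H100, which is one way to weigh GPU cost against worker count.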
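
See Metrics for definitions of the performance metrics used on this page.
When choosing hardware, consider the tradeoff between latency (time to first audio, TTFA) and throughput: in the results below, higher concurrency generally raises per-worker throughput at the cost of higher TTFA and real-time factor (RTF).
The tables below show these metrics at each concurrency level for the GPUs we test on: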

| Concurrency | TTFA (ms) | RTF Avg | RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|
| 1 | 95 | 0.20 | 0.25 | 30 |
| 2 | 115 | 0.25 | 0.35 | 50 |
| 4 | 165 | 0.30 | 0.55 | 90 |
| 8 | 280 | 0.40 | 0.70 | 165 |

| Concurrency | Model TTFA (ms) | Model RTF Avg | Model RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|
| 1 | 90 | 0.20 | 0.20 | 50 |
| 2 | 120 | 0.25 | 0.25 | 90 |
| 4 | 180 | 0.30 | 0.45 | 145 |
| 8 | 185 | 0.30 | 0.55 | 180 |

| Concurrency | Model TTFA (ms) | Model RTF Avg | Model RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|
| 1 | 130 | 0.30 | 0.30 | 45 |
| 2 | 180 | 0.30 | 0.35 | 70 |
| 4 | 280 | 0.40 | 0.40 | 120 |
| 8 | 260 | 0.40 | 0.60 | 135 |

| Concurrency | Model TTFA (ms) | Model RTF Avg | Model RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|
| 1 | 140 | 0.30 | 0.30 | 40 |
| 2 | 205 | 0.35 | 0.35 | 60 |
| 4 | 335 | 0.45 | 0.50 | 100 |
| 8 | 600 | 0.65 | 0.70 | 155 |
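
One way to read these tables is to fix a latency budget and take the highest-throughput concurrency that stays inside it. The sketch below does that for the rows of the first table; the function names and the budget values in the usage note are hypothetical. It also assumes RTF follows the usual definition (generation time divided by duration of the audio produced), so an RTF below 1 means faster than real time.

```python
from typing import List, NamedTuple, Optional


class BenchmarkRow(NamedTuple):
    concurrency: int
    ttfa_ms: int
    rtf_avg: float
    rtf_p95: float
    throughput_chars_per_s: int


# Rows copied from the first benchmark table above.
FIRST_TABLE: List[BenchmarkRow] = [
    BenchmarkRow(1, 95, 0.20, 0.25, 30),
    BenchmarkRow(2, 115, 0.25, 0.35, 50),
    BenchmarkRow(4, 165, 0.30, 0.55, 90),
    BenchmarkRow(8, 280, 0.40, 0.70, 165),
]


def pick_concurrency(rows: List[BenchmarkRow], max_ttfa_ms: float,
                     max_rtf_p95: float) -> Optional[BenchmarkRow]:
    """Hypothetical helper: highest-throughput row whose TTFA and P95 RTF fit the budget."""
    eligible = [r for r in rows if r.ttfa_ms <= max_ttfa_ms and r.rtf_p95 <= max_rtf_p95]
    # None when no concurrency level meets the budget.
    return max(eligible, key=lambda r: r.throughput_chars_per_s, default=None)


def generation_seconds(audio_seconds: float, rtf: float) -> float:
    # Assumes RTF = generation time / audio duration.
    return audio_seconds * rtf
```

For example, `pick_concurrency(FIRST_TABLE, max_ttfa_ms=200, max_rtf_p95=0.6)` selects concurrency 4 (90 chars/s), while relaxing the budget to 300 ms and 0.75 selects concurrency 8. Under the same RTF assumption, 10 seconds of audio at an average RTF of 0.30 takes about 3 seconds to generate.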

Use these numbers to set up your per-worker configuration. To handle your application's scaling requirements, you also need to configure autoscaling behavior; see Autoscaling for more details.
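
As an illustration of how per-worker concurrency can feed an autoscaling decision, the sketch below computes a desired worker count from the current number of active streams, keeping each worker below a target fraction of its concurrency and clamping to minimum and maximum bounds. This is a hypothetical rule, not Cartesia's autoscaler; with "streams per worker" as the metric it computes the same quantity as the Kubernetes Horizontal Pod Autoscaler's target-value formula. The names `desired_workers` and `target_utilization` and the default bounds are assumptions.

```python
import math


def desired_workers(active_streams: int, per_worker_concurrency: int,
                    target_utilization: float = 0.75,
                    min_workers: int = 1, max_workers: int = 100) -> int:
    """Hypothetical scaling rule: keep each worker at or below target_utilization of its concurrency."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Streams each worker should carry, leaving headroom for bursts.
    target_streams_per_worker = per_worker_concurrency * target_utilization
    desired = math.ceil(active_streams / target_streams_per_worker)
    # Clamp to the configured bounds.
    return max(min_workers, min(max_workers, desired))
```

With a per-worker concurrency of 8 and the default 75% target, 30 active streams call for 5 workers and 31 call for 6; with no traffic the rule falls back to the single minimum worker.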