Cartesia’s models are portable enough to run on widely available GPU hardware.
The table below shows the recommended concurrency per worker for our TTS (Sonic) and STT (Ink-2) models.
| GPU | Sonic Concurrency | Ink-2 Concurrency |
|---|---|---|
| A10G | 4 | |
| L40S | 8 | 128 |
| A100 | 8 | |
| H100 (MIG) | 8 | 128 |
| H100 | 16 | 256 |
See Metrics for more details on performance metrics.
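As a rough illustration of how the concurrency table can feed capacity planning, the sketch below estimates how many Sonic workers cover a given number of simultaneous streams. It assumes one worker per GPU listed in the table; `workers_needed` and the `headroom_pct` safety margin are hypothetical names for this example, not Cartesia settings.

```python
# Recommended Sonic concurrency per worker, copied from the table above.
SONIC_CONCURRENCY = {
    "A10G": 4,
    "L40S": 8,
    "A100": 8,
    "H100 (MIG)": 8,
    "H100": 16,
}


def workers_needed(peak_streams: int, gpu: str, headroom_pct: int = 20) -> int:
    """Estimate the worker count for `peak_streams` concurrent streams.

    `headroom_pct` is an assumed safety margin (20 means 20% spare capacity).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if peak_streams < 0 or headroom_pct < 0:
        raise ValueError("peak_streams and headroom_pct must be non-negative")
    per_worker = SONIC_CONCURRENCY[gpu]
    demand = peak_streams * (100 + headroom_pct)
    capacity = 100 * per_worker
    return -(-demand // capacity)  # ceiling division


if __name__ == "__main__":
    for gpu in ("A10G", "H100"):
        print(gpu, workers_needed(100, gpu))
```

Treat the result as a starting point only: the right margin depends on your traffic shape and on the latency you can tolerate at the chosen concurrency.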
Compatibility Matrix
| Component | Tested version |
|---|---|
| Kubernetes (AWS EKS) | 1.31 |
| Kubernetes (GCP GKE) | 1.34 (Stable channel) |
GPU
| Component | Value |
|---|---|
| GPU architecture | Ampere or newer (A10G, A100, L40S, H100, H200) |
| GPU memory | 24 GB minimum per device |
| Worker container OS | Ubuntu 22.04 LTS |
| CUDA | 12.9 — bundled in the worker image, no host install required |
MIG (Multi-Instance GPU)
| Platform | MIG support |
|---|---|
| GKE | Supported via gpu_partition_size on the node pool |
| EKS | Not configured in Terraform — set up manually with the GPU Operator if needed |
| Docker Compose / Swarm | Supported via --mig flag and nvidia-smi -L UUIDs (see Docker) |
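For the Docker row, the MIG device UUIDs come from `nvidia-smi -L`. The sketch below shows one way to extract them in Python. It assumes the commonly seen output layout, where each MIG device appears on an indented line of the form `MIG <profile> Device <n>: (UUID: MIG-...)`; the exact layout can vary across driver versions, and the sample UUIDs are made up. The function names are hypothetical, and how the UUIDs are then passed to the worker is covered on the Docker page, not here.

```python
import re
import subprocess

# Matches lines such as "  MIG 3g.40gb     Device  0: (UUID: MIG-...)".
MIG_LINE = re.compile(
    r"^\s*MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):"
    r"\s+\(UUID:\s+(?P<uuid>MIG-[^)]+)\)"
)


def parse_mig_uuids(listing: str) -> list[tuple[str, str]]:
    """Return (profile, uuid) pairs for each MIG device in `nvidia-smi -L` output."""
    found = []
    for line in listing.splitlines():
        match = MIG_LINE.match(line)
        if match:
            found.append((match.group("profile"), match.group("uuid")))
    return found


def list_mig_uuids() -> list[tuple[str, str]]:
    """Run `nvidia-smi -L` on this host and parse the result."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_mig_uuids(result.stdout)


# Illustrative sample only; these UUIDs are invented.
SAMPLE = """GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-11111111-2222-3333-4444-555555555555)
  MIG 3g.40gb     Device  0: (UUID: MIG-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
  MIG 3g.40gb     Device  1: (UUID: MIG-ffffffff-0000-1111-2222-333333333333)
"""

if __name__ == "__main__":
    for profile, uuid in parse_mig_uuids(SAMPLE):
        print(profile, uuid)
```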
When choosing hardware, consider the tradeoff between latency, measured as time to first audio (TTFA), and throughput.
The tables below report TTFA, real-time factor (RTF), and throughput at each concurrency level for the GPUs we test on.
The benchmarks below are for Sonic 3.5 and require release tag sonic-20260503 or later. Updated April 2026.
H100
| Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|---|
| 1 | 50 | 55 | 0.10 | 0.10 | 105 |
| 2 | 50 | 55 | 0.10 | 0.10 | 200 |
| 4 | 80 | 115 | 0.15 | 0.15 | 325 |
| 8 | 120 | 165 | 0.20 | 0.20 | 550 |
| 12 | 125 | 225 | 0.20 | 0.25 | 760 |
| 16 | 195 | 300 | 0.30 | 0.30 | 795 |

H100 (MIG)
| Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|---|
| 1 | 60 | 65 | 0.10 | 0.15 | 125 |
| 2 | 65 | 100 | 0.15 | 0.15 | 230 |
| 4 | 110 | 150 | 0.15 | 0.20 | 385 |
| 8 | 165 | 230 | 0.25 | 0.25 | 575 |
| 12 | 215 | 290 | 0.30 | 0.35 | 730 |
| 16 | 290 | 340 | 0.35 | 0.40 | 780 |

L40S
| Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|---|
| 1 | 45 | 50 | 0.10 | 0.10 | 100 |
| 2 | 50 | 55 | 0.15 | 0.15 | 180 |
| 4 | 75 | 105 | 0.15 | 0.15 | 330 |
| 8 | 125 | 165 | 0.20 | 0.25 | 485 |

A100
| Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|---|
| 1 | 60 | 65 | 0.15 | 0.15 | 85 |
| 2 | 70 | 85 | 0.15 | 0.15 | 150 |
| 4 | 100 | 135 | 0.20 | 0.20 | 285 |
| 8 | 145 | 260 | 0.25 | 0.30 | 410 |

A10
| Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
|---|---|---|---|---|---|
| 1 | 80 | 85 | 0.15 | 0.20 | 75 |
| 2 | 90 | 155 | 0.20 | 0.20 | 130 |
| 4 | 165 | 240 | 0.25 | 0.30 | 210 |
| 8 | 270 | 355 | 0.40 | 0.45 | 305 |

With these numbers, you can set up your per-worker configuration. To handle your application’s scaling requirements, you’ll also need to configure autoscaling behavior. See autoscaling for more details.
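One possible way to turn a benchmark table into a per-worker concurrency value is to pick the highest concurrency whose TTFA P95 stays within your latency budget. The sketch below does that using the rows of the first benchmark table above; the function name and the idea of a single P95 budget are assumptions made for illustration, not a Cartesia-recommended procedure.

```python
from typing import Optional

# (concurrency, TTFA P95 in ms, throughput in chars/s), copied from the first
# benchmark table above. Swap in the rows for the GPU you plan to use.
BENCHMARK = [
    (1, 55, 105),
    (2, 55, 200),
    (4, 115, 325),
    (8, 165, 550),
    (12, 225, 760),
    (16, 300, 795),
]


def max_concurrency_within(
    budget_ms: int, rows: list[tuple[int, int, int]] = BENCHMARK
) -> Optional[tuple[int, int, int]]:
    """Return the highest-concurrency row whose TTFA P95 fits the budget.

    Returns None when even the lowest concurrency exceeds the budget.
    """
    fitting = [row for row in rows if row[1] <= budget_ms]
    if not fitting:
        return None
    return max(fitting, key=lambda row: row[0])


if __name__ == "__main__":
    for budget in (100, 200, 250):
        print(budget, max_concurrency_within(budget))
```

With a 200 ms P95 budget this selects concurrency 8 from those rows. A tighter budget trades throughput for latency, which is the tradeoff the tables are meant to expose.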