# Hardware Selection

Cartesia's models are portable enough to run on widely available GPU hardware.

The table below lists the recommended per-worker concurrency for our TTS (Sonic) and STT (Ink-2) model workers on each GPU. A blank cell means no Ink-2 recommendation is listed for that GPU.

| GPU        | Sonic (TTS) concurrency | Ink-2 (STT) concurrency |
| ---------- | ----------------------- | ----------------------- |
| A10G       | 4                       |                         |
| L40S       | 8                       | 128                     |
| A100       | 8                       |                         |
| H100 (MIG) | 8                       | 128                     |
| H100       | 16                      | 256                     |

See [Metrics](/self-hosted/metrics) for more details on performance metrics.
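As a rough capacity-planning aid, the recommended concurrency can be turned into a worker count by dividing your peak number of concurrent streams by the per-worker value and rounding up. The sketch below is a minimal illustration, assuming demand is measured as peak concurrent requests and that every worker runs at exactly the recommended concurrency; the `workers_needed` helper is hypothetical, and it does not model how workers map onto physical GPUs or MIG slices, nor any headroom you may want for failover.

```python
import math

# Recommended per-worker concurrency, copied from the table above.
# None marks a GPU with no Ink-2 recommendation listed.
RECOMMENDED_CONCURRENCY = {
    "A10G": {"sonic": 4, "ink-2": None},
    "L40S": {"sonic": 8, "ink-2": 128},
    "A100": {"sonic": 8, "ink-2": None},
    "H100 (MIG)": {"sonic": 8, "ink-2": 128},
    "H100": {"sonic": 16, "ink-2": 256},
}


def workers_needed(gpu: str, model: str, peak_streams: int) -> int:
    """Hypothetical helper: workers required to serve peak_streams concurrent
    requests, assuming each worker runs at the recommended concurrency."""
    per_worker = RECOMMENDED_CONCURRENCY[gpu][model]
    if per_worker is None:
        raise ValueError(f"no recommended {model} concurrency listed for {gpu}")
    if peak_streams < 0:
        raise ValueError("peak_streams must be non-negative")
    # Round up: a partially used worker still occupies a whole worker.
    return math.ceil(peak_streams / per_worker)


if __name__ == "__main__":
    print(workers_needed("H100", "sonic", 100))  # 100 / 16 rounds up to 7
    print(workers_needed("A10G", "sonic", 100))  # 100 / 4 = 25
```

Rounding up is the important step: 100 concurrent Sonic streams on H100 workers at 16 each needs 7 workers, not 6.25.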

## Compatibility Matrix

### Kubernetes and tooling

| Component            | Tested version          |
| -------------------- | ----------------------- |
| Kubernetes (AWS EKS) | `1.31`                  |
| Kubernetes (GCP GKE) | `1.34` (Stable channel) |

### GPU

| Component           | Value                                                          |
| ------------------- | -------------------------------------------------------------- |
| GPU architecture    | Ampere or newer (A10G, A100, L40S, H100, H200)                 |
| GPU memory          | 24 GB minimum per device                                       |
| Worker container OS | Ubuntu 22.04 LTS                                               |
| CUDA                | `12.9` — bundled in the worker image, no host install required |
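To check a host against these GPU requirements before deploying, one option is to query `nvidia-smi` for each device's name, total memory, and CUDA compute capability. "Ampere or newer" corresponds to compute capability 8.0 or higher (A100 is 8.0, A10G is 8.6, L40S is 8.9, H100 and H200 are 9.0). This is a minimal sketch, not an official preflight tool: it assumes a driver recent enough to support the `compute_cap` query field, and the 22 GiB memory threshold is an assumption chosen because `nvidia-smi` reports usable memory, which on nominal 24 GB cards such as the A10G is a little under 24 GiB.

```python
import shutil
import subprocess

# Ampere GPUs are compute capability 8.x; Ada (L40S) is 8.9 and
# Hopper (H100, H200) is 9.0, so "Ampere or newer" means >= 8.0.
MIN_COMPUTE_CAP = (8, 0)
# Assumed threshold: nvidia-smi reports usable memory, which for nominal
# 24 GB cards such as the A10G is a little under 24 GiB.
MIN_MEMORY_MIB = 22 * 1024


def parse_gpu_line(line: str) -> tuple[str, int, tuple[int, int]]:
    """Parse one CSV line: name, memory.total (MiB), compute_cap."""
    name, memory, cap = (field.strip() for field in line.split(","))
    major, minor = (int(part) for part in cap.split("."))
    return name, int(memory), (major, minor)


def meets_requirements(line: str) -> bool:
    """True if the GPU is Ampere or newer and has enough memory."""
    _, memory_mib, cap = parse_gpu_line(line)
    return cap >= MIN_COMPUTE_CAP and memory_mib >= MIN_MEMORY_MIB


def main() -> None:
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH; skipping check")
        return
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,compute_cap",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print(f"nvidia-smi query failed: {result.stderr.strip()}")
        return
    for line in result.stdout.strip().splitlines():
        name, _, _ = parse_gpu_line(line)
        status = "ok" if meets_requirements(line) else "below requirements"
        print(f"{name}: {status}")


if __name__ == "__main__":
    main()
```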

### MIG (Multi-Instance GPU)

| Platform               | MIG support                                                                                      |
| ---------------------- | ------------------------------------------------------------------------------------------------ |
| GKE                    | Supported via `gpu_partition_size` on the node pool                                              |
| EKS                    | Not configured in Terraform — set up manually with the GPU Operator if needed                    |
| Docker Compose / Swarm | Supported via `--mig` flag and `nvidia-smi -L` UUIDs (see [Docker](/self-hosted/docker-compose)) |
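For the Docker Compose / Swarm path, the MIG device UUIDs come from `nvidia-smi -L`, which prints each full GPU followed by indented lines for its MIG devices. The sketch below only collects those UUIDs; how they are passed to the `--mig` flag is covered on the [Docker](/self-hosted/docker-compose) page. It assumes the usual `(UUID: MIG-...)` line format, which can vary by driver version (older drivers use a `MIG-GPU-<uuid>/<gi>/<ci>` form, which the pattern also accepts), and the `mig_uuids` helper is hypothetical.

```python
import re
import shutil
import subprocess

# Matches MIG device UUIDs in `nvidia-smi -L` output, e.g.
#   MIG 3g.40gb     Device  0: (UUID: MIG-<uuid>)
# Older drivers print MIG-GPU-<uuid>/<gi>/<ci>; the pattern accepts both.
# Full-GPU UUIDs start with "GPU-" and are not matched.
MIG_UUID_RE = re.compile(r"UUID:\s*(MIG-[^)\s]+)")


def mig_uuids(listing: str) -> list[str]:
    """Return MIG device UUIDs from `nvidia-smi -L` output, in listed order."""
    return MIG_UUID_RE.findall(listing)


def main() -> None:
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
        return
    listing = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True
    ).stdout
    for uuid in mig_uuids(listing):
        print(uuid)


if __name__ == "__main__":
    main()
```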

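## Benchmarks

When choosing hardware, weigh latency, measured as time to first audio (TTFA), against throughput: higher concurrency raises throughput per GPU but also raises TTFA. RTF is the real-time factor (generation time divided by the duration of the audio produced), so values below 1 mean audio is generated faster than real time. The tables below show these metrics for each GPU we test on.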

<Note>
  The benchmarks below are for Sonic 3.5 and require release tag `sonic-20260503` or later. *Updated April 2026.*
</Note>

<Tabs>
  <Tab title="H100">
    | Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
    | ----------- | ------------- | ------------- | ------- | ------- | -------------------- |
    | 1           | 50            | 55            | 0.10    | 0.10    | 105                  |
    | 2           | 50            | 55            | 0.10    | 0.10    | 200                  |
    | 4           | 80            | 115           | 0.15    | 0.15    | 325                  |
    | 8           | 120           | 165           | 0.20    | 0.20    | 550                  |
    | 12          | 125           | 225           | 0.20    | 0.25    | 760                  |
    | 16          | 195           | 300           | 0.30    | 0.30    | 795                  |
  </Tab>

  <Tab title="H100 (MIG)">
    | Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
    | ----------- | ------------- | ------------- | ------- | ------- | -------------------- |
    | 1           | 60            | 65            | 0.10    | 0.15    | 125                  |
    | 2           | 65            | 100           | 0.15    | 0.15    | 230                  |
    | 4           | 110           | 150           | 0.15    | 0.20    | 385                  |
    | 8           | 165           | 230           | 0.25    | 0.25    | 575                  |
    | 12          | 215           | 290           | 0.30    | 0.35    | 730                  |
    | 16          | 290           | 340           | 0.35    | 0.40    | 780                  |
  </Tab>

  <Tab title="L40S">
    | Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
    | ----------- | ------------- | ------------- | ------- | ------- | -------------------- |
    | 1           | 45            | 50            | 0.10    | 0.10    | 100                  |
    | 2           | 50            | 55            | 0.15    | 0.15    | 180                  |
    | 4           | 75            | 105           | 0.15    | 0.15    | 330                  |
    | 8           | 125           | 165           | 0.20    | 0.25    | 485                  |
  </Tab>

  <Tab title="A100">
    | Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
    | ----------- | ------------- | ------------- | ------- | ------- | -------------------- |
    | 1           | 60            | 65            | 0.15    | 0.15    | 85                   |
    | 2           | 70            | 85            | 0.15    | 0.15    | 150                  |
    | 4           | 100           | 135           | 0.20    | 0.20    | 285                  |
    | 8           | 145           | 260           | 0.25    | 0.30    | 410                  |
  </Tab>

  <Tab title="A10">
    | Concurrency | TTFA P50 (ms) | TTFA P95 (ms) | RTF P50 | RTF P95 | Throughput (chars/s) |
    | ----------- | ------------- | ------------- | ------- | ------- | -------------------- |
    | 1           | 80            | 85            | 0.15    | 0.20    | 75                   |
    | 2           | 90            | 155           | 0.20    | 0.20    | 130                  |
    | 4           | 165           | 240           | 0.25    | 0.30    | 210                  |
    | 8           | 270           | 355           | 0.40    | 0.45    | 305                  |
  </Tab>
</Tabs>
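One way to use these tables is to fix a latency budget and pick the measured concurrency with the highest throughput that stays within it. The sketch below does this for the H100 rows above, assuming the budget is expressed as a TTFA P95 limit in milliseconds; both that criterion and the `best_concurrency` helper are illustrative assumptions, and only measured rows are considered (no interpolation between them).

```python
from typing import NamedTuple, Optional


class BenchmarkRow(NamedTuple):
    concurrency: int
    ttfa_p95_ms: int
    throughput_chars_s: int


# Sonic 3.5 H100 rows from the benchmark table (TTFA P95 and throughput).
H100 = [
    BenchmarkRow(1, 55, 105),
    BenchmarkRow(2, 55, 200),
    BenchmarkRow(4, 115, 325),
    BenchmarkRow(8, 165, 550),
    BenchmarkRow(12, 225, 760),
    BenchmarkRow(16, 300, 795),
]


def best_concurrency(
    rows: list[BenchmarkRow], ttfa_p95_budget_ms: int
) -> Optional[BenchmarkRow]:
    """Return the highest-throughput row whose TTFA P95 fits the budget,
    or None if no measured concurrency meets it."""
    within_budget = [row for row in rows if row.ttfa_p95_ms <= ttfa_p95_budget_ms]
    return max(within_budget, key=lambda row: row.throughput_chars_s, default=None)


if __name__ == "__main__":
    print(best_concurrency(H100, 200))  # concurrency 8: 550 chars/s at 165 ms P95
    print(best_concurrency(H100, 50))   # None: even concurrency 1 is 55 ms P95
```

Because only measured points are considered, a budget that falls between two rows selects the lower measured concurrency; substituting another tab's rows lets you compare GPUs under the same budget.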

Use these numbers to set up your per-worker configuration. To meet your application's overall scaling requirements, you also need to configure autoscaling; see [autoscaling](/self-hosted/auto-scaling) for details.
