Documentation Index

Fetch the complete documentation index at: https://docs.cartesia.ai/llms.txt

Use this file to discover all available pages before exploring further.

Cartesia’s models are portable enough to run on widely available GPU hardware. The table below shows the recommended concurrency for our TTS (Sonic) and STT (Ink) model workers:

| GPU  | Sonic Concurrency | Ink Concurrency |
|------|-------------------|-----------------|
| A10G | 4                 |                 |
| L40S | 4                 |                 |
| A100 | 4                 |                 |
| H100 | 8                 | 16              |
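Given a target number of concurrent streams, the table above translates directly into a worker count. A minimal sketch of that calculation, assuming the table's recommended per-worker concurrency (the dict and helper names here are illustrative, not part of Cartesia's API):

```python
# Recommended per-worker concurrency, taken from the table above.
# Ink values are only published for H100 in this table.
RECOMMENDED_CONCURRENCY = {
    "A10G": {"sonic": 4},
    "L40S": {"sonic": 4},
    "A100": {"sonic": 4},
    "H100": {"sonic": 8, "ink": 16},
}

def workers_needed(model: str, gpu: str, target_streams: int) -> int:
    """Workers required to serve target_streams concurrent streams."""
    per_worker = RECOMMENDED_CONCURRENCY[gpu][model]
    return -(-target_streams // per_worker)  # ceiling division
```

For example, serving 20 concurrent Sonic streams on H100s at a concurrency of 8 per worker requires 3 workers.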
See Metrics for more details on performance metrics. When choosing hardware, consider the tradeoff between latency (TTFA) and throughput. The table below shows these metrics at the concurrency settings we test:

| Concurrency | TTFA (ms) | RTF Avg | RTF P95 | Throughput (chars/s) |
|-------------|-----------|---------|---------|----------------------|
| 1           | 95        | 0.20    | 0.25    | 30                   |
| 2           | 115       | 0.25    | 0.35    | 50                   |
| 4           | 165       | 0.30    | 0.55    | 90                   |
| 8           | 280       | 0.40    | 0.70    | 165                  |
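One way to use this table is to pick the highest-throughput concurrency that still meets your TTFA budget. A sketch of that selection, assuming the figures above (the helper name is illustrative):

```python
# Metrics table from above: concurrency -> (TTFA ms, RTF avg, RTF p95, chars/s).
METRICS = {
    1: (95, 0.20, 0.25, 30),
    2: (115, 0.25, 0.35, 50),
    4: (165, 0.30, 0.55, 90),
    8: (280, 0.40, 0.70, 165),
}

def best_concurrency(ttfa_budget_ms: float) -> int:
    """Highest-throughput concurrency whose TTFA stays within the budget."""
    within_budget = [c for c, (ttfa, *_) in METRICS.items() if ttfa <= ttfa_budget_ms]
    if not within_budget:
        raise ValueError("no tested concurrency meets the TTFA budget")
    return max(within_budget, key=lambda c: METRICS[c][3])  # index 3: chars/s
```

For a 200 ms TTFA budget, concurrencies 1, 2, and 4 qualify, and 4 wins on throughput (90 chars/s).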
With these figures you can set up your per-worker configuration. To handle your application’s scaling requirements, you’ll also need to configure autoscaling behavior. See autoscaling for more details.