Cartesia’s self-hosted services support a configurable trade-off between latency and throughput for both TTS and STT deployments.
Self-hosted Architecture

Core Components

API Server

The API Server is the entry point for all requests to your self-hosted Cartesia service. It handles incoming REST API requests and WebSocket connections.

PubSub Controller (NATS)

We use an asynchronous communication protocol between the API server and the model containers to keep request handling smooth and low-latency. This design allows:
  • Model containers to join and leave the cluster freely.
  • Efficient stateful management of long-running request lifecycles.
  • Coordination between the API server and model containers to find the lowest-latency pathway for a request.
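
To make the first point concrete, here is a minimal in-memory sketch of publish/subscribe dispatch in which workers can join and leave without the publisher knowing about them individually. It is an illustration only: it does not use NATS or the NATS client API, and the `InMemoryBroker` class, the `tts.requests` subject name, and the round-robin policy are all assumptions made for this example, not details of the Cartesia deployment.

```python
import asyncio
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a pub/sub controller (hypothetical; not the NATS API)."""

    def __init__(self):
        # subject -> queues of the workers currently subscribed to it
        self._subscribers = defaultdict(list)
        # subject -> count of messages published so far (drives round-robin)
        self._published = defaultdict(int)

    def subscribe(self, subject):
        """A worker joins: it gets its own queue for the subject."""
        queue = asyncio.Queue()
        self._subscribers[subject].append(queue)
        return queue

    def unsubscribe(self, subject, queue):
        """A worker leaves: later messages are no longer routed to it."""
        self._subscribers[subject].remove(queue)

    async def publish(self, subject, message):
        workers = self._subscribers[subject]
        if not workers:
            raise RuntimeError(f"no workers subscribed to {subject!r}")
        # Round-robin across whichever workers are present right now.
        index = self._published[subject] % len(workers)
        self._published[subject] += 1
        await workers[index].put(message)


async def worker(name, queue, results):
    """Consume requests until a None sentinel arrives."""
    while True:
        request = await queue.get()
        if request is None:
            break
        results.append((name, request))


async def demo():
    broker = InMemoryBroker()
    results = []
    q1 = broker.subscribe("tts.requests")  # hypothetical subject name
    q2 = broker.subscribe("tts.requests")
    t1 = asyncio.create_task(worker("worker-1", q1, results))
    t2 = asyncio.create_task(worker("worker-2", q2, results))

    await broker.publish("tts.requests", "req-a")
    await broker.publish("tts.requests", "req-b")

    # worker-2 leaves the cluster; the publisher's code does not change.
    broker.unsubscribe("tts.requests", q2)
    await q2.put(None)
    await broker.publish("tts.requests", "req-c")

    await q1.put(None)
    await asyncio.gather(t1, t2)
    return results


if __name__ == "__main__":
    print(sorted(asyncio.run(demo())))
```

The publisher addresses a subject rather than a specific worker, which is what lets the set of workers change between requests.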

Model Workers (Engine)

Cartesia provides batched engine workers for both TTS and STT. The core parameter to tune here is batch_size (B). The trade-offs for this and other parameters are covered in the Performance Tuning sections.
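
As a rough intuition for why batch size trades latency against throughput, the sketch below uses a toy cost model in which one batched step costs a fixed overhead plus a per-item cost. The model and the numbers `fixed_ms` and `per_item_ms` are assumptions for illustration, not measurements of the Cartesia engine; real behavior depends on the model and hardware.

```python
def batch_tradeoff(batch_size, fixed_ms=20.0, per_item_ms=2.0):
    """Toy cost model (assumed, not measured).

    One batched step is modeled as: fixed_ms + per_item_ms * batch_size.
    Returns (step latency in ms, throughput in items per second).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    step_ms = fixed_ms + per_item_ms * batch_size
    throughput_per_s = batch_size / step_ms * 1000.0
    return step_ms, throughput_per_s


if __name__ == "__main__":
    for b in (1, 4, 16):
        step_ms, throughput = batch_tradeoff(b)
        print(f"B={b:>2}  step latency={step_ms:.1f} ms  throughput={throughput:.1f} items/s")
```

Under this model, a larger B spreads the fixed overhead over more items, so throughput rises, while each request waits for a longer step, so latency rises too.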

License Proxy Server

We deploy a single service that communicates with our cloud environment to authenticate the self-hosted deployment and verify that its license is valid. We do this for several reasons, the primary one being that the proxy becomes the only service making outbound calls, which makes network security policies easier to configure. The proxy also lets you choose the level of isolation you want:
  • Connected: The deployment validates its license by contacting our cloud periodically and sends usage telemetry.
  • Air-gapped: A completely isolated offering that uses an offline license. In air-gapped mode, we work with you directly to obtain usage information via audit logs.
For most customers, we recommend deploying in Connected mode. If you need a completely isolated deployment, our GTM team can work with you to set it up. Both Connected and Air-gapped modes have grace periods configured, so operations are not terminated immediately when the deployment is disconnected or the license expires.
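
The grace-period idea can be sketched as a simple time comparison. This is a hypothetical illustration: the `LicenseStatus` class, its field names, and the 72-hour value are assumptions, since the actual grace-period durations and enforcement mechanism are not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LicenseStatus:
    """Hypothetical record of the last successful license validation."""

    last_valid_check: datetime  # last time the license was confirmed valid
    grace_period: timedelta     # how long to keep running without a new check

    def is_operational(self, now: datetime) -> bool:
        # Keep serving as long as the time since the last valid check
        # does not exceed the grace period.
        return now - self.last_valid_check <= self.grace_period


if __name__ == "__main__":
    status = LicenseStatus(
        last_valid_check=datetime(2025, 1, 1, 12, 0),
        grace_period=timedelta(hours=72),  # assumed value for illustration
    )
    print(status.is_operational(datetime(2025, 1, 2, 12, 0)))  # within grace period
    print(status.is_operational(datetime(2025, 1, 5, 12, 0)))  # past grace period
```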