Architecture

Cartesia’s self-hosted services support a configurable trade-off between latency and throughput for both TTS and STT deployments.

Core Components

API Server

The API Server is the entry point for all requests to your self-hosted Cartesia service. It handles incoming REST API requests and WebSocket connections.

PubSub Controller (NATS)

The API server and the model containers communicate over an asynchronous publish/subscribe protocol, which keeps request handling smooth and low-latency. This design allows:

Model containers to join and leave the cluster freely.
Efficient stateful management of long-running request lifecycles.
Coordination between the API server and model containers to find the lowest-latency pathway for each request.
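The decoupling described above can be illustrated with a minimal in-memory sketch. It uses an asyncio.Queue as a stand-in for a NATS subject and is not the actual Cartesia or NATS client code; the worker names and job payloads are hypothetical. The point it shows is that the publisher never addresses a specific worker, so one worker can leave and another can join without losing queued jobs.

```python
import asyncio


async def worker(name, jobs, results):
    # Pull jobs from the shared queue; the publisher never targets a worker by name.
    while True:
        job = await jobs.get()
        try:
            results.append((name, job))
        finally:
            jobs.task_done()


async def demo():
    jobs = asyncio.Queue()  # in-memory stand-in for a pub/sub subject
    results = []

    # First worker joins and drains the initial jobs.
    w1 = asyncio.create_task(worker("worker-1", jobs, results))
    for i in range(3):
        jobs.put_nowait(f"job-{i}")
    await jobs.join()

    # worker-1 leaves; worker-2 joins. The publisher code does not change.
    w1.cancel()
    w2 = asyncio.create_task(worker("worker-2", jobs, results))
    for i in range(3, 5):
        jobs.put_nowait(f"job-{i}")
    await jobs.join()

    w2.cancel()
    await asyncio.gather(w1, w2, return_exceptions=True)
    return results
```

Running asyncio.run(demo()) returns the processed jobs in order, with the first three handled by worker-1 and the remaining two by worker-2.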

Model Workers (Engine)

Cartesia provides batched engine workers for both TTS and STT. The core parameter to customize is batch_size (B). The trade-offs for this and other parameters are covered in the Performance Tuning sections.

License Proxy Server

We deploy a single service, the license proxy, which talks to our cloud environment to authenticate the self-hosted deployment and verify that its license is valid. The primary reason for this design is that the proxy becomes the only service that makes outbound calls to our cloud, which makes network security policies easier to configure. The proxy also lets you choose the level of isolation you want:

Connected: The deployment validates its license by contacting our cloud periodically and sends usage telemetry.
Air-gapped: A completely isolated offering that runs on an offline license. In air-gapped mode, we work with you directly to obtain usage information from audit logs.

For most customers, we recommend deploying in Connected mode. If you need a completely isolated deployment, our GTM team can work with you to set it up. Both Connected and Air-gapped modes have grace periods configured, so operations are not terminated immediately when the deployment is disconnected or the license expires.
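The grace-period behavior can be pictured with a small sketch. This is an illustration only, not the license proxy's implementation: the function name, its parameters, and the 24-hour window used in the usage note are all hypothetical, and the actual grace-period lengths are not stated here.

```python
from datetime import datetime, timedelta, timezone


def should_keep_serving(last_valid_check, now, grace_period):
    """Return True while the deployment is still inside the grace window.

    last_valid_check: time of the last successful license validation
    now:              current time
    grace_period:     hypothetical window length; the real value is set by Cartesia
    """
    return now - last_valid_check <= grace_period
```

For example, with an assumed 24-hour window, a deployment that last validated one hour ago keeps serving, while one that last validated 48 hours ago does not.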

Network Topology and Ports

The values below are sourced from the Helm chart shipped in cartesia-kube.

Inter-service ports

Component	Port	Protocol
API server	`5000`	HTTP / WebSocket
API metrics	`8080`	HTTP (Prometheus)
NATS	`4222`	NATS protocol (TCP)
NATS monitoring	`8222`	HTTP
License proxy	`8080`	HTTP
Worker metrics	`8080`	HTTP (Prometheus)
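When debugging connectivity, it can help to probe these ports from inside the cluster. The sketch below encodes the table above and performs a plain TCP connect. The dictionary keys are labels chosen for this example, and the host you pass in (for instance a Kubernetes Service DNS name) depends on your own release and namespace, which are not specified here.

```python
import socket

# Port numbers copied from the inter-service ports table; keys are illustrative labels.
INTER_SERVICE_PORTS = {
    "api": 5000,
    "api-metrics": 8080,
    "nats": 4222,
    "nats-monitoring": 8222,
    "license-proxy": 8080,
    "worker-metrics": 8080,
}


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A successful TCP connect only shows that something is listening on the port; it does not confirm that the service behind it is healthy.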

Service-to-service traffic

From	To	Purpose
Client	Ingress → API `5000`	REST and WebSocket requests
API	NATS `4222`	Publishes TTS and STT jobs
API	License proxy `8080` (`LICENSE_PROXY_URL`)	Per-request authorization check
Workers	NATS `4222`	Subscribe to inference jobs
License proxy	Cartesia license endpoint	Auth and audit log forwarding (air-gapped mode performs both locally)
API and Workers	Google Cloud Storage	Voice, LoRA, and migration sync
Prometheus	API `8080`, Worker `8080`	Metrics scrape every 5 seconds

Outbound egress (Connected mode)

In Connected mode, the cluster needs outbound access to the following destinations. Allowlist them in your firewall:

Destination	Port	Reason
`us-docker.pkg.dev`	443	Container image pulls on pod start
`storage.googleapis.com`	443	Voice, LoRA, and migration sync from GCS
`api.cartesia.ai`	443	License-proxy authentication (connected mode)
DNS	53 (UDP + TCP)	Standard resolution

In air-gapped mode, none of these egress paths are required: the cluster operates entirely offline against a locally issued license file.
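As a quick sanity check before deploying, you can compare your firewall allowlist with the destinations listed above. The sketch below is a hypothetical helper, not part of the Helm chart: the function name, the mode strings, and the allowlist representation (a set of host and port pairs) are assumptions made for this example. DNS on port 53 is left out because it is not tied to a single hostname.

```python
# Destinations copied from the outbound egress table (DNS on port 53 is handled separately).
CONNECTED_EGRESS = {
    ("us-docker.pkg.dev", 443): "container image pulls",
    ("storage.googleapis.com", 443): "voice, LoRA, and migration sync",
    ("api.cartesia.ai", 443): "license-proxy authentication",
}


def missing_egress(allowlist, mode="connected"):
    """Return the required (host, port) pairs that are absent from the allowlist.

    allowlist: set of (host, port) tuples your firewall permits (assumed format)
    mode:      "connected" or "air-gapped" (labels chosen for this sketch)
    """
    if mode == "air-gapped":
        # Per the text above, air-gapped deployments need none of these paths.
        return []
    return sorted(dest for dest in CONNECTED_EGRESS if dest not in allowlist)
```

An empty list means every destination in the table is covered by the allowlist you passed in.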

Restricting cluster egress

The Helm chart ships an opt-in NetworkPolicy that blocks all non-cluster egress except DNS, RFC1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and the cloud-metadata IP (169.254.169.254/32). Enable it in values.yaml:

networkPolicy:
  denyExternalEgress: true

Use this when you mediate all internet egress through a separate gateway, or in air-gapped deployments.
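To reason about which destinations remain reachable once the policy is enabled, the rule described above can be modeled with Python's ipaddress module. This is a sketch of the stated behavior, not the rendered NetworkPolicy itself; in particular, treating DNS as "any destination on port 53" is an assumption about how the exception is expressed.

```python
import ipaddress

# CIDR ranges quoted in the policy description above.
ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.169.254/32",  # cloud-metadata IP
    )
]
DNS_PORT = 53  # assumption: the DNS exception is modeled as port 53 to any destination


def egress_allowed(dest_ip, port):
    """Return True if the modeled policy would permit traffic to dest_ip:port."""
    if port == DNS_PORT:
        return True
    ip = ipaddress.ip_address(dest_ip)
    return any(ip in net for net in ALLOWED_NETWORKS)
```

Under this model, egress_allowed("10.1.2.3", 4222) is permitted because the address is in a private range, while egress_allowed("8.8.8.8", 443) is blocked because it is a public address on a non-DNS port.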

Traffic between services

All inter-service communication inside the cluster is unencrypted. Access to the cluster network is restricted by your cloud provider’s networking and any NetworkPolicy resources you apply. The chart ships an opt-in policy that blocks all non-cluster egress (see Restricting cluster egress above). For ingress TLS configuration (ACM on EKS, Managed Certificates on GKE, BYO certs on self-managed clusters), see Managed Kubernetes.
