Cartesia provides Terraform configurations that deploy both infrastructure and the application, or you can deploy the Helm chart directly to an existing cluster.
Complete configurations are provided at deployment time by your Cartesia representative.

Terraform Deployment

Terraform sets up the cluster, networking, and GPU drivers, then deploys Cartesia via Helm. This is the fastest way to get started with self-hosting Cartesia.
Download cartesia-kube from the GCS bucket as described in Downloading cartesia-kube.
# Download and extract cartesia-kube from GCS (see Downloading cartesia-kube guide)
cd cartesia-kube

# Copy example config for your platform
cp aws-terraform.tfvars.example aws-terraform.tfvars  # or gcp-terraform.tfvars.example

# Deploy from the platform directory
cd infra/aws/cartesia-eks  # or infra/gcp/cartesia-gke
terraform init
terraform apply -var-file="../../../aws-terraform.tfvars" \
                -var "cartesia_api_key=$CARTESIA_API_KEY" \
                -var "service_account_json=$(cat /path/to/service-account.json)"
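
Before running terraform apply, it can help to confirm that the inputs interpolated by the command above are actually present. The following is a minimal sketch and is not part of cartesia-kube: the function name preflight_check is hypothetical, and it only checks that CARTESIA_API_KEY is non-empty and that the service account file is readable.

```shell
# Hypothetical pre-flight helper (not shipped with cartesia-kube).
# Returns 0 only when the inputs used by `terraform apply` are present.
preflight_check() {
  local sa_json_path="$1"   # path to the service account JSON file
  local status=0

  if [ -z "${CARTESIA_API_KEY:-}" ]; then
    echo "CARTESIA_API_KEY is not set" >&2
    status=1
  fi
  if [ ! -r "$sa_json_path" ]; then
    echo "cannot read service account file: $sa_json_path" >&2
    status=1
  fi
  return "$status"
}

# Example usage (run before the terraform commands above):
#   preflight_check /path/to/service-account.json || exit 1
```

Failing early here avoids a partially applied plan caused by an empty variable.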

Configuration

region = "us-west-2"
name = "cartesia-production"

eks_admin_users = ["arn:aws:iam::123456789:user/admin"]

node_groups = {
  default = {
    ami_type = "AL2023_x86_64_STANDARD"
    instance_types = ["m7a.4xlarge"]
    min_size = 1
    max_size = 3
    desired_size = 1
  }
  gpu = {
    ami_type = "AL2023_x86_64_NVIDIA"
    instance_types = ["g5.2xlarge", "g5.4xlarge"]
    min_size = 1
    max_size = 5
    desired_size = 2
    disk_size = 100
    labels = { "nvidia.com/gpu.present" = "true" }
  }
}

# Ingress (optional)
enable_ingress = true
ingress_route = "api.cartesia.yourdomain.com"
certificate_arn = "arn:aws:acm:us-west-2:123456789:certificate/abc123"

# Hot reload (enabled by default)
enable_hot_reload = true
See Managing Artifacts for details on hot reload and adding voices and pronunciation dictionaries to your deployment.

Worker Configuration

Workers are defined in your tfvars file:
workers = [
  {
    name = "tts-worker"
    workerArgs = {
      model = "<model-name>"
      image = "cartesia-sonic-<model-name>"
      gpuType = "nvidia.com/gpu"
      capacity = 4
      operation = "TTS"
      useCB = true
      useLora = false
    }
    autoscaling = {
      enabled = true
      threshold = 0.6
      minReplicas = 1
      maxReplicas = 10
    }
  }
]
Every model worker image is named with the prefix cartesia-sonic- followed by the specific model name. For instance, sonic-3 uses the image cartesia-sonic-rosy-dragon.
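
The naming convention can be sketched as a small shell helper. This is illustrative only: worker_image is a hypothetical function name, and treating rosy-dragon as the model name behind sonic-3 is inferred from the example above rather than from a published mapping.

```shell
# Hypothetical helper: build a worker image name from a model name,
# following the cartesia-sonic-<model-name> convention.
worker_image() {
  local model="$1"
  if [ -z "$model" ]; then
    echo "usage: worker_image <model-name>" >&2
    return 1
  fi
  printf 'cartesia-sonic-%s\n' "$model"
}

# Example usage:
#   worker_image rosy-dragon
```

Confirm the model name for your release with your Cartesia representative before putting it in workerArgs.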

Helm-Only Deployment

For existing Kubernetes clusters, deploy the Helm chart directly.

1. Install Prerequisites

If you want autoscaling and metrics, install KEDA and Prometheus first:
# Prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace

2. Create Secrets

kubectl create namespace cartesia

kubectl create secret docker-registry gar-pull-secret \
  --namespace cartesia \
  --docker-server=us-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/service-account.json)"

kubectl create secret generic api-key-secret \
  --namespace cartesia \
  --from-literal=CONTAINER_KEY="$CARTESIA_API_KEY" \
  --from-literal=CARTESIA_API_KEY="$CARTESIA_API_KEY" \
  --from-literal=OPENAI_API_KEY="$OPENAI_API_KEY" \
  --from-literal=ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY"

kubectl create secret generic gcs-access-secret \
  --namespace cartesia \
  --from-file=service-account.json=/path/to/service-account.json
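
To confirm that all three secrets exist before moving on, a check along these lines may be useful. It is a sketch, not part of the chart: missing_secrets is a hypothetical helper that relies only on kubectl get secret and the secret names created above.

```shell
# Hypothetical helper: print the name of each expected secret that is
# missing from the namespace. Returns non-zero if any secret is missing.
missing_secrets() {
  local namespace="${1:-cartesia}"
  local missing=0
  local name
  for name in gar-pull-secret api-key-secret gcs-access-secret; do
    if ! kubectl get secret "$name" --namespace "$namespace" >/dev/null 2>&1; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   missing_secrets cartesia || echo "create the secrets listed above first"
```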

3. Configure values.yaml

clusterName: cartesia-production

infra:
  provider: gcp  # or aws
  authenticate: true
  imageRegistry: us-docker.pkg.dev/cartesia-external/self-serve
  imagePullSecret: gar-pull-secret
  gcsSecretName: gcs-access-secret
  serviceAccount: cartesia-image-sa

release:
  version: "1.0.0"
  releaseTag: "sonic-20251118"

filesystem:
  storageClass:
    name: standard-rwo

ingress:
  enabled: true
  routes:
    - api.cartesia.yourdomain.com
  globalStaticIpName: cartesia-ingress-ip  # GKE only

metrics:
  enabled: true

legacyComponents:
  enabled: false

workers:
  - name: tts-worker
    workerArgs:
      model: <model-name>
      image: cartesia-sonic-<model-name>
      gpuType: nvidia.com/gpu
      capacity: 4
      operation: TTS
      useCB: true
      useLora: false
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10

4. Deploy

cd cartesia-kube/cartesia
helm upgrade --install cartesia . \
  --values values.yaml \
  --namespace cartesia

Verify

Confirm the deployment is healthy before sending traffic. The commands below assume the default cartesia namespace and release name used in the examples above — substitute if you customized either.

Watch rollout

kubectl rollout status deployment/cartesia-api -n cartesia
kubectl rollout status deployment/<worker-name> -n cartesia
Worker pods take longer than the API because the model must load into GPU memory. A worker stays Running but not Ready until inferno_worker_capacity > 0.

Pods are Ready

kubectl get pods -n cartesia
All pods should be Running with every container Ready. Probe behaviors:
  • API pod becomes Ready when GET /status returns 200 on port 5000.
  • Worker pods become Ready when the startup probe is satisfied — it polls /metrics until inferno_worker_capacity reports a value greater than 0. While the model is still loading, the worker shows Running but not Ready.
  • License-proxy and NATS have no chart-defined health probes; they are Ready as soon as the container starts.
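
The worker readiness condition can be reproduced by hand when debugging a pod that stays not Ready. The sketch below is an approximation of what the startup probe checks, not the chart's actual probe script: worker_has_capacity is a hypothetical helper, and it assumes the metric is exposed in Prometheus text format with the sample value as the last field on the line (no timestamp).

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# succeed only if an inferno_worker_capacity sample is greater than 0.
# Assumes the sample value is the last field on the line (no timestamp).
worker_has_capacity() {
  awk '
    /^inferno_worker_capacity([{ ]|$)/ && ($NF + 0) > 0 { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Example usage: pipe the output of the worker's /metrics endpoint into it.
#   <command that fetches /metrics> | worker_has_capacity && echo "worker ready"
```

An exit status of 1 while the pod is Running usually means the model is still loading.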

Ingress address is assigned

kubectl get ingress -n cartesia
The ADDRESS column should be populated with the load balancer’s hostname or IP. On GKE, also check the ManagedCertificate status — the chart creates the resource, and GCP provisions the certificate asynchronously after DNS validation:
kubectl describe managedcertificate cartesia-ssl-cert -n cartesia
Look for Status: Active. While GCP is still provisioning, HTTPS calls to the ingress will fail certificate validation.

Metrics scrape is working (optional)

If Prometheus is installed in the cluster (most commonly via kube-prometheus-stack), the chart’s PodMonitor (cartesia-monitor) is auto-discovered through the release: prometheus label. Port-forward the Prometheus UI and confirm each component is being scraped:
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
Then visit http://localhost:9090 and query up{namespace="cartesia"}. Every component pod should return 1.
Once verified, see Smoke Tests and Benchmarking for functional smoke tests and performance benchmarks.

Ingress and TLS

The chart exposes the Cartesia API externally via a Kubernetes Ingress resource. The chart configures and annotates that Ingress for AWS EKS or GCP GKE. For other Kubernetes flavors (AKS, OpenShift, Rancher, kubeadm), disable the chart’s ingress and create your own.
On AWS EKS, the chart configures the Ingress for the AWS Load Balancer Controller (ALB). Key behaviors:
  • TLS termination: TLS terminates at the ALB, with TLS 1.2 as the minimum version (ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01).
  • Backend leg: Traffic between the ALB and the API pod is plaintext HTTP (backend-protocol: HTTP). For end-to-end TLS, contact support@cartesia.ai.
  • Certificate: Pass an explicit ACM ARN via Terraform (certificate_arn) or Helm (ingress.certificateArn). If unset, the chart’s certificate-manager: 'true' annotation tells the AWS Load Balancer Controller to look up a matching ACM cert by hostname.
  • HTTP redirect: HTTP traffic on port 80 is redirected to HTTPS on port 443.
enable_ingress  = true
ingress_route   = "api.cartesia.yourdomain.com"
certificate_arn = "arn:aws:acm:us-west-2:123456789:certificate/abc123"
See cartesia/templates/resources/ingress.yaml for the full annotation set the chart applies to the Ingress.

Autoscaling

Cartesia supports two levels of autoscaling for Kubernetes deployments.

Cluster Autoscaler

Scales nodes based on pending pods. Enable in your tfvars:
enable_cluster_autoscaler = true
Node groups/pools will scale within their configured min_size/max_size bounds when pods can’t be scheduled due to insufficient resources.

Pod Autoscaler (KEDA)

Scales worker pods based on load metrics. Enable in your tfvars:
enable_pod_autoscaler = true
enable_metrics = true  # Required for KEDA
KEDA uses two scaling triggers:
  • Queue depth: Scales when unserviceable requests accumulate.
  • Worker load: Scales when GPU utilization exceeds the configured threshold.

Per-Worker Scaling

Each worker can have its own scaling configuration:
workers = [
  {
    name = "tts-worker"
    workerArgs = { ... }
    autoscaling = {
      enabled = true
      threshold = 0.6      # Scale up when load > 60%
      minReplicas = 1
      maxReplicas = 10
    }
  }
]
Or in Helm values.yaml:
workers:
  - name: tts-worker
    workerArgs: { ... }
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10

Scaling Behavior

  • Scale up: 30-second stabilization window.
  • Scale down: 900-second (15-minute) stabilization window to avoid flapping.
  • Workers scale independently, each based on its own load.
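
For intuition about how a threshold such as 0.6 translates into replica counts, the sketch below applies the standard Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current * metric / target), clamped to minReplicas and maxReplicas. KEDA manages an HPA under the hood, but whether the chart's triggers feed it exactly this ratio is an assumption, and the stabilization windows above are ignored. Load and threshold are passed as integer percentages so the arithmetic stays in plain shell integers; desired_replicas is a hypothetical name.

```shell
# Hypothetical sketch of the HPA replica formula:
#   desired = ceil(current * load / threshold), clamped to [min, max]
# load_pct and threshold_pct are integer percentages (60 means 0.6).
desired_replicas() {
  local current="$1" load_pct="$2" threshold_pct="$3" min="$4" max="$5"
  local desired
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive b.
  desired=$(( (current * load_pct + threshold_pct - 1) / threshold_pct ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# Example usage: 2 replicas at 90% load, threshold 60%, bounds 1..10
#   desired_replicas 2 90 60 1 10
```

With two replicas at 90% load against a 60% threshold, the formula asks for three replicas. Very high load is capped at maxReplicas, and idle workers never drop below minReplicas.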

Go-live Checklist

Final review before opening to production traffic: