Managed Kubernetes

Cartesia provides Terraform configurations that deploy both infrastructure and the application, or you can deploy the Helm chart directly to an existing cluster.

Complete configurations are provided at deployment time by your Cartesia representative.

Terraform Deployment

Terraform provisions the cluster and networking, installs GPU drivers, and deploys Cartesia via Helm. This is the fastest way to get started with self-hosting Cartesia.

Download cartesia-kube from the GCS bucket as described in Downloading cartesia-kube.

# Download and extract cartesia-kube from GCS (see Downloading cartesia-kube guide)
cd cartesia-kube

# Copy example config for your platform
cp aws-terraform.tfvars.example aws-terraform.tfvars  # or gcp-terraform.tfvars.example

# Deploy from the platform directory
cd infra/aws/cartesia-eks  # or infra/gcp/cartesia-gke
terraform init
terraform apply -var-file="../../../aws-terraform.tfvars" \
                -var "cartesia_api_key=$CARTESIA_API_KEY" \
                -var "service_account_json=$(cat /path/to/service-account.json)"
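
Before running terraform apply, it can help to confirm that the two inputs the command reads are actually present. The following is a minimal sketch, not part of cartesia-kube: the function name check_deploy_inputs is hypothetical, and it only checks that CARTESIA_API_KEY is set and that the service account file is readable.

```shell
# Hypothetical helper (not shipped with cartesia-kube): verify the inputs
# that the terraform apply command above reads before invoking it.
# Usage: check_deploy_inputs /path/to/service-account.json
check_deploy_inputs() {
  sa_json="$1"
  # The API key is passed to Terraform via -var "cartesia_api_key=...".
  if [ -z "${CARTESIA_API_KEY:-}" ]; then
    echo "CARTESIA_API_KEY is not set" >&2
    return 1
  fi
  # The service account JSON is read with $(cat ...), so it must be readable.
  if [ ! -r "$sa_json" ]; then
    echo "service account file not readable: $sa_json" >&2
    return 1
  fi
  echo "inputs ok"
}
```

One way to use it is to chain it in front of the apply step, so that Terraform only runs when check_deploy_inputs succeeds.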

Configuration

AWS EKS

region = "us-west-2"
name = "cartesia-production"

eks_admin_users = ["arn:aws:iam::123456789012:user/admin"]

node_groups = {
  default = {
    ami_type = "AL2023_x86_64_STANDARD"
    instance_types = ["m7a.4xlarge"]
    min_size = 1
    max_size = 3
    desired_size = 1
  }
  gpu = {
    ami_type = "AL2023_x86_64_NVIDIA"
    instance_types = ["g5.2xlarge", "g5.4xlarge"]
    min_size = 1
    max_size = 5
    desired_size = 2
    disk_size = 100
    labels = { "nvidia.com/gpu.present" = "true" }
  }
}

# Ingress (optional)
enable_ingress = true
ingress_route = "api.cartesia.yourdomain.com"
certificate_arn = "arn:aws:acm:us-west-2:123456789012:certificate/abc123"

# Hot reload (enabled by default)
enable_hot_reload = true

GCP GKE

project_id = "your-gcp-project"
region = "us-central1"
zone = "us-central1-a"
name = "cartesia-production"

gke_admin_users = ["user@yourdomain.com"]

node_pools = {
  default = {
    machine_type = "e2-standard-8"
    min_count = 1
    max_count = 3
    initial_node_count = 1
  }
  gpu = {
    machine_type = "g2-standard-8"
    accelerator_type = "nvidia-l4"
    accelerator_count = 1
    min_count = 1
    max_count = 5
    initial_node_count = 2
    disk_size_gb = 100
  }
}

# Ingress (optional)
enable_ingress = true
ingress_route = "api.cartesia.yourdomain.com"

# Hot reload (enabled by default)
enable_hot_reload = true

See Managing Artifacts for details on hot reload and adding voices and pronunciation dictionaries to your deployment.

Worker Configuration

Workers are defined in your tfvars file:

workers = [
  {
    name = "tts-worker"
    workerArgs = {
      model = "<model-name>"
      image = "cartesia-sonic-<model-name>"
      gpuType = "nvidia.com/gpu"
      capacity = 4
      operation = "TTS"
      useCB = true
      useLora = false
    }
    autoscaling = {
      enabled = true
      threshold = 0.6
      minReplicas = 1
      maxReplicas = 10
    }
  }
]

Each model worker's image name is the prefix cartesia-sonic- followed by the specific model name. For instance, sonic-3 uses cartesia-sonic-rosy-dragon.
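
The naming rule can be written as a one-line helper. This is only an illustration of the prefix convention described above; the function name worker_image is hypothetical.

```shell
# Hypothetical helper: build a worker image name from a model name,
# following the cartesia-sonic-<model-name> convention.
worker_image() {
  printf 'cartesia-sonic-%s\n' "$1"
}

worker_image rosy-dragon   # prints cartesia-sonic-rosy-dragon
```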

Helm-Only Deployment

For existing Kubernetes clusters, deploy the Helm chart directly.

1. Install Prerequisites

If you want autoscaling and metrics, install KEDA and Prometheus first:

# Prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace

2. Create Secrets

kubectl create namespace cartesia

kubectl create secret docker-registry gar-pull-secret \
  --namespace cartesia \
  --docker-server=us-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/service-account.json)"

kubectl create secret generic api-key-secret \
  --namespace cartesia \
  --from-literal=CONTAINER_KEY="$CARTESIA_API_KEY" \
  --from-literal=CARTESIA_API_KEY="$CARTESIA_API_KEY" \
  --from-literal=OPENAI_API_KEY="$OPENAI_API_KEY" \
  --from-literal=ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY"

kubectl create secret generic gcs-access-secret \
  --namespace cartesia \
  --from-file=service-account.json=/path/to/service-account.json

3. Configure values.yaml

clusterName: cartesia-production

infra:
  provider: gcp  # or aws
  authenticate: true
  imageRegistry: us-docker.pkg.dev/cartesia-external/self-serve
  imagePullSecret: gar-pull-secret
  gcsSecretName: gar-pull-secret
  serviceAccount: cartesia-image-sa

release:
  version: "1.0.0"
  releaseTag: "sonic-20251118"

filesystem:
  storageClass:
    name: standard-rwo

ingress:
  enabled: true
  routes:
    - api.cartesia.yourdomain.com
  globalStaticIpName: cartesia-ingress-ip  # GKE only

metrics:
  enabled: true

legacyComponents:
  enabled: false

workers:
  - name: tts-worker
    workerArgs:
      model: <model-name>
      image: cartesia-sonic-<model-name>
      gpuType: nvidia.com/gpu
      capacity: 4
      operation: TTS
      useCB: true
      useLora: false
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10

4. Deploy

cd cartesia-kube/cartesia
helm upgrade --install cartesia . \
  --values values.yaml \
  --namespace cartesia

Verify

Confirm the deployment is healthy before sending traffic. The commands below assume the default cartesia namespace and release name used in the examples above — substitute if you customized either.

Watch rollout

kubectl rollout status deployment/cartesia-api -n cartesia
kubectl rollout status deployment/<worker-name> -n cartesia

Worker pods take longer than the API because the model must load into GPU memory. A worker stays Running but not Ready until inferno_worker_capacity > 0.

Pods are Ready

kubectl get pods -n cartesia

All pods should be Running with every container Ready. Probe behaviors:

API pod becomes Ready when GET /status returns 200 on port 5000.
Worker pods become Ready when the startup probe is satisfied — it polls /metrics until inferno_worker_capacity reports a value greater than 0. While the model is still loading, the worker shows Running but not Ready.
License-proxy and NATS have no chart-defined health probes; they are Ready as soon as the container starts.
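
The worker readiness condition can also be checked by hand from the metrics text. The sketch below assumes that /metrics returns the Prometheus text exposition format and that samples carry no trailing timestamp. The function name worker_has_capacity is hypothetical, and the chart's own probe may be implemented differently.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# succeed only if some inferno_worker_capacity sample is greater than 0.
# Assumes the sample value is the last field on the line (no timestamp).
worker_has_capacity() {
  awk '
    # Match the metric name followed by a label set or a space; this skips
    # "# HELP" and "# TYPE" comment lines and similarly prefixed metrics.
    /^inferno_worker_capacity([{ ]|$)/ { if ($NF + 0 > 0) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

To use it, pipe a worker's /metrics response into the function, for example after a kubectl port-forward to the worker pod. An exit status of 0 means the worker has reported capacity.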

Ingress address is assigned

kubectl get ingress -n cartesia

The ADDRESS column should be populated with the load balancer’s hostname or IP. On GKE, also check the ManagedCertificate status — the chart creates the resource, and GCP provisions the certificate asynchronously after DNS validation:

kubectl describe managedcertificate cartesia-ssl-cert -n cartesia

Look for Status: Active. While GCP is still provisioning, HTTPS calls to the ingress will fail certificate validation.
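
Because provisioning is asynchronous, a small polling loop can save repeated manual checks. This is a minimal sketch that assumes the GKE ManagedCertificate resource exposes its state at .status.certificateStatus. The function name wait_for_cert, the KUBECTL override, and the default attempt count and delay are all illustrative choices, not part of the chart.

```shell
# Hypothetical helper: poll a ManagedCertificate until it reports Active.
# Usage: wait_for_cert <name> [attempts] [delay-seconds]
# Set KUBECTL to another command to substitute the kubectl binary.
wait_for_cert() {
  name="$1"; attempts="${2:-30}"; delay="${3:-20}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Assumption: the certificate state is at .status.certificateStatus.
    status=$(${KUBECTL:-kubectl} get managedcertificate "$name" \
      --namespace cartesia -o jsonpath='{.status.certificateStatus}')
    [ "$status" = "Active" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

With the default release name, wait_for_cert cartesia-ssl-cert returns 0 once the certificate is Active and returns 1 if it is still not Active after the last attempt.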

Metrics scrape is working (optional)

If Prometheus is installed in the cluster (most commonly via kube-prometheus-stack), the chart’s PodMonitor (cartesia-monitor) is auto-discovered through the release: prometheus label. Port-forward the Prometheus UI and confirm each component is being scraped:

kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090

Then visit http://localhost:9090 and query up{namespace="cartesia"} — every component pod should return 1. Once verified, see Smoke Tests and Benchmarking for functional smoke tests and performance benchmarks.

Ingress and TLS

The chart exposes the Cartesia API externally via a Kubernetes Ingress resource. The chart configures and annotates that Ingress for AWS EKS or GCP GKE. For other Kubernetes flavors (AKS, OpenShift, Rancher, kubeadm), disable the chart’s ingress and create your own. Select your platform below.

AWS EKS

The chart configures the Ingress for the AWS Load Balancer Controller (ALB). Key behaviors:

TLS termination: TLS terminates at the ALB with a minimum version of TLS 1.2 (ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01).
Backend leg: Traffic between the ALB and the API pod is plaintext HTTP (backend-protocol: HTTP). For end-to-end TLS, contact support@cartesia.ai.
Certificate: Pass an explicit ACM ARN via Terraform (certificate_arn) or Helm (ingress.certificateArn). If unset, the chart’s certificate-manager: 'true' annotation tells the AWS Load Balancer Controller to look up a matching ACM cert by hostname.
HTTP redirect: HTTP traffic on port 80 is redirected to HTTPS on port 443.

Terraform

enable_ingress  = true
ingress_route   = "api.cartesia.yourdomain.com"
certificate_arn = "arn:aws:acm:us-west-2:123456789012:certificate/abc123"

Helm

ingress:
  enabled: true
  routes:
    - api.cartesia.yourdomain.com
  certificateArn: "arn:aws:acm:us-west-2:123456789012:certificate/abc123"

See cartesia/templates/resources/ingress.yaml for the full annotation set the chart applies to the Ingress.

GCP GKE

The chart configures the Ingress for the GKE built-in ingress controller. Key behaviors:

Certificate: The chart creates a ManagedCertificate resource ({release}-ssl-cert) covering every hostname in ingress.routes. GCP provisions the cert after validating domain ownership via DNS, so point your DNS A record at the ingress IP before deploying. Provisioning time depends on DNS propagation.
HTTP: HTTP is allowed alongside HTTPS (allow-http: true). The chart does not configure an HTTP-to-HTTPS redirect — add one separately if required.
Static IP (optional): Reserve a global IP via ingress_static_ip_name. To use an existing IP managed outside this stack, also set ingress_use_existing_static_ip = true. See infra/gcp/cartesia-gke/variables.tf for full descriptions.

Terraform

enable_ingress = true
ingress_route  = "api.cartesia.yourdomain.com"

Helm

ingress:
  enabled: true
  routes:
    - api.cartesia.yourdomain.com

Self-managed

The chart’s Ingress resource only emits valid annotations for aws or gcp providers. On AKS, OpenShift, Rancher, kubeadm, or any other Kubernetes flavor, disable the chart’s ingress and create your own:

ingress:
  enabled: false

Then create an Ingress resource using your cluster’s ingress controller (NGINX, Traefik, AKS Application Gateway, HAProxy, etc.) that routes to the {release}-api-server Service on port 5000. Configure TLS using a standard kubernetes.io/tls Secret referenced from the Ingress spec.tls, or use cert-manager to automate certificate issuance.

Example for illustration (adapt to your ingress controller’s conventions):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cartesia-api-ingress
  namespace: cartesia
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.cartesia.yourdomain.com
      secretName: cartesia-tls
  rules:
    - host: api.cartesia.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cartesia-api-server
                port:
                  number: 5000

Replace cartesia-api-server with your release name followed by -api-server.
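
The Ingress above references a kubernetes.io/tls Secret named cartesia-tls. If you manage the certificate yourself rather than through cert-manager, one way to create that Secret is kubectl create secret tls. The wrapper below is a sketch: the function name create_tls_secret and the KUBECTL override (useful for a dry run with KUBECTL=echo) are illustrative, and the certificate and key paths are yours to supply.

```shell
# Hypothetical helper: create the TLS Secret referenced by the example Ingress.
# Usage: create_tls_secret /path/to/tls.crt /path/to/tls.key
# Set KUBECTL=echo to print the command instead of running it.
create_tls_secret() {
  cert="$1"; key="$2"
  ${KUBECTL:-kubectl} create secret tls cartesia-tls \
    --namespace cartesia \
    --cert="$cert" \
    --key="$key"
}
```

The Secret must live in the same namespace as the Ingress, which is why the namespace is fixed to cartesia here. Adjust it if you deployed elsewhere.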

Autoscaling

Cartesia supports two levels of autoscaling for Kubernetes deployments.

Cluster Autoscaler

Scales nodes based on pending pods. Enable in your tfvars:

enable_cluster_autoscaler = true

Node groups (EKS) and node pools (GKE) scale within their configured bounds (min_size/max_size on EKS, min_count/max_count on GKE) when pods can’t be scheduled due to insufficient resources.

Pod Autoscaler (KEDA)

Scales worker pods based on load metrics. Enable in your tfvars:

enable_pod_autoscaler = true
enable_metrics = true  # Required for KEDA

KEDA uses two scaling triggers:

Queue depth - Scales when unserviceable requests accumulate
Worker load - Scales when GPU utilization exceeds threshold

Per-Worker Scaling

Each worker can have its own scaling configuration:

workers = [
  {
    name = "tts-worker"
    workerArgs = { ... }
    autoscaling = {
      enabled = true
      threshold = 0.6      # Scale up when load > 60%
      minReplicas = 1
      maxReplicas = 10
    }
  }
]

Or in Helm values.yaml:

workers:
  - name: tts-worker
    workerArgs: { ... }
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10

Scaling Behavior

Scale up: 30 second stabilization window
Scale down: 900 second (15 min) stabilization window to avoid flapping
Workers scale independently based on their individual load
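
To build intuition for how threshold, minReplicas, and maxReplicas interact, the sketch below applies the standard Kubernetes HPA formula, desired = ceil(current * load / threshold), and clamps the result to the replica bounds. This is an assumption for illustration only: the chart's actual KEDA triggers are not shown here and may compute load differently, and the stabilization windows are not modeled. The function name desired_replicas is hypothetical.

```shell
# Hypothetical helper: estimate the replica count for one worker.
# Usage: desired_replicas <current> <load> <threshold> <minReplicas> <maxReplicas>
# Assumes the standard HPA formula; not taken from the Cartesia chart.
desired_replicas() {
  awk -v cur="$1" -v load="$2" -v thr="$3" -v min="$4" -v max="$5" 'BEGIN {
    raw = cur * load / thr
    want = int(raw)
    if (want < raw) want++          # round up to the next whole replica
    if (want < min) want = min      # never go below minReplicas
    if (want > max) want = max      # never exceed maxReplicas
    print want
  }'
}

desired_replicas 2 0.8 0.6 1 10   # prints 3
```

With two replicas at 80% load and a threshold of 0.6, the formula asks for a third replica. At very high load the result is capped at maxReplicas, and at zero load it settles at minReplicas.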

Go-live Checklist

Final review before opening to production traffic:

All pods Running and Ready, all images on the target release tag — see Verify.
Ingress reachable on its FQDN over HTTPS — see Verify.
TLS certificate active (ACM cert attached on EKS, ManagedCertificate Active on GKE, or BYO cert mounted for self-managed) — see Ingress and TLS.
Smoke tests pass — see Smoke Tests and Benchmarking.
Benchmark results within the expected range for the deployed GPU — see Performance per GPU.
Metrics scrape working (if Prometheus is installed) — see Verify.
Firewall and network policies match the deployed posture — see Outbound egress (Connected mode).
(Air-gapped only) License loaded; offline operation confirmed — see Air-Gapped Deployments.
On-call runbook documents the rollback procedure in Upgrades and Rollback.
