> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cartesia.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Managed Kubernetes

> Deploy Cartesia on AWS EKS and GCP GKE

Cartesia provides Terraform configurations that deploy both infrastructure and the application, or you can deploy the Helm chart directly to an existing cluster.

<Note>Complete configurations are provided at deployment time by your Cartesia representative.</Note>

<Note>For Kubernetes deployments, the identity that runs Terraform or Helm must be able to create and update the license proxy's `ClusterRole` and `ClusterRoleBinding`. These resources grant the dedicated license-proxy service account read access to the `kube-system` namespace and the Cartesia release namespace that is used to identify the deployment.</Note>

## Terraform Deployment

Terraform creates the cluster, networking, GPU drivers, and deploys Cartesia via Helm.
This is the fastest way for you to get started with self-hosting Cartesia.

<Note>Download cartesia-kube from the GCS bucket as described in [Downloading cartesia-kube](/self-hosted/provisioned-resources#deployment-configurations).</Note>

```bash theme={null}
# Download and extract cartesia-kube from GCS (see Downloading cartesia-kube guide)
cd cartesia-kube

# Copy example config for your platform
cp aws-terraform.tfvars.example aws-terraform.tfvars  # or gcp-terraform.tfvars.example

# Deploy from the platform directory
cd infra/aws/cartesia-eks  # or infra/gcp/cartesia-gke
terraform init
terraform apply -var-file="../../../aws-terraform.tfvars" \
                -var "cartesia_api_key=$CARTESIA_API_KEY" \
                -var "service_account_json=$(cat /path/to/service-account.json)"
```

### Configuration

<Tabs>
  <Tab title="AWS EKS">
    ```hcl theme={null}
    region = "us-west-2"
    name = "cartesia-production"

    eks_admin_users = ["arn:aws:iam::123456789:user/admin"]

    node_groups = {
      default = {
        ami_type = "AL2023_x86_64_STANDARD"
        instance_types = ["m7a.4xlarge"]
        min_size = 1
        max_size = 3
        desired_size = 1
      }
      gpu = {
        ami_type = "AL2023_x86_64_NVIDIA"
        instance_types = ["g5.2xlarge", "g5.4xlarge"]
        min_size = 1
        max_size = 5
        desired_size = 2
        disk_size = 100
        labels = { "nvidia.com/gpu.present" = "true" }
      }
    }

    # Ingress (optional)
    enable_ingress = true
    ingress_route = "api.cartesia.yourdomain.com"
    certificate_arn = "arn:aws:acm:us-west-2:123456789:certificate/abc123"

    # Hot reload (enabled by default)
    enable_hot_reload = true
    ```
  </Tab>

  <Tab title="GCP GKE">
    ```hcl theme={null}
    project_id = "your-gcp-project"
    region = "us-central1"
    zone = "us-central1-a"
    name = "cartesia-production"

    gke_admin_users = ["user@yourdomain.com"]

    node_pools = {
      default = {
        machine_type = "e2-standard-8"
        min_count = 1
        max_count = 3
        initial_node_count = 1
      }
      gpu = {
        machine_type = "g2-standard-8"
        accelerator_type = "nvidia-l4"
        accelerator_count = 1
        min_count = 1
        max_count = 5
        initial_node_count = 2
        disk_size_gb = 100
      }
    }

    # Ingress (optional)
    enable_ingress = true
    ingress_route = "api.cartesia.yourdomain.com"

    # Hot reload (enabled by default)
    enable_hot_reload = true
    ```
  </Tab>
</Tabs>

See [Managing Artifacts](/self-hosted/managing-artifacts) for details on hot reload and adding voices and pronunciation dictionaries to your deployment.

### Worker Configuration

Workers are defined in your tfvars file:

```hcl theme={null}
workers = [
  {
    name = "tts-worker"
    workerArgs = {
      model = "<model-name>"
      image = "cartesia-sonic-<model-name>"
      gpuType = "nvidia.com/gpu"
      capacity = 4
      operation = "TTS"
      useCB = true
      useLora = false
    }
    autoscaling = {
      enabled = true
      threshold = 0.6
      minReplicas = 1
      maxReplicas = 10
    }
  }
]
```

All the model workers have the images with prefix `cartesia-sonic-` followed by the specific model name. For instance, sonic-3 would use `cartesia-sonic-rosy-dragon`.

## Helm-Only Deployment

For existing Kubernetes clusters, deploy the Helm chart directly.

### 1. Install Prerequisites

If you want autoscaling and metrics, install KEDA and Prometheus first:

```bash theme={null}
# Prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace
```

### 2. Create Secrets

```bash theme={null}
kubectl create namespace cartesia

kubectl create secret docker-registry gar-pull-secret \
  --namespace cartesia \
  --docker-server=us-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/service-account.json)"

kubectl create secret generic api-key-secret \
  --namespace cartesia \
  --from-literal=CONTAINER_KEY="$CARTESIA_API_KEY" \
  --from-literal=CARTESIA_API_KEY="$CARTESIA_API_KEY" \
  --from-literal=OPENAI_API_KEY="$OPENAI_API_KEY" \
  --from-literal=ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY"

kubectl create secret generic gcs-access-secret \
  --namespace cartesia \
  --from-file=service-account.json=/path/to/service-account.json
```

### 3. Configure values.yaml

```yaml theme={null}
clusterName: cartesia-production

infra:
  provider: gcp  # or aws
  authenticate: true
  imageRegistry: us-docker.pkg.dev/cartesia-external/self-serve
  imagePullSecret: gar-pull-secret
  gcsSecretName: gar-pull-secret
  serviceAccount: cartesia-image-sa

release:
  version: "1.0.0"
  releaseTag: "sonic-20251118"

filesystem:
  storageClass:
    name: standard-rwo

ingress:
  enabled: true
  routes:
    - api.cartesia.yourdomain.com
  globalStaticIpName: cartesia-ingress-ip  # GKE only

metrics:
  enabled: true

legacyComponents:
  enabled: false

workers:
  - name: tts-worker
    workerArgs:
      model: <model-name>
      image: cartesia-sonic-<model-name>
      gpuType: nvidia.com/gpu
      capacity: 4
      operation: TTS
      useCB: true
      useLora: false
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10
```

### 4. Deploy

```bash theme={null}
cd cartesia-kube/cartesia
helm upgrade --install cartesia . \
  --values values.yaml \
  --namespace cartesia
```

### Verify

Confirm the deployment is healthy before sending traffic. The commands below assume the default `cartesia` namespace and release name used in the examples above — substitute if you customized either.

#### Watch rollout

```bash theme={null}
kubectl rollout status deployment/cartesia-api -n cartesia
kubectl rollout status deployment/<worker-name> -n cartesia
```

Worker pods take longer than the API because the model must load into GPU memory. A worker stays `Running` but not `Ready` until `inferno_worker_capacity > 0`.

#### Pods are Ready

```bash theme={null}
kubectl get pods -n cartesia
```

All pods should be `Running` with every container `Ready`. Probe behaviors:

* **API pod becomes Ready** when `GET /status` returns 200 on port 5000.
* **Worker pods become Ready** when the startup probe is satisfied — it polls `/metrics` until `inferno_worker_capacity` reports a value greater than 0. While the model is still loading, the worker shows `Running` but not `Ready`.
* **License-proxy and NATS** have no chart-defined health probes; they are Ready as soon as the container starts.

#### Ingress address is assigned

```bash theme={null}
kubectl get ingress -n cartesia
```

The `ADDRESS` column should be populated with the load balancer's hostname or IP.

On GKE, also check the `ManagedCertificate` status — the chart creates the resource, and GCP provisions the certificate asynchronously after DNS validation:

```bash theme={null}
kubectl describe managedcertificate cartesia-ssl-cert -n cartesia
```

Look for `Status: Active`. While GCP is still provisioning, HTTPS calls to the ingress will fail certificate validation.

#### Metrics scrape is working (optional)

If Prometheus is installed in the cluster (most commonly via `kube-prometheus-stack`), the chart's `PodMonitor` (`cartesia-monitor`) is auto-discovered through the `release: prometheus` label. Port-forward the Prometheus UI and confirm each component is being scraped:

```bash theme={null}
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
```

Then visit `http://localhost:9090` and query `up{namespace="cartesia"}` — every component pod should return `1`.

Once verified, see [Smoke Tests and Benchmarking](/self-hosted/smoke-tests-and-benchmarking) for functional smoke tests and performance benchmarks.

## Ingress and TLS

The chart exposes the Cartesia API externally via a Kubernetes `Ingress` resource. The chart configures and annotates that Ingress for AWS EKS or GCP GKE. For other Kubernetes flavors (AKS, OpenShift, Rancher, kubeadm), disable the chart's ingress and create your own. Select your platform below.

<Tabs>
  <Tab title="AWS EKS">
    The chart configures the Ingress for the AWS Load Balancer Controller (ALB). Key behaviors:

    * **TLS termination:** TLS terminates at the ALB at minimum TLS 1.2 (`ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01`).
    * **Backend leg:** Traffic between the ALB and the API pod is plaintext HTTP (`backend-protocol: HTTP`). For end-to-end TLS, contact [support@cartesia.ai](mailto:support@cartesia.ai).
    * **Certificate:** Pass an explicit ACM ARN via Terraform (`certificate_arn`) or Helm (`ingress.certificateArn`). If unset, the chart's `certificate-manager: 'true'` annotation tells the AWS Load Balancer Controller to look up a matching ACM cert by hostname.
    * **HTTP redirect:** HTTP traffic on port 80 is redirected to HTTPS on port 443.

    <Tabs>
      <Tab title="Terraform">
        ```hcl theme={null}
        enable_ingress  = true
        ingress_route   = "api.cartesia.yourdomain.com"
        certificate_arn = "arn:aws:acm:us-west-2:123456789:certificate/abc123"
        ```
      </Tab>

      <Tab title="Helm">
        ```yaml theme={null}
        ingress:
          enabled: true
          routes:
            - api.cartesia.yourdomain.com
          certificateArn: "arn:aws:acm:us-west-2:123456789:certificate/abc123"
        ```
      </Tab>
    </Tabs>

    See `cartesia/templates/resources/ingress.yaml` for the full annotation set the chart applies to the Ingress.
  </Tab>

  <Tab title="GCP GKE">
    The chart configures the Ingress for the GKE built-in ingress controller. Key behaviors:

    * **Certificate:** The chart creates a `ManagedCertificate` resource (`{release}-ssl-cert`) covering every hostname in `ingress.routes`. GCP provisions the cert after validating domain ownership via DNS, so point your DNS A record at the ingress IP before deploying. Provisioning time depends on DNS propagation.
    * **HTTP:** HTTP is allowed alongside HTTPS (`allow-http: true`). The chart does not configure an HTTP-to-HTTPS redirect — add one separately if required.
    * **Static IP (optional):** Reserve a global IP via `ingress_static_ip_name`. To use an existing IP managed outside this stack, also set `ingress_use_existing_static_ip = true`. See `infra/gcp/cartesia-gke/variables.tf` for full descriptions.

    <Tabs>
      <Tab title="Terraform">
        ```hcl theme={null}
        enable_ingress = true
        ingress_route  = "api.cartesia.yourdomain.com"
        ```
      </Tab>

      <Tab title="Helm">
        ```yaml theme={null}
        ingress:
          enabled: true
          routes:
            - api.cartesia.yourdomain.com
        ```
      </Tab>
    </Tabs>
  </Tab>

  <Tab title="Self-managed">
    The chart's Ingress resource only emits valid annotations for `aws` or `gcp` providers. On AKS, OpenShift, Rancher, kubeadm, or any other Kubernetes flavor, disable the chart's ingress and create your own:

    ```yaml theme={null}
    ingress:
      enabled: false
    ```

    Then create an `Ingress` resource using your cluster's ingress controller (NGINX, Traefik, AKS Application Gateway, HAProxy, etc.) that routes to the `{release}-api-server` Service on port 5000. Configure TLS using a standard `kubernetes.io/tls` Secret referenced from the Ingress `spec.tls`, or use [cert-manager](https://cert-manager.io/) to automate certificate issuance.

    Example for illustration (adapt to your ingress controller's conventions):

    ```yaml theme={null}
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: cartesia-api-ingress
      namespace: cartesia
      annotations:
        kubernetes.io/ingress.class: nginx
    spec:
      tls:
        - hosts:
            - api.cartesia.yourdomain.com
          secretName: cartesia-tls
      rules:
        - host: api.cartesia.yourdomain.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: cartesia-api-server
                    port:
                      number: 5000
    ```

    Replace `cartesia-api-server` with your release name followed by `-api-server`.
  </Tab>
</Tabs>

## Autoscaling

Cartesia supports two levels of autoscaling for Kubernetes deployments.

### Cluster Autoscaler

Scales nodes based on pending pods. Enable in your tfvars:

```hcl theme={null}
enable_cluster_autoscaler = true
```

Node groups/pools will scale within their configured `min_size`/`max_size` bounds when pods can't be scheduled due to insufficient resources.

### Pod Autoscaler (KEDA)

Scales worker pods based on load metrics. Enable in your tfvars:

```hcl theme={null}
enable_pod_autoscaler = true
enable_metrics = true  # Required for KEDA
```

KEDA uses two scaling triggers:

* **Queue depth** - Scales when unserviceable requests accumulate
* **Worker load** - Scales when GPU utilization exceeds threshold

### Per-Worker Scaling

Each worker can have its own scaling configuration:

```hcl theme={null}
workers = [
  {
    name = "tts-worker"
    workerArgs = { ... }
    autoscaling = {
      enabled = true
      threshold = 0.6      # Scale up when load > 60%
      minReplicas = 1
      maxReplicas = 10
    }
  }
]
```

Or in Helm values.yaml:

```yaml theme={null}
workers:
  - name: tts-worker
    workerArgs: { ... }
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10
```

### Scaling Behavior

* **Scale up**: 30 second stabilization window
* **Scale down**: 900 second (15 min) stabilization window to avoid flapping
* Workers scale independently based on their individual load

## Go-live Checklist

Final review before opening to production traffic:

* All pods `Running` and `Ready`, all images on the target release tag — see [Verify](#verify).
* Ingress reachable on its FQDN over HTTPS — see [Verify](#verify).
* TLS certificate active (ACM cert attached on EKS, ManagedCertificate `Active` on GKE, or BYO cert mounted for self-managed) — see [Ingress and TLS](#ingress-and-tls).
* Smoke tests pass — see [Smoke Tests and Benchmarking](/self-hosted/smoke-tests-and-benchmarking).
* Benchmark results within the expected range for the deployed GPU — see [Performance per GPU](/self-hosted/hardware-selection#performance-per-gpu).
* Metrics scrape working (if Prometheus is installed) — see [Verify](#verify).
* Firewall and network policies match the deployed posture — see [Outbound egress (Connected mode)](/self-hosted/architecture#outbound-egress-connected-mode).
* (Air-gapped only) License loaded; offline operation confirmed — see [Air-Gapped Deployments](/self-hosted/air-gapped).
* On-call runbook documents the rollback procedure in [Upgrades and Rollback](/self-hosted/upgrades-and-rollback).