

Cartesia provides Terraform configurations that deploy both infrastructure and the application, or you can deploy the Helm chart directly to an existing cluster.
Complete configurations are provided at deployment time by your Cartesia representative.

Terraform Deployment

Terraform creates the cluster, networking, and GPU drivers, then deploys Cartesia via Helm. This is the fastest way to get started with self-hosting Cartesia.
Download cartesia-kube from the GCS bucket as described in Downloading cartesia-kube.
# Download and extract cartesia-kube from GCS (see Downloading cartesia-kube guide)
cd cartesia-kube

# Copy example config for your platform
cp aws-terraform.tfvars.example aws-terraform.tfvars  # or gcp-terraform.tfvars.example

# Deploy from the platform directory
cd infra/aws/cartesia-eks  # or infra/gcp/cartesia-gke
terraform init
terraform apply -var-file="../../../aws-terraform.tfvars" \
                -var "cartesia_api_key=$CARTESIA_API_KEY" \
                -var "service_account_json=$(cat /path/to/service-account.json)"

Configuration

region = "us-west-2"
name = "cartesia-production"

eks_admin_users = ["arn:aws:iam::123456789:user/admin"]

node_groups = {
  default = {
    ami_type = "AL2023_x86_64_STANDARD"
    instance_types = ["m7a.4xlarge"]
    min_size = 1
    max_size = 3
    desired_size = 1
  }
  gpu = {
    ami_type = "AL2023_x86_64_NVIDIA"
    instance_types = ["g5.2xlarge", "g5.4xlarge"]
    min_size = 1
    max_size = 5
    desired_size = 2
    disk_size = 100
    labels = { "nvidia.com/gpu.present" = "true" }
  }
}

# Ingress (optional)
enable_ingress = true
ingress_route = "api.cartesia.yourdomain.com"
certificate_arn = "arn:aws:acm:us-west-2:123456789:certificate/abc123"

# Hot reload (enabled by default)
enable_hot_reload = true
See Managing Artifacts for details on hot reload and adding voices and pronunciation dictionaries to your deployment.

Worker Configuration

Workers are defined in your tfvars file:
workers = [
  {
    name = "tts-worker"
    workerArgs = {
      model = "<model-name>"
      image = "cartesia-sonic-<model-name>"
      gpuType = "nvidia.com/gpu"
      capacity = 4
      operation = "TTS"
      useCB = true
      useLora = false
    }
    autoscaling = {
      enabled = true
      threshold = 0.6
      minReplicas = 1
      maxReplicas = 10
    }
  }
]
All model worker images use the prefix cartesia-sonic- followed by the model name. For instance, the sonic-3 model uses the image cartesia-sonic-rosy-dragon.
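As a quick sanity check, the full image reference can be assembled from the registry shown in the Helm values below and the naming convention above (the rosy-dragon suffix is the sonic-3 example; substitute your model's image name):

```shell
# Build the full worker image reference from the cartesia-sonic- naming convention.
registry="us-docker.pkg.dev/cartesia-external/self-serve"  # infra.imageRegistry from values.yaml
model_image="rosy-dragon"                                  # image suffix for the sonic-3 model
echo "${registry}/cartesia-sonic-${model_image}"
# → us-docker.pkg.dev/cartesia-external/self-serve/cartesia-sonic-rosy-dragon
```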

Helm-Only Deployment

For existing Kubernetes clusters, deploy the Helm chart directly.

1. Install Prerequisites

If you want autoscaling and metrics, install KEDA and Prometheus first:
# Prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace

2. Create Secrets

kubectl create namespace cartesia

kubectl create secret docker-registry gar-pull-secret \
  --namespace cartesia \
  --docker-server=us-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/service-account.json)"

3. Configure values.yaml

clusterName: cartesia-production

infra:
  provider: gcp  # or aws
  authenticate: true
  imageRegistry: us-docker.pkg.dev/cartesia-external/self-serve
  imagePullSecret: gar-pull-secret
  gcsSecretName: gar-pull-secret
  serviceAccount: cartesia-image-sa

release:
  version: "1.0.0"
  releaseTag: "sonic-20251118"

filesystem:
  storageClass:
    name: standard-rwo

ingress:
  enabled: true
  routes:
    - api.cartesia.yourdomain.com
  globalStaticIpName: cartesia-ingress-ip  # GKE only

metrics:
  enabled: true

legacyComponents:
  enabled: false

workers:
  - name: tts-worker
    workerArgs:
      model: <model-name>
      image: cartesia-sonic-<model-name>
      gpuType: nvidia.com/gpu
      capacity: 4
      operation: TTS
      useCB: true
      useLora: false
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10

4. Deploy

cd cartesia-kube/cartesia
helm upgrade --install cartesia . \
  --values values.yaml \
  --namespace cartesia

Verify

kubectl get pods -n cartesia
kubectl get ingress -n cartesia

Autoscaling

Cartesia supports two levels of autoscaling for Kubernetes deployments.

Cluster Autoscaler

Scales nodes based on pending pods. Enable in your tfvars:
enable_cluster_autoscaler = true
Node groups/pools will scale within their configured min_size/max_size bounds when pods can’t be scheduled due to insufficient resources.

Pod Autoscaler (KEDA)

Scales worker pods based on load metrics. Enable in your tfvars:
enable_pod_autoscaler = true
enable_metrics = true  # Required for KEDA
KEDA uses two scaling triggers:
  • Queue depth - Scales when unserviceable requests accumulate
  • Worker load - Scales when GPU utilization exceeds threshold
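For intuition, a KEDA ScaledObject wiring these two triggers to Prometheus queries might look like the following sketch. This is illustrative only: the chart generates these resources for you, and the resource names, metric names, and Prometheus address here are assumptions, not the chart's actual output.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tts-worker-scaler          # illustrative name
  namespace: cartesia
spec:
  scaleTargetRef:
    name: tts-worker               # worker Deployment (assumption)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    # Queue depth: scale when unserviceable requests accumulate
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090  # assumption
        query: sum(cartesia_queue_depth)     # metric name is illustrative
        threshold: "1"
    # Worker load: scale when GPU utilization exceeds the threshold
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090
        query: avg(cartesia_worker_load)     # metric name is illustrative
        threshold: "0.6"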

Per-Worker Scaling

Each worker can have its own scaling configuration:
workers = [
  {
    name = "tts-worker"
    workerArgs = { ... }
    autoscaling = {
      enabled = true
      threshold = 0.6      # Scale up when load > 60%
      minReplicas = 1
      maxReplicas = 10
    }
  }
]
Or in Helm values.yaml:
workers:
  - name: tts-worker
    workerArgs: { ... }
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10

Scaling Behavior

  • Scale up: 30 second stabilization window
  • Scale down: 900 second (15 min) stabilization window to avoid flapping
  • Workers scale independently based on their individual load
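In KEDA, these stabilization windows correspond to the HPA behavior block on each ScaledObject. A sketch of the equivalent configuration, with the window values mirroring the defaults above (the surrounding resource and field placement are assumptions about how the chart renders them):

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30    # react to sustained load within 30 s
        scaleDown:
          stabilizationWindowSeconds: 900   # wait 15 min before scaling down to avoid flapping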