Cartesia provides Terraform configurations that deploy both infrastructure and the application, or you can deploy the Helm chart directly to an existing cluster.
Complete configurations are provided at deployment time by your Cartesia representative.
Terraform creates the cluster, networking, GPU drivers, and deploys Cartesia via Helm.
This is the fastest way to get started with self-hosting Cartesia.
```bash
# Download and extract cartesia-kube from GCS (see the Downloading cartesia-kube guide)
cd cartesia-kube

# Copy the example config for your platform
cp aws-terraform.tfvars.example aws-terraform.tfvars  # or gcp-terraform.tfvars.example

# Deploy from the platform directory
cd infra/aws/cartesia-eks  # or infra/gcp/cartesia-gke
terraform init
terraform apply -var-file="../../../aws-terraform.tfvars" \
  -var "cartesia_api_key=$CARTESIA_API_KEY" \
  -var "service_account_json=$(cat /path/to/service-account.json)"
```
Configuration
AWS (aws-terraform.tfvars):

```hcl
region = "us-west-2"
name   = "cartesia-production"

eks_admin_users = ["arn:aws:iam::123456789:user/admin"]

node_groups = {
  default = {
    ami_type       = "AL2023_x86_64_STANDARD"
    instance_types = ["m7a.4xlarge"]
    min_size       = 1
    max_size       = 3
    desired_size   = 1
  }
  gpu = {
    ami_type       = "AL2023_x86_64_NVIDIA"
    instance_types = ["g5.2xlarge", "g5.4xlarge"]
    min_size       = 1
    max_size       = 5
    desired_size   = 2
    disk_size      = 100
    labels         = { "nvidia.com/gpu.present" = "true" }
  }
}

# Ingress (optional)
enable_ingress  = true
ingress_route   = "api.cartesia.yourdomain.com"
certificate_arn = "arn:aws:acm:us-west-2:123456789:certificate/abc123"

# Hot reload (enabled by default)
enable_hot_reload = true
```
GCP (gcp-terraform.tfvars):

```hcl
project_id = "your-gcp-project"
region     = "us-central1"
zone       = "us-central1-a"
name       = "cartesia-production"

gke_admin_users = ["user@yourdomain.com"]

node_pools = {
  default = {
    machine_type       = "e2-standard-8"
    min_count          = 1
    max_count          = 3
    initial_node_count = 1
  }
  gpu = {
    machine_type       = "g2-standard-8"
    accelerator_type   = "nvidia-l4"
    accelerator_count  = 1
    min_count          = 1
    max_count          = 5
    initial_node_count = 2
    disk_size_gb       = 100
  }
}

# Ingress (optional)
enable_ingress = true
ingress_route  = "api.cartesia.yourdomain.com"

# Hot reload (enabled by default)
enable_hot_reload = true
```
See Managing Artifacts for details on hot reload and adding voices and pronunciation dictionaries to your deployment.
Worker Configuration
Workers are defined in your tfvars file:
```hcl
workers = [
  {
    name = "tts-worker"
    workerArgs = {
      model     = "<model-name>"
      image     = "cartesia-sonic-<model-name>"
      gpuType   = "nvidia.com/gpu"
      capacity  = 4
      operation = "TTS"
      useCB     = true
      useLora   = false
    }
    autoscaling = {
      enabled     = true
      threshold   = 0.6
      minReplicas = 1
      maxReplicas = 10
    }
  }
]
```
All model worker images use the prefix cartesia-sonic- followed by the specific model name. For instance, sonic-3 uses the image cartesia-sonic-rosy-dragon.
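As a minimal sketch of this naming convention (the registry path here is an assumption taken from the Helm values in this guide; verify it for your environment), the full image reference for a worker can be composed like so:

```python
# Illustrative sketch only: compose a full worker image reference.
# REGISTRY is an assumption based on the imageRegistry value in this guide.
REGISTRY = "us-docker.pkg.dev/cartesia-external/self-serve"

def worker_image(model_name: str) -> str:
    """Return the image reference for a given model name."""
    return f"{REGISTRY}/cartesia-sonic-{model_name}"

print(worker_image("rosy-dragon"))
# -> us-docker.pkg.dev/cartesia-external/self-serve/cartesia-sonic-rosy-dragon
```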
Helm-Only Deployment
For existing Kubernetes clusters, deploy the Helm chart directly.
1. Install Prerequisites
If you want autoscaling and metrics, install KEDA and Prometheus first:
```bash
# Prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace
```
2. Create Secrets
```bash
kubectl create namespace cartesia

kubectl create secret docker-registry gar-pull-secret \
  --namespace cartesia \
  --docker-server=us-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat /path/to/service-account.json)"
```
3. Configure values.yaml

```yaml
clusterName: cartesia-production

infra:
  provider: gcp # or aws
  authenticate: true
  imageRegistry: us-docker.pkg.dev/cartesia-external/self-serve
  imagePullSecret: gar-pull-secret
  gcsSecretName: gar-pull-secret
  serviceAccount: cartesia-image-sa

release:
  version: "1.0.0"
  releaseTag: "sonic-20251118"

filesystem:
  storageClass:
    name: standard-rwo

ingress:
  enabled: true
  routes:
    - api.cartesia.yourdomain.com
  globalStaticIpName: cartesia-ingress-ip # GKE only

metrics:
  enabled: true

legacyComponents:
  enabled: false

workers:
  - name: tts-worker
    workerArgs:
      model: <model-name>
      image: cartesia-sonic-<model-name>
      gpuType: nvidia.com/gpu
      capacity: 4
      operation: TTS
      useCB: true
      useLora: false
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10
```
4. Deploy
```bash
cd cartesia-kube/cartesia
helm upgrade --install cartesia . \
  --values values.yaml \
  --namespace cartesia
```
Verify
```bash
kubectl get pods -n cartesia
kubectl get ingress -n cartesia
```
Autoscaling
Cartesia supports two levels of autoscaling for Kubernetes deployments.
Cluster Autoscaler
Scales nodes based on pending pods. Enable in your tfvars:
```hcl
enable_cluster_autoscaler = true
```
Node groups/pools will scale within their configured min_size/max_size bounds when pods can’t be scheduled due to insufficient resources.
Pod Autoscaler (KEDA)
Scales worker pods based on load metrics. Enable in your tfvars:
```hcl
enable_pod_autoscaler = true
enable_metrics        = true # Required for KEDA
```
KEDA uses two scaling triggers:
- Queue depth - Scales when unserviceable requests accumulate
- Worker load - Scales when GPU utilization exceeds threshold
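The chart provisions these triggers for you. Purely as an illustration of the mechanism, a KEDA ScaledObject with two Prometheus triggers has the following shape. The metric names, queries, and Prometheus address below are assumptions for illustration, not the values the Cartesia chart actually generates:

```yaml
# Illustrative sketch only: metric names, queries, and the Prometheus address
# are assumptions, not the values generated by the Cartesia chart.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tts-worker
  namespace: cartesia
spec:
  scaleTargetRef:
    name: tts-worker
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    # Queue depth: scale when unserviceable requests accumulate
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090
        query: sum(worker_queue_depth{worker="tts-worker"})
        threshold: "1"
    # Worker load: scale when GPU utilization exceeds the threshold
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090
        query: avg(worker_load{worker="tts-worker"})
        threshold: "0.6"
```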
Per-Worker Scaling
Each worker can have its own scaling configuration:
```hcl
workers = [
  {
    name       = "tts-worker"
    workerArgs = { ... }
    autoscaling = {
      enabled     = true
      threshold   = 0.6 # Scale up when load > 60%
      minReplicas = 1
      maxReplicas = 10
    }
  }
]
```
Or in Helm values.yaml:
```yaml
workers:
  - name: tts-worker
    workerArgs: { ... }
    autoscaling:
      enabled: true
      threshold: "0.6"
      minReplicas: 1
      maxReplicas: 10
```
Scaling Behavior
- Scale up: 30-second stabilization window
- Scale down: 900-second (15-minute) stabilization window to avoid flapping
- Workers scale independently based on their individual load
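These windows correspond to the standard Kubernetes HPA behavior settings, which KEDA passes through to the HorizontalPodAutoscaler it manages. A hedged sketch of how they would appear in a ScaledObject's advanced configuration (field placement follows KEDA's API; whether the Cartesia chart exposes this directly is an assumption):

```yaml
# Illustrative sketch: how the stabilization windows above map onto
# KEDA's pass-through HPA behavior configuration.
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30   # react quickly to load spikes
        scaleDown:
          stabilizationWindowSeconds: 900  # 15 min, avoids flapping
```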