Cartesia provides Terraform configurations that deploy both infrastructure and the application, or you can deploy the Helm chart directly to an existing cluster.
Complete configurations are provided at deployment time by your Cartesia representative.
Terraform Deployment
Terraform creates the cluster, networking, and GPU drivers, then deploys Cartesia via Helm. This is the fastest way to get started with self-hosting Cartesia.
Download cartesia-kube from the GCS bucket as described in Downloading cartesia-kube.
Configuration
- AWS EKS
- GCP GKE
Worker Configuration
Workers are defined in your tfvars file:
`cartesia-sonic-` followed by the specific model name. For instance, sonic-3 would use `cartesia-sonic-rosy-dragon`.
Helm-Only Deployment
For existing Kubernetes clusters, deploy the Helm chart directly.
1. Install Prerequisites
If you want autoscaling and metrics, install KEDA and Prometheus first:
2. Create Secrets
3. Configure values.yaml
4. Deploy
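As a rough sketch of the deploy step, the helper below wraps `helm upgrade --install`. It assumes the release and namespace are both named `cartesia` (the defaults this page refers to); the chart directory and values file are parameters because their actual locations come from your cartesia-kube download and are not stated here. The function name and the `RELEASE`/`NAMESPACE` variables are illustrative, not part of the chart.

```shell
# Install or upgrade the release from a local chart directory.
# Usage: deploy_cartesia <chart-dir> [values-file]
# Assumptions: release and namespace default to "cartesia"; the chart
# directory and values file paths are supplied by the caller.
deploy_cartesia() {
  chart_dir="$1"
  values_file="${2:-values.yaml}"
  helm upgrade --install "${RELEASE:-cartesia}" "$chart_dir" \
    --namespace "${NAMESPACE:-cartesia}" \
    --create-namespace \
    --values "$values_file"
}
```

Using `upgrade --install` keeps the command idempotent, so re-running it after editing values.yaml applies the change rather than failing on an existing release.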
Verify
Confirm the deployment is healthy before sending traffic. The commands below assume the default `cartesia` namespace and release name used in the examples above; substitute your own values if you customized either.
Watch rollout
Worker pods show `Running` but not `Ready` until `inferno_worker_capacity > 0`.
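One way to watch the rollout is to stream pod status changes with `kubectl get pods --watch`. The sketch below assumes the `cartesia` namespace; the function name and the `NAMESPACE` override are illustrative.

```shell
# Stream pod status changes until interrupted with Ctrl-C.
# NAMESPACE defaults to the "cartesia" namespace assumed in this section.
watch_rollout() {
  kubectl get pods --namespace "${NAMESPACE:-cartesia}" --watch
}
```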
Pods are Ready
All pods should be `Running` with every container `Ready`. Probe behaviors:
- API pod becomes Ready when `GET /status` returns 200 on port 5000.
- Worker pods become Ready when the startup probe is satisfied: it polls `/metrics` until `inferno_worker_capacity` reports a value greater than 0. While the model is still loading, the worker shows `Running` but not `Ready`.
- License-proxy and NATS have no chart-defined health probes; they are Ready as soon as the container starts.
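The readiness conditions above can be checked from a shell. This is a minimal sketch, not the chart's own probe: `wait_for_ready` blocks until every pod in the namespace reports the `Ready` condition, and `capacity_ready` reads Prometheus text-format metrics on stdin and succeeds if any `inferno_worker_capacity` sample is greater than 0. Both function names, the `NAMESPACE`/`TIMEOUT` variables, and the 900s default are assumptions; the parser also assumes samples carry no trailing timestamp.

```shell
# Block until every pod in the namespace is Ready, or the timeout expires.
# Note: pods from completed Jobs never become Ready and would cause a timeout.
wait_for_ready() {
  kubectl wait --for=condition=Ready pods --all \
    --namespace "${NAMESPACE:-cartesia}" \
    --timeout="${TIMEOUT:-900s}"
}

# Read Prometheus text-format metrics on stdin.
# Exit 0 if any inferno_worker_capacity sample has a value > 0, else exit 1.
# Assumes the value is the last field on the line (no timestamp column).
capacity_ready() {
  awk '$1 ~ /^inferno_worker_capacity([{]|$)/ && $NF + 0 > 0 { found = 1 }
       END { exit !found }'
}
```

With a worker's `/metrics` endpoint reachable locally (for example through `kubectl port-forward` to the worker pod; the port depends on your chart values), `curl -s http://localhost:<port>/metrics | capacity_ready` mirrors what the startup probe is waiting for.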
Ingress address is assigned
The `ADDRESS` column should be populated with the load balancer’s hostname or IP.
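A plain `kubectl get ingress` is enough to see that column. The sketch below assumes the `cartesia` namespace; the function name is illustrative.

```shell
# List Ingress resources; the ADDRESS column stays empty until the
# load balancer has been provisioned.
show_ingress() {
  kubectl get ingress --namespace "${NAMESPACE:-cartesia}"
}
```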
On GKE, also check the ManagedCertificate status — the chart creates the resource, and GCP provisions the certificate asynchronously after DNS validation:
The certificate is ready when it reports `Status: Active`. While GCP is still provisioning, HTTPS calls to the ingress will fail certificate validation.
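On GKE, the certificate state can be inspected with `kubectl describe`. This sketch assumes the ManagedCertificate lives in the `cartesia` namespace alongside the release; the function name is illustrative.

```shell
# Describe every ManagedCertificate in the namespace and check the
# reported status; provisioning can lag behind DNS validation.
show_managed_certificates() {
  kubectl describe managedcertificate --namespace "${NAMESPACE:-cartesia}"
}
```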
Metrics scrape is working (optional)
If Prometheus is installed in the cluster (most commonly via `kube-prometheus-stack`), the chart’s PodMonitor (`cartesia-monitor`) is auto-discovered through the `release: prometheus` label. Port-forward the Prometheus UI and confirm each component is being scraped:
Open http://localhost:9090 and query `up{namespace="cartesia"}`; every component pod should return 1.
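The same query can be run without the UI through the Prometheus HTTP API. This sketch assumes a port-forward to Prometheus is already listening on localhost:9090; the function name and the `PROM_PORT`/`NAMESPACE` variables are illustrative.

```shell
# Run the instant query up{namespace="cartesia"} against a port-forwarded
# Prometheus and print the raw JSON response.
scrape_status() {
  curl -sG "http://localhost:${PROM_PORT:-9090}/api/v1/query" \
    --data-urlencode "query=up{namespace=\"${NAMESPACE:-cartesia}\"}"
}
```

Each entry in the response's `data.result` array corresponds to one scraped target; a healthy target carries the sample value `"1"`.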
Once verified, see Smoke Tests and Benchmarking for functional smoke tests and performance benchmarks.
Ingress and TLS
The chart exposes the Cartesia API externally via a Kubernetes `Ingress` resource. The chart configures and annotates that Ingress for AWS EKS or GCP GKE. For other Kubernetes flavors (AKS, OpenShift, Rancher, kubeadm), disable the chart’s ingress and create your own. Select your platform below.
- AWS EKS
- GCP GKE
- Self-managed
The chart configures the Ingress for the AWS Load Balancer Controller (ALB). Key behaviors:
- TLS termination: TLS terminates at the ALB at minimum TLS 1.2 (`ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01`).
- Backend leg: Traffic between the ALB and the API pod is plaintext HTTP (`backend-protocol: HTTP`). For end-to-end TLS, contact support@cartesia.ai.
- Certificate: Pass an explicit ACM ARN via Terraform (`certificate_arn`) or Helm (`ingress.certificateArn`). If unset, the chart’s `certificate-manager: 'true'` annotation tells the AWS Load Balancer Controller to look up a matching ACM cert by hostname.
- HTTP redirect: HTTP traffic on port 80 is redirected to HTTPS on port 443.
- Terraform
- Helm
See `cartesia/templates/resources/ingress.yaml` for the full annotation set the chart applies to the Ingress.
Autoscaling
Cartesia supports two levels of autoscaling for Kubernetes deployments.
Cluster Autoscaler
Scales nodes based on pending pods. Enable in your tfvars:
Nodes are added within the `min_size`/`max_size` bounds when pods can’t be scheduled due to insufficient resources.
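To see what the cluster autoscaler is reacting to, it can help to list the pods that are currently unschedulable. This is a small sketch assuming the `cartesia` namespace; the function name is illustrative.

```shell
# List pods stuck in the Pending phase; these are the pods that prompt
# the cluster autoscaler to add nodes.
list_pending_pods() {
  kubectl get pods --namespace "${NAMESPACE:-cartesia}" \
    --field-selector=status.phase=Pending
}
```

If pods stay Pending after the node group has reached `max_size`, the bound itself is the limit and needs raising in your tfvars.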
Pod Autoscaler (KEDA)
Scales worker pods based on load metrics. Enable in your tfvars:
Scaling triggers:
- Queue depth - Scales when unserviceable requests accumulate
- Worker load - Scales when GPU utilization exceeds the threshold
Per-Worker Scaling
Each worker can have its own scaling configuration:
Scaling Behavior
- Scale up: 30 second stabilization window
- Scale down: 900 second (15 min) stabilization window to avoid flapping
- Workers scale independently based on their individual load
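KEDA manages a HorizontalPodAutoscaler for each ScaledObject, so the stabilization windows can be read back from the cluster. The sketch below assumes the chart sets them through the HPA `behavior` field and that the objects live in the `cartesia` namespace; neither is confirmed on this page, and the function name is illustrative.

```shell
# Print each HPA's name with its scale-up and scale-down stabilization
# windows in seconds (tab-separated). Empty columns mean the HPA relies
# on Kubernetes defaults rather than an explicit behavior block.
show_scaling_windows() {
  kubectl get hpa --namespace "${NAMESPACE:-cartesia}" \
    --output jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.behavior.scaleUp.stabilizationWindowSeconds}{"\t"}{.spec.behavior.scaleDown.stabilizationWindowSeconds}{"\n"}{end}'
}
```

Under the behavior listed above, each worker's row would be expected to show 30 and 900.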
Go-live Checklist
Final review before opening to production traffic:
- All pods `Running` and `Ready`, all images on the target release tag; see Verify.
- Ingress reachable on its FQDN over HTTPS; see Verify.
- TLS certificate active (ACM cert attached on EKS, ManagedCertificate `Active` on GKE, or BYO cert mounted for self-managed); see Ingress and TLS.
- Smoke tests pass; see Smoke Tests and Benchmarking.
- Benchmark results within the expected range for the deployed GPU; see Performance per GPU.
- Metrics scrape working (if Prometheus is installed); see Verify.
- Firewall and network policies match the deployed posture; see Outbound egress (Connected mode).
- (Air-gapped only) License loaded; offline operation confirmed; see Air-Gapped Deployments.
- On-call runbook documents the rollback procedure in Upgrades and Rollback.