> **Beta:** Docker Compose and Docker Swarm deployment are currently in beta. Connect with the Cartesia team for support.

Deploy Cartesia TTS on a single machine with Docker Compose, or across a multi-node cluster with Docker Swarm.
| | Docker Compose | Docker Swarm |
|---|---|---|
| Nodes | Single host | Multiple hosts (managers + workers) |
| GPU scaling | Multiple workers via `WORKER_REPLICAS` (one per GPU) | Workers scheduled on labeled GPU nodes |
| MIG support | Auto-detected via `--mig` flag | Per-node via node labels and `--mig` flag |
| Networking | Bridge (default) | Overlay (Swarm-managed) |
## Prerequisites
- One or more machines with Docker installed (your user must be in the `docker` group)
- Compose only: Docker Compose V2 (`docker compose`)
- Swarm only: nodes meet Docker's Swarm networking requirements
- At least one NVIDIA GPU with drivers installed. MIG (Multi-Instance GPU) partitioning is supported on compatible NVIDIA GPUs
- GPU nodes have the `nvidia` Docker runtime set as default (see below)
- The `cartesia-kube` repo downloaded as described in Downloading cartesia-kube
- A Cartesia API key file (`container_key`) and a GCS service account JSON file, provided during onboarding
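Before going further, it can help to sanity-check the prerequisites on each machine. A minimal sketch (the `require` helper below is hypothetical, not part of the `cartesia-kube` repo):

```shell
#!/usr/bin/env bash
# require: fail with a message if any of the given CLIs is missing from PATH.
require() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || { echo "missing: $cmd" >&2; return 1; }
  done
}

# Example, on a GPU node:
#   require docker nvidia-smi && docker compose version
```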
### GPU runtime check
On each GPU node, verify the NVIDIA runtime:

```bash
nvidia-smi
docker info | grep "Default Runtime"
# Expected: Default Runtime: nvidia
docker run --rm nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
```
If `nvidia` is not the default runtime, install the NVIDIA Container Toolkit and run:

```bash
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
```
**If using MIG:** After enabling MIG and creating instances on the host, verify they are visible:

```bash
nvidia-smi -L
# Each MIG instance appears as a MIG-... UUID line beneath its parent GPU.
# The deploy script reads these UUIDs automatically — no manual configuration required.
```
MIG must be enabled and instances created on the host before deploying. Recreating MIG instances generates new UUIDs; redeploy the stack if this happens.
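For reference, host-side MIG setup typically looks like the sketch below. The `nvidia-smi -mig` and `mig -cgi` commands are standard NVIDIA tooling, but the GPU index and profile IDs are placeholders that vary by GPU model; the `count_mig_instances` helper is a hypothetical convenience for checking the result:

```shell
# Enable MIG mode and create instances on the host (placeholder profile IDs;
# shown as comments because they require root and MIG-capable hardware):
#   sudo nvidia-smi -i 0 -mig 1             # enable MIG mode on GPU 0
#   sudo nvidia-smi mig -i 0 -cgi 19,19 -C  # create GPU + compute instances
#
# Then confirm the expected number of MIG UUIDs is visible:
count_mig_instances() {
  grep -c 'UUID: MIG-'
}

# Usage:
#   nvidia-smi -L | count_mig_instances
```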
## Step 1 — Prepare secrets
Place these files on the host (Compose) or manager node (Swarm):

- `container_key` — file containing your Cartesia API key
- `service-account.json` — GCS service account JSON with `roles/artifactregistry.reader` (image pull) and `roles/storage.objectViewer` (GCS sync)
Make the deploy scripts executable:

```bash
chmod +x local/scripts/deploy-compose.sh
chmod +x local/scripts/deploy-swarm.sh
```
## Step 2 — Initialize the cluster (Swarm only)
Skip this step if you are using Docker Compose.
On the manager node:

```bash
docker swarm init --advertise-addr <MANAGER_IP>
```
Copy the `docker swarm join` command from the output. On each additional node, run:

```bash
docker swarm join --token <TOKEN> <MANAGER_IP>:2377
```
Label each node from the manager. Use `docker node ls` to list node IDs:

```bash
docker node update --label-add cpu=true <node-id>  # CPU services (API, NATS, etc.)
docker node update --label-add gpu=true <node-id>  # Standard GPU workers
```
**If using MIG:** Label MIG-enabled nodes with `mig=true` and a comma-separated list of their MIG instance UUIDs (obtained from `nvidia-smi -L` on that node). Do not apply `gpu=true` to MIG nodes.

```bash
docker node update --label-add mig=true <node-id>
docker node update --label-add 'mig.uuids=MIG-<uuid1>,MIG-<uuid2>' <node-id>
```
Mixed clusters with both standard GPU nodes and MIG nodes are supported — the deploy script handles scheduling for both automatically.
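Assembling the `mig.uuids` label value by hand is error-prone, so it can be derived from `nvidia-smi -L` on the MIG node instead. A sketch, not part of the deploy scripts (the UUID pattern below is an assumption about `nvidia-smi -L` output):

```shell
# mig_uuid_csv: extract MIG-... UUIDs from `nvidia-smi -L` output and join
# them into the comma-separated form the mig.uuids node label expects.
mig_uuid_csv() {
  grep -o 'MIG-[0-9a-fA-F-]*' | paste -sd, -
}

# On the MIG node:
#   nvidia-smi -L | mig_uuid_csv
# Then pass the resulting string to `docker node update --label-add` on the manager.
```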
## Step 3 — Configure environment variables

Set environment variables before deploying. Use a `.env` file in `local/` (see `local/.env.example`) or export them in your shell.
```bash
export IMAGE_REGISTRY="YOUR_IMAGE_REGISTRY"
export RELEASE_TAG="YOUR_RELEASE_TAG"
export MODEL_NAME="YOUR_MODEL_NAME"
export CONTAINER_KEY_FILE=/path/to/cartesia-api-key
export GCS_SA_FILE=/path/to/service-account.json

# Optional
export WORKER_REPLICAS=1
export WORKER_CAPACITY=4
export BUCKET_NAME=""
export CLUSTER_NAME="cartesia-compose"  # or "cartesia-swarm"
export USE_MIG=0  # set to 1 to enable MIG mode (or pass --mig to the deploy script)
```
See Configuration for full details on each variable.
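For the `.env` route, the file might look like the following sketch; values are placeholders, and `local/.env.example` remains the authoritative template:

```shell
# local/.env — sketch with placeholder values
IMAGE_REGISTRY=YOUR_IMAGE_REGISTRY
RELEASE_TAG=YOUR_RELEASE_TAG
MODEL_NAME=YOUR_MODEL_NAME
CONTAINER_KEY_FILE=/path/to/cartesia-api-key
GCS_SA_FILE=/path/to/service-account.json
WORKER_REPLICAS=1
WORKER_CAPACITY=4
USE_MIG=0
```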
## Step 4 — Deploy
**Compose** (from the repo root):

```bash
# Standard deployment
./local/scripts/deploy-compose.sh

# With MIG support (auto-detects MIG instances via nvidia-smi)
./local/scripts/deploy-compose.sh --mig
```

When `--mig` is used, the script auto-detects MIG instance UUIDs from `nvidia-smi`, generates a per-slice worker configuration, and scales the standard worker to zero.

**Swarm** (on the manager node):

```bash
# Standard deployment
./local/scripts/deploy-swarm.sh

# With MIG support (reads UUIDs from node labels)
./local/scripts/deploy-swarm.sh --mig
```
This will:

- Verify that nodes are labeled (fails with instructions if not).
- Create encrypted Swarm secrets from your key and service account files.
- Deploy all services. With `--mig`, one dedicated worker service is created per MIG instance, each pinned to its node.
TTS workers take a few minutes to load the model into GPU memory. During this time, TTS requests will return errors even though containers appear healthy. Wait for the ready signal:

```bash
# Compose
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs -f tts-worker 2>&1 | grep -i "ready"

# Swarm
docker service logs cartesia_tts-worker -f 2>&1 | grep -i "ready"
```
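Instead of tailing logs, a script can poll the API status endpoint until it responds. A hedged sketch (the URL, port, and behavior of `/status` during warm-up are assumptions):

```shell
# wait_for_status: poll a URL with curl until it succeeds or tries run out.
wait_for_status() {
  local url="$1" tries="${2:-60}" i
  for i in $(seq "$tries"); do
    curl -sf "$url" >/dev/null 2>&1 && return 0
    sleep 5
  done
  return 1
}

# Usage:
#   wait_for_status http://localhost:5000/status 120 && echo "API is up"
```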
## Step 5 — Verify
Check that services are running:

**Compose:**

```bash
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml ps
```

If deployed with MIG, verify each worker sees exactly one MIG device:

```bash
# List all running services (MIG workers appear as tts-worker-mig-0, tts-worker-mig-1, etc.)
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml -f docker-compose.mig.generated.yaml ps
```

**Swarm:**

```bash
docker stack services cartesia
```

If deployed with MIG, verify MIG worker services are scheduled and running:

```bash
docker stack ps cartesia --filter 'name=cartesia_tts-worker-mig'
```
Test the API:

```bash
curl http://localhost:5000/status
```
Test TTS:

```bash
curl -s -X POST "http://localhost:5000/tts/bytes" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Cartesia-Version: 2024-06-10" \
  -d '{
    "model_id": "sonic-3",
    "transcript": "Hello from Cartesia.",
    "voice": {"mode": "id", "id": "00510a15-4216-4fdc-a0ab-05d74cd9f795"},
    "language": "en",
    "output_format": {"container": "mp3", "sample_rate": 44100, "bit_rate": 128000}
  }' --output test.mp3
```
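Note that on failure, `--output test.mp3` will happily write the JSON error body to disk instead of audio. A quick way to tell the two apart (a hypothetical helper, resting on the assumption that error responses are JSON objects):

```shell
# is_json_error: true if the file starts with '{', i.e. it looks like a JSON
# error body rather than MP3 audio.
is_json_error() {
  [ "$(head -c 1 "$1")" = "{" ]
}

# Usage:
#   if is_json_error test.mp3; then cat test.mp3; fi
```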
## Troubleshooting

**Compose:**

```bash
cd local
docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs api
docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs tts-worker

# Restart everything
docker compose -f docker-compose.base.yaml -f docker-compose.yaml down
docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d
```

If the API exits with `no servers available for connection` (NATS not ready), restart the API after the stack is up:

```bash
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d && docker compose -f docker-compose.base.yaml -f docker-compose.yaml restart api
```
**Swarm:**

```bash
docker stack ps cartesia --no-trunc
docker service logs cartesia_api
docker service logs cartesia_tts-worker

# Restart the stack
docker stack rm cartesia
sleep 10
cd local && docker stack deploy --with-registry-auth -c docker-compose.base.yaml -c docker-compose.swarm.yaml cartesia
```
## Configuration
Set these environment variables before running the deploy script. You receive `IMAGE_REGISTRY`, `RELEASE_TAG`, and `MODEL_NAME` from Cartesia during onboarding. If you mirror images into your own registry, use your mirror URL for `IMAGE_REGISTRY`.
### Required

| Variable | Description |
|---|---|
| `IMAGE_REGISTRY` | Container image registry URL (Cartesia registry or your mirror). |
| `RELEASE_TAG` | Image tag for the release you are deploying (updates per release). |
| `MODEL_NAME` | TTS model identifier for the worker image. |
| `CONTAINER_KEY_FILE` | Path to file containing your Cartesia API key. |
| `GCS_SA_FILE` | Path to GCS service account JSON file. |
### Optional

| Variable | Default | Description |
|---|---|---|
| `WORKER_REPLICAS` | `1` | Number of TTS worker containers. For Compose, set to your GPU count on the host. For Swarm, scale to match your GPU node count. |
| `WORKER_CAPACITY` | `4` | Max concurrent TTS requests per worker. Lower if you run out of GPU memory. |
| `BUCKET_NAME` | (empty) | GCS bucket for migrations/LoRAs. Leave empty to disable sync. |
| `CLUSTER_NAME` | `cartesia-compose` / `cartesia-swarm` | Identifier for logs and metrics. |
| `GCS_SYNC_INTERVAL` | `300` | GCS sync interval in seconds. |
| `USE_MIG` | `0` | Set to `1` to enable MIG mode. |
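When sizing `WORKER_REPLICAS` for Compose, the GPU count can be read from `nvidia-smi -L` rather than set by hand. A sketch, assuming each physical GPU prints a line starting with `GPU ` (the `gpu_count` helper is hypothetical):

```shell
# gpu_count: count physical GPUs in `nvidia-smi -L` output
# (MIG instance lines are indented, so they are not counted).
gpu_count() {
  grep -c '^GPU '
}

# Usage:
#   export WORKER_REPLICAS=$(nvidia-smi -L | gpu_count)
```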