Docker Compose and Docker Swarm deployment are currently in beta. Connect with the Cartesia team for support.
Deploy Cartesia TTS on a single machine with Docker Compose, or across a multi-node cluster with Docker Swarm.
| | Docker Compose | Docker Swarm |
|---|---|---|
| Nodes | Single host | Multiple hosts (managers + workers) |
| GPU scaling | Multiple workers via WORKER_REPLICAS (one per GPU) | Workers scheduled on labeled GPU nodes |
| MIG support | Auto-detected via --mig flag | Per-node via node labels and --mig flag |
| Networking | Bridge (default) | Overlay (Swarm-managed) |
Prerequisites
- One or more machines with Docker installed (your user must be in the docker group)
- Compose only: Docker Compose V2 (docker compose)
- Swarm only: nodes meet Docker’s Swarm networking requirements
- At least one NVIDIA GPU with drivers installed. MIG (Multi-Instance GPU) partitioning is supported on compatible NVIDIA GPUs
- GPU nodes have the nvidia Docker runtime set as default (see below)
- The cartesia-kube repo, downloaded as described in Downloading cartesia-kube
- A Cartesia API key file (container_key) and a GCS service account JSON file, both provided during onboarding
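Before continuing, you can check the tool and group prerequisites from a shell. The following is a minimal POSIX sh sketch; `missing_cmds` and `user_in_group` are hypothetical helper names (not part of cartesia-kube), and they do not check Docker's Swarm networking requirements.

```shell
# Hypothetical preflight helpers; not part of cartesia-kube.

# Print each command name that is not found on PATH; return 1 if any are missing.
missing_cmds() {
  mc_status=0
  for mc_cmd in "$@"; do
    if ! command -v "$mc_cmd" >/dev/null 2>&1; then
      echo "$mc_cmd"
      mc_status=1
    fi
  done
  return "$mc_status"
}

# Succeed if the current user belongs to the named group (for example, docker).
user_in_group() {
  id -nG | tr ' ' '\n' | grep -qx "$1"
}

# Example (run on each host):
#   missing_cmds docker nvidia-smi || echo "install the commands listed above"
#   user_in_group docker || echo "add your user to the docker group, then log in again"
#   docker compose version >/dev/null 2>&1 || echo "Docker Compose V2 not found"
```

Group membership changes only take effect in a new login session, so rerun the check after logging back in.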
GPU runtime check
On each GPU node, verify the NVIDIA runtime:
nvidia-smi  # driver is loaded and the host sees its GPUs
docker info | grep "Default Runtime"
# Expected: Default Runtime: nvidia
docker run --rm nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi  # GPUs are visible inside a container
If nvidia is not the default runtime, install the NVIDIA Container Toolkit and run:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
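To run the default-runtime check non-interactively on each node (for example, from a provisioning script), a sketch like the one below reads `docker info` output and succeeds only when the default runtime is nvidia. The function name `default_runtime_is_nvidia` is hypothetical; it assumes the `Default Runtime: <name>` line format shown above.

```shell
# Hypothetical helper: read `docker info` output on stdin and succeed only
# if the "Default Runtime:" line names nvidia.
default_runtime_is_nvidia() {
  grep -Eq '^[[:space:]]*Default Runtime:[[:space:]]*nvidia[[:space:]]*$'
}

# Example:
#   docker info 2>/dev/null | default_runtime_is_nvidia \
#     || echo "nvidia is not the default runtime; run nvidia-ctk as shown above"
```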
If using MIG: After enabling MIG and creating instances on the host, verify they are visible:
nvidia-smi -L
# Each MIG instance appears as a MIG-... UUID line beneath its parent GPU.
# The deploy script reads these UUIDs automatically — no manual configuration required.
MIG must be enabled and instances created on the host before deploying. Recreating MIG instances generates new UUIDs; redeploy the stack if this happens.
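Because recreated MIG instances get new UUIDs, it can help to save the UUID list at deploy time and compare it later. This is a minimal sketch with hypothetical names (`mig_uuids`, `mig_uuids_changed`); it assumes MIG lines in `nvidia-smi -L` end with `(UUID: MIG-...)`, and it is independent of the deploy script's own detection.

```shell
# Hypothetical helpers; the deploy script performs its own MIG detection.

# Print MIG instance UUIDs from `nvidia-smi -L` output (stdin), one per line, sorted.
mig_uuids() {
  sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p' | sort
}

# Succeed if the UUIDs on stdin differ from the snapshot saved in file $1,
# meaning MIG instances were recreated and the stack should be redeployed.
mig_uuids_changed() {
  mu_current=$(mig_uuids)
  mu_saved=$(cat "$1" 2>/dev/null)
  [ "$mu_current" != "$mu_saved" ]
}

# Example:
#   nvidia-smi -L | mig_uuids > ~/mig-uuids.txt          # at deploy time
#   nvidia-smi -L | mig_uuids_changed ~/mig-uuids.txt && echo "MIG UUIDs changed; redeploy"
```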
Step 1 — Prepare secrets
Place these files on the host (Compose) or manager node (Swarm):
- container_key: file containing your Cartesia API key
- service-account.json: GCS service account JSON with roles/artifactregistry.reader (image pull) and roles/storage.objectViewer (GCS sync)
Make the deploy scripts executable:
chmod +x local/scripts/deploy-compose.sh
chmod +x local/scripts/deploy-swarm.sh
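A quick sanity check on the two secret files can catch a wrong path or an empty file before deployment. This sketch uses a hypothetical `check_secret_files` function. It relies on the fact that Google service account key files declare `"type": "service_account"`; it cannot verify that the account actually holds the required roles.

```shell
# Hypothetical pre-deploy check for the API key file ($1) and the
# GCS service account JSON file ($2). Returns 1 on any problem.
check_secret_files() {
  cs_status=0
  for cs_f in "$1" "$2"; do
    if [ ! -s "$cs_f" ]; then
      echo "missing or empty: $cs_f" >&2
      cs_status=1
    fi
  done
  # Service account key files contain "type": "service_account".
  if [ -s "$2" ] && ! grep -q '"type"[[:space:]]*:[[:space:]]*"service_account"' "$2"; then
    echo "not a service account key file: $2" >&2
    cs_status=1
  fi
  return "$cs_status"
}

# Example:
#   check_secret_files ./container_key ./service-account.json
```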
Step 2 — Initialize the cluster (Swarm only)
Skip this step if you are using Docker Compose.
On the manager node:
docker swarm init --advertise-addr <MANAGER_IP>
Copy the docker swarm join command from the output. On each additional node, run:
docker swarm join --token <TOKEN> <MANAGER_IP>:2377
Label each node from the manager. Use docker node ls to list node IDs:
docker node update --label-add cpu=true <node-id> # CPU services (API, NATS, etc.)
docker node update --label-add gpu=true <node-id> # Standard GPU workers
If using MIG: Label MIG-enabled nodes with mig=true and a comma-separated list of their MIG instance UUIDs (obtained from nvidia-smi -L on that node). Do not apply gpu=true to MIG nodes.
docker node update --label-add mig=true <node-id>
docker node update --label-add 'mig.uuids=MIG-<uuid1>,MIG-<uuid2>' <node-id>
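Building the comma-separated `mig.uuids` value by hand is error-prone. The sketch below (hypothetical name `print_mig_label_cmds`) reads `nvidia-smi -L` output from a MIG node and prints the two label commands for you to review and run on the manager. It assumes the `(UUID: MIG-...)` line format and does not execute anything itself.

```shell
# Hypothetical helper: turn `nvidia-smi -L` output (stdin) from one MIG node
# into the docker node update commands for that node ID ($1). Prints only.
print_mig_label_cmds() {
  ml_node=$1
  ml_uuids=$(sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p' | paste -sd, -)
  if [ -z "$ml_uuids" ]; then
    echo "no MIG instances found in input" >&2
    return 1
  fi
  echo "docker node update --label-add mig=true $ml_node"
  echo "docker node update --label-add 'mig.uuids=$ml_uuids' $ml_node"
}

# Example (on the MIG node, with the node ID from `docker node ls`):
#   nvidia-smi -L | print_mig_label_cmds <node-id>
```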
Mixed clusters with both standard GPU nodes and MIG nodes are supported — the deploy script handles scheduling for both automatically.
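Before deploying, you can review every node's labels in one pass. This sketch (hypothetical `check_node_labels`) reads lines of the form `<hostname> map[key:value ...]`, which is how `docker node inspect --format '{{.Description.Hostname}} {{.Spec.Labels}}'` renders labels. It flags nodes with both gpu=true and mig=true, MIG nodes without mig.uuids, and nodes with no role label. The deploy script performs its own verification; this only mirrors the rules stated above.

```shell
# Hypothetical label review; input lines: "<hostname> map[<k>:<v> ...]".
check_node_labels() {
  cl_status=0
  while read -r cl_host cl_labels; do
    [ -n "$cl_host" ] || continue
    # Strip the Go map wrapper and pad with spaces for whole-token matching.
    cl_labels=${cl_labels#"map["}
    cl_labels=" ${cl_labels%"]"} "
    cl_cpu=0; cl_gpu=0; cl_mig=0
    case $cl_labels in *" cpu:true "*) cl_cpu=1 ;; esac
    case $cl_labels in *" gpu:true "*) cl_gpu=1 ;; esac
    case $cl_labels in *" mig:true "*) cl_mig=1 ;; esac
    if [ "$cl_gpu" = 1 ] && [ "$cl_mig" = 1 ]; then
      echo "$cl_host: has both gpu=true and mig=true"
      cl_status=1
    elif [ "$cl_mig" = 1 ]; then
      case $cl_labels in
        *" mig.uuids:MIG-"*) ;;
        *) echo "$cl_host: mig=true but no mig.uuids label"; cl_status=1 ;;
      esac
    elif [ "$cl_cpu$cl_gpu" = 00 ]; then
      echo "$cl_host: no cpu, gpu, or mig label"
      cl_status=1
    fi
  done
  return "$cl_status"
}

# Example (on the manager):
#   docker node ls -q | xargs docker node inspect \
#     --format '{{.Description.Hostname}} {{.Spec.Labels}}' | check_node_labels
```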
Step 3 – Set environment variables
Set these variables before deploying, either in a .env file in local/ (see local/.env.example) or by exporting them in your shell.
export IMAGE_REGISTRY="YOUR_IMAGE_REGISTRY"
export RELEASE_TAG="YOUR_RELEASE_TAG"
export MODEL_NAME="YOUR_MODEL_NAME"
export CONTAINER_KEY_FILE=/path/to/cartesia-api-key
export GCS_SA_FILE=/path/to/service-account.json
# Optional
export WORKER_REPLICAS=1
export WORKER_CAPACITY=4
export BUCKET_NAME=""
export CLUSTER_NAME="cartesia-compose" # or "cartesia-swarm"
export USE_MIG=0 # set to 1 to enable MIG mode (or pass --mig to the deploy script)
See Configuration for full details on each variable.
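A missing or still-placeholder value usually surfaces only after the deploy script has started. The sketch below (hypothetical `check_required_env`) reports required variables that are empty or still start with `YOUR_`, and rejects a USE_MIG value other than 0 or 1. It does not check that the file paths exist.

```shell
# Hypothetical check of the required variables; prints one line per problem.
check_required_env() {
  ce_status=0
  for ce_name in IMAGE_REGISTRY RELEASE_TAG MODEL_NAME CONTAINER_KEY_FILE GCS_SA_FILE; do
    # Indirect lookup of the variable named by $ce_name (POSIX sh has no ${!name}).
    eval "ce_val=\${$ce_name:-}"
    case $ce_val in
      '') echo "missing: $ce_name"; ce_status=1 ;;
      YOUR_*) echo "placeholder: $ce_name"; ce_status=1 ;;
    esac
  done
  case ${USE_MIG:-0} in
    0|1) ;;
    *) echo "USE_MIG must be 0 or 1"; ce_status=1 ;;
  esac
  return "$ce_status"
}
```

If your local/.env uses plain KEY=value lines, you can load it into the current shell with `set -a; . local/.env; set +a` and then run `check_required_env`.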
Step 4 — Deploy
Docker Compose (from the repo root):
# Standard deployment
./local/scripts/deploy-compose.sh
# With MIG support (auto-detects MIG instances via nvidia-smi)
./local/scripts/deploy-compose.sh --mig
When --mig is used, the script auto-detects MIG instance UUIDs from nvidia-smi, generates a per-slice worker configuration, and scales the standard worker to zero.
Docker Swarm (on the manager node):
# Standard deployment
./local/scripts/deploy-swarm.sh
# With MIG support (reads UUIDs from node labels)
./local/scripts/deploy-swarm.sh --mig
On Swarm, the deploy script:
- Verifies that nodes are labeled (and fails with instructions if they are not).
- Creates encrypted Swarm secrets from your key and service account files.
- Deploys all services. With --mig, it creates one dedicated worker service per MIG instance, each pinned to its node.
TTS workers take a few minutes to load the model into GPU memory. During this time, TTS requests return errors even though the containers appear healthy. Wait for the ready signal.
Docker Compose:
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs -f tts-worker 2>&1 | grep -i "ready"
Docker Swarm:
docker service logs cartesia_tts-worker -f 2>&1 | grep -i "ready"
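The `logs -f ... | grep` commands stream until you interrupt them. For scripts, a polling loop with a timeout may be more convenient. This is a sketch with a hypothetical `wait_for_ready` function and POLL_INTERVAL variable; like the grep above, it matches any line containing "ready" (case-insensitive), so it would also match text such as "not ready".

```shell
# Hypothetical helper: re-run a (non-following) log command until its output
# mentions "ready", or give up after $1 seconds. Polls every POLL_INTERVAL
# seconds (default 10).
wait_for_ready() {
  wr_timeout=$1
  shift
  wr_elapsed=0
  while [ "$wr_elapsed" -lt "$wr_timeout" ]; do
    if "$@" 2>&1 | grep -qi 'ready'; then
      return 0
    fi
    sleep "${POLL_INTERVAL:-10}"
    wr_elapsed=$((wr_elapsed + ${POLL_INTERVAL:-10}))
  done
  echo "timed out after ${wr_timeout}s waiting for ready" >&2
  return 1
}

# Examples (note: no -f, so each log command exits):
#   wait_for_ready 900 docker service logs cartesia_tts-worker
#   (cd local && wait_for_ready 900 docker compose -f docker-compose.base.yaml \
#      -f docker-compose.yaml logs tts-worker)
```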
Step 5 — Verify
Check that services are running.
Docker Compose:
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml ps
If deployed with MIG, include the generated MIG file and verify each worker sees exactly one MIG device:
# List all running services (MIG workers appear as tts-worker-mig-0, tts-worker-mig-1, etc.)
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml -f docker-compose.mig.generated.yaml ps
Docker Swarm:
docker stack services cartesia
If deployed with MIG, verify MIG worker services are scheduled and running:
docker stack ps cartesia --filter 'name=cartesia_tts-worker-mig'
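To confirm that the number of running MIG worker tasks matches the number of MIG instances you labeled, you can count tasks whose current state is Running. This sketch (hypothetical `check_running_tasks`) expects lines of the form `<name> <current state ...>`, as produced by the `--format '{{.Name}} {{.CurrentState}}'` option of `docker stack ps`; the expected count is something you supply, for example the total number of MIG UUIDs across your nodes.

```shell
# Hypothetical helper: count "Running" tasks on stdin and compare with $1.
check_running_tasks() {
  awk -v want="$1" '
    $2 == "Running" { n++ }
    END {
      printf "%d of %d tasks running\n", n, want
      if (n + 0 == want + 0) exit 0
      exit 1
    }'
}

# Example (4 MIG instances expected):
#   docker stack ps cartesia --filter 'name=cartesia_tts-worker-mig' \
#     --filter 'desired-state=running' --format '{{.Name}} {{.CurrentState}}' \
#     | check_running_tasks 4
```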
Test the API:
curl http://localhost:5000/status
Test TTS:
curl -s -X POST "http://localhost:5000/tts/bytes" \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-H "Cartesia-Version: 2024-06-10" \
-d '{
"model_id": "sonic-3.5",
"transcript": "Hello from Cartesia.",
"voice": {"mode": "id", "id": "00510a15-4216-4fdc-a0ab-05d74cd9f795"},
"language": "en",
"output_format": {"container": "mp3", "sample_rate": 44100, "bit_rate": 128000}
}' --output test.mp3
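Because the request uses `curl -s` without `--fail`, an error response body is also written to test.mp3. A small check of the file's leading bytes distinguishes audio from an error message. This sketch (hypothetical `looks_like_mp3`) accepts either an ID3v2 tag or an MPEG audio frame sync (byte 0xFF followed by a byte with its top three bits set); it is a heuristic, not a full MP3 validator.

```shell
# Hypothetical heuristic: succeed if file $1 starts like an MP3 stream.
looks_like_mp3() {
  lm_file=$1
  [ -s "$lm_file" ] || return 1
  # ID3v2 tag at the start of the file
  [ "$(head -c 3 "$lm_file")" = "ID3" ] && return 0
  # Otherwise expect an MPEG frame sync: 0xFF, then a byte with its top 3 bits set
  set -- $(head -c 2 "$lm_file" | od -An -tu1)
  [ "${1:-0}" -eq 255 ] && [ $(( ${2:-0} & 224 )) -eq 224 ]
}

# Example:
#   looks_like_mp3 test.mp3 || { echo "not audio; response body:"; cat test.mp3; }
```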
Troubleshooting
Docker Compose:
cd local
# View logs
docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs api
docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs tts-worker
# Restart everything
docker compose -f docker-compose.base.yaml -f docker-compose.yaml down
docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d
If the API exits with "no servers available for connection" (NATS not ready), restart the API after the rest of the stack is up. From the repo root:
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d && docker compose -f docker-compose.base.yaml -f docker-compose.yaml restart api
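If you want to restart the API only when that NATS error has actually occurred, a conditional wrapper is one option. This sketch (hypothetical `restart_on_nats_error`) takes the log command and the restart command as strings so it is not tied to a particular file layout; using `--tail` keeps older log lines from an earlier failure from triggering another restart.

```shell
# Hypothetical wrapper: run the log command in $1 and, if its output contains
# the NATS connection error, run the restart command in $2.
restart_on_nats_error() {
  if sh -c "$1" 2>&1 | grep -q 'no servers available for connection'; then
    echo "NATS connection error found; restarting the API"
    sh -c "$2"
  else
    echo "no NATS connection error in recent API logs"
  fi
}

# Example (from local/):
#   restart_on_nats_error \
#     "docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs --tail 50 api" \
#     "docker compose -f docker-compose.base.yaml -f docker-compose.yaml restart api"
```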
Docker Swarm:
# Inspect task state and logs
docker stack ps cartesia --no-trunc
docker service logs cartesia_api
docker service logs cartesia_tts-worker
# Restart the stack (the pause gives Swarm time to remove the stack's networks)
docker stack rm cartesia
sleep 10
cd local && docker stack deploy --with-registry-auth -c docker-compose.base.yaml -c docker-compose.swarm.yaml cartesia
Configuration
Set these environment variables before running the deploy script. You receive IMAGE_REGISTRY, RELEASE_TAG, and MODEL_NAME from Cartesia during onboarding. If you mirror images into your own registry, use your mirror URL for IMAGE_REGISTRY.
Required
| Variable | Description |
|---|---|
| IMAGE_REGISTRY | Container image registry URL (Cartesia registry or your mirror). |
| RELEASE_TAG | Image tag for the release you are deploying (updates per release). |
| MODEL_NAME | TTS model identifier for the worker image. |
| CONTAINER_KEY_FILE | Path to file containing your Cartesia API key. |
| GCS_SA_FILE | Path to GCS service account JSON file. |
Optional
| Variable | Default | Description |
|---|---|---|
| WORKER_REPLICAS | 1 | Number of TTS worker containers. For Compose, set to your GPU count on the host. For Swarm, scale to match your GPU node count. |
| WORKER_CAPACITY | 4 | Max concurrent TTS requests per worker. Lower if you run out of GPU memory. |
| BUCKET_NAME | (empty) | GCS bucket for migrations/LoRAs. Leave empty to disable sync. |
| CLUSTER_NAME | cartesia-compose / cartesia-swarm | Identifier for logs and metrics. |
| GCS_SYNC_INTERVAL | 300 | GCS sync interval in seconds. |
| USE_MIG | 0 | Set to 1 to enable MIG mode. |
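If you prefer to make the optional values explicit in your own wrapper script, the defaults in the table can be applied without overriding anything you have already set. This is a sketch with a hypothetical `apply_optional_defaults` function; the deploy scripts already apply their own defaults, and the positive-integer check on WORKER_CAPACITY is an added assumption rather than documented behavior.

```shell
# Hypothetical helper: fill unset optional variables with the documented
# defaults. $1 is "compose" or "swarm" and selects the CLUSTER_NAME default.
apply_optional_defaults() {
  : "${WORKER_REPLICAS:=1}"
  : "${WORKER_CAPACITY:=4}"
  : "${BUCKET_NAME:=}"
  : "${CLUSTER_NAME:=cartesia-$1}"
  : "${GCS_SYNC_INTERVAL:=300}"
  : "${USE_MIG:=0}"
  case $WORKER_CAPACITY in
    ''|*[!0-9]*|0) echo "WORKER_CAPACITY must be a positive integer" >&2; return 1 ;;
  esac
  export WORKER_REPLICAS WORKER_CAPACITY BUCKET_NAME CLUSTER_NAME GCS_SYNC_INTERVAL USE_MIG
}

# Example:
#   apply_optional_defaults swarm && ./local/scripts/deploy-swarm.sh
```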