Docker Compose and Docker Swarm deployments are currently in beta. Contact the Cartesia team for support.
Deploy Cartesia TTS on a single machine with Docker Compose, or across a multi-node cluster with Docker Swarm.
| | Docker Compose | Docker Swarm |
|---|---|---|
| Nodes | Single host | Multiple hosts (managers + workers) |
| GPU scaling | Multiple workers via WORKER_REPLICAS (one per GPU) | Workers scheduled on labeled GPU nodes |
| MIG support | Auto-detected via --mig flag | Per-node via node labels and --mig flag |
| Networking | Bridge (default) | Overlay (Swarm-managed) |

Prerequisites

  • One or more machines with Docker installed (your user must be in the docker group)
  • Compose only: Docker Compose V2 (docker compose)
  • Swarm only: nodes meet Docker’s Swarm networking requirements (ports 2377/tcp, 7946/tcp and udp, and 4789/udp open between nodes)
  • At least one NVIDIA GPU with drivers installed. MIG (Multi-Instance GPU) partitioning is supported on compatible NVIDIA GPUs
  • GPU nodes have the nvidia Docker runtime set as default (see below)
  • The cartesia-kube repo downloaded as described in Downloading cartesia-kube
  • A Cartesia API key file (container_key) and a GCS service account JSON file, provided during onboarding

GPU runtime check

On each GPU node, verify the NVIDIA runtime:
nvidia-smi

docker info | grep "Default Runtime"
# Expected: Default Runtime: nvidia

docker run --rm nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
If nvidia is not the default runtime, install the NVIDIA Container Toolkit and run:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
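When several GPU nodes need the same check, the default-runtime test above can be scripted. The following is a minimal sketch and not part of cartesia-kube: the function names default_runtime_from_info and require_nvidia_runtime are hypothetical, and the parsing assumes docker info prints a "Default Runtime: ..." line as in the expected output above.

```shell
# Hypothetical helpers, not part of cartesia-kube.

# Reads `docker info` text on stdin and prints the value of the
# "Default Runtime" line (prints nothing if the line is missing).
default_runtime_from_info() {
  sed -n 's/^[[:space:]]*Default Runtime:[[:space:]]*//p' | head -n 1
}

# Succeeds only when the runtime name passed as $1 is "nvidia".
require_nvidia_runtime() {
  if [ "$1" = "nvidia" ]; then
    echo "OK: default runtime is nvidia"
  else
    echo "FAIL: default runtime is '${1:-unknown}', expected nvidia" >&2
    return 1
  fi
}

# Usage on a GPU node:
#   require_nvidia_runtime "$(docker info 2>/dev/null | default_runtime_from_info)"
```

A non-zero exit status from require_nvidia_runtime indicates that the nvidia-ctk step above still needs to be run on that node.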
If using MIG: After enabling MIG and creating instances on the host, verify they are visible:
nvidia-smi -L
# Each MIG instance appears as a MIG-... UUID line beneath its parent GPU.
# The deploy script reads these UUIDs automatically — no manual configuration required.
MIG must be enabled and instances created on the host before deploying. Recreating MIG instances generates new UUIDs; redeploy the stack if this happens.

Step 1 — Prepare secrets

Place these files on the host (Compose) or manager node (Swarm):
  • container_key — file containing your Cartesia API key
  • service-account.json — GCS service account JSON with roles/artifactregistry.reader (image pull) and roles/storage.objectViewer (GCS sync)
Make the deploy script executable:
chmod +x local/scripts/deploy-compose.sh

Step 2 — Initialize the cluster (Swarm only)

Skip this step if you are using Docker Compose. On the manager node:
docker swarm init --advertise-addr <MANAGER_IP>
Copy the docker swarm join command from the output. On each additional node, run:
docker swarm join --token <TOKEN> <MANAGER_IP>:2377
Label each node from the manager. Use docker node ls to list node IDs:
docker node update --label-add cpu=true <node-id>   # CPU services (API, NATS, etc.)
docker node update --label-add gpu=true <node-id>   # Standard GPU workers
If using MIG: Label MIG-enabled nodes with mig=true and a comma-separated list of their MIG instance UUIDs (obtained from nvidia-smi -L on that node). Do not apply gpu=true to MIG nodes.
docker node update --label-add mig=true <node-id>
docker node update --label-add 'mig.uuids=MIG-<uuid1>,MIG-<uuid2>' <node-id>
Mixed clusters with both standard GPU nodes and MIG nodes are supported — the deploy script handles scheduling for both automatically.
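The comma-separated value for the mig.uuids label can be assembled from nvidia-smi -L output rather than copied by hand. This is a sketch under assumptions: mig_uuid_list is a hypothetical helper that is not part of cartesia-kube, and it assumes every MIG instance appears as a MIG-... token in the listing, as described in the GPU runtime check section.

```shell
# Hypothetical helper, not part of cartesia-kube.
# Reads `nvidia-smi -L` text on stdin and prints the MIG instance UUIDs
# as one comma-separated string, in the order they appear.
mig_uuid_list() {
  grep -oE 'MIG-[0-9A-Za-z/-]+' | paste -sd, -
}

# Usage: capture the list on the MIG node, then label the node from the manager.
#   uuids="$(nvidia-smi -L | mig_uuid_list)"
#   docker node update --label-add "mig.uuids=${uuids}" <node-id>
```

An empty result means no MIG instances were found on that node, in which case the node should not be given the mig=true label.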

Step 3 — Configure environment

Set environment variables before deploying. Use a .env file in local/ (see local/.env.example) or export them in your shell.
export IMAGE_REGISTRY="YOUR_IMAGE_REGISTRY"
export RELEASE_TAG="YOUR_RELEASE_TAG"
export MODEL_NAME="YOUR_MODEL_NAME"

export CONTAINER_KEY_FILE=/path/to/container_key
export GCS_SA_FILE=/path/to/service-account.json

# Optional
export WORKER_REPLICAS=1
export WORKER_CAPACITY=4
export BUCKET_NAME=""
export CLUSTER_NAME="cartesia-compose"   # or "cartesia-swarm"
export USE_MIG=0                         # set to 1 to enable MIG mode (or pass --mig to the deploy script)
See Configuration for full details on each variable.
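A quick check before deploying can catch an unset variable or a wrong secret path. The sketch below is an illustration only: check_deploy_env is a hypothetical function, not something the deploy script provides, and it covers only the five required variables listed above.

```shell
# Hypothetical pre-deploy check, not part of cartesia-kube.
# Prints one line for every required variable that is unset or empty, and
# for every *_FILE variable whose path is not a readable file.
# Exits non-zero if anything was reported. The function body runs in a
# subshell (parentheses) so its working variables do not leak to the caller.
check_deploy_env() (
  status=0
  for name in IMAGE_REGISTRY RELEASE_TAG MODEL_NAME CONTAINER_KEY_FILE GCS_SA_FILE; do
    eval "value=\${$name:-}"   # portable indirect lookup of the variable named in $name
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
      continue
    fi
    case "$name" in
      *_FILE)
        if [ ! -r "$value" ]; then
          echo "unreadable: $name=$value"
          status=1
        fi
        ;;
    esac
  done
  exit "$status"
)

# Usage, after exporting the variables in the current shell:
#   check_deploy_env && ./local/scripts/deploy-compose.sh
```

The check reads the current shell environment, so variables kept only in a .env file need to be loaded into the shell before it runs.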

Step 4 — Deploy

From the repo root:
# Standard deployment
./local/scripts/deploy-compose.sh

# With MIG support (auto-detects MIG instances via nvidia-smi)
./local/scripts/deploy-compose.sh --mig
When --mig is used, the script auto-detects MIG instance UUIDs from nvidia-smi, generates a per-slice worker configuration, and scales the standard worker to zero.
TTS workers take a few minutes to load the model into GPU memory. During this time, TTS requests will return errors even though containers appear healthy. Wait for the ready signal:
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs -f tts-worker 2>&1 | grep -i "ready"
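For unattended deployments, the same log check can be bounded by a timeout so it reports failure instead of waiting indefinitely. This is a sketch, not part of cartesia-kube: saw_ready_line is a hypothetical helper, the 10-minute limit is an arbitrary assumption, and the match on "ready" simply mirrors the grep above.

```shell
# Hypothetical helper, not part of cartesia-kube.
# Reads log lines on stdin and succeeds if any line contains "ready"
# (case-insensitive). Fails if the input ends without a match.
saw_ready_line() {
  grep -qi "ready"
}

# Usage, bounded to 10 minutes with coreutils `timeout`:
#   cd local && timeout 600 docker compose \
#     -f docker-compose.base.yaml -f docker-compose.yaml \
#     logs -f tts-worker 2>&1 | saw_ready_line \
#     && echo "worker ready" || echo "no ready line within 10 minutes"
```

Because the shell waits for every command in a pipeline, the usage line may not return the moment the match is found; it returns once the log stream writes again or the timeout expires.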

Step 5 — Verify

Check that services are running:
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml ps
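If deployed with MIG, list the services with the generated MIG Compose file included and confirm that one worker is running per MIG instance; each worker should see exactly one MIG device: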
# List all running services (MIG workers appear as tts-worker-mig-0, tts-worker-mig-1, etc.)
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml -f docker-compose.mig.generated.yaml ps
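The service listing shows which MIG workers are running but not what each one can see. One way to check device visibility is to run nvidia-smi -L inside a worker and count the MIG entries. This is a sketch under assumptions: count_mig_devices is a hypothetical helper, and it assumes nvidia-smi is available inside the worker container, which this guide does not confirm.

```shell
# Hypothetical helper, not part of cartesia-kube.
# Reads `nvidia-smi -L` text on stdin and prints how many MIG devices it lists.
count_mig_devices() {
  grep -cE 'UUID: MIG-' || true   # grep -c prints 0 but exits 1 when nothing matches
}

# Usage (assumes nvidia-smi exists in the container; the service name
# tts-worker-mig-0 follows the naming shown above):
#   cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml \
#     -f docker-compose.mig.generated.yaml exec tts-worker-mig-0 nvidia-smi -L \
#     | count_mig_devices
```

A result of 1 for each tts-worker-mig-N service matches the expectation of one MIG device per worker.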
Test the API:
curl http://localhost:5000/status
Test TTS:
curl -s -X POST "http://localhost:5000/tts/bytes" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Cartesia-Version: 2024-06-10" \
  -d '{
    "model_id": "sonic-3.5",
    "transcript": "Hello from Cartesia.",
    "voice": {"mode": "id", "id": "00510a15-4216-4fdc-a0ab-05d74cd9f795"},
    "language": "en",
    "output_format": {"container": "mp3", "sample_rate": 44100, "bit_rate": 128000}
  }' --output test.mp3
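Because the request above uses curl -s without a status check, an error response is also written to test.mp3. A small check on the status code and file size makes failures visible. This is a sketch: tts_response_ok is a hypothetical helper, and it assumes a successful request returns HTTP 200.

```shell
# Hypothetical helper, not part of the Cartesia tooling.
# $1 = HTTP status code, $2 = path of the saved response body.
# Succeeds only for a 200 response with a non-empty body.
tts_response_ok() {
  [ "$1" = "200" ] && [ -s "$2" ]
}

# Usage: replace `--output test.mp3` in the request above with
# `-o test.mp3 -w '%{http_code}'` and capture what curl prints:
#   code="$(curl -s -o test.mp3 -w '%{http_code}' -X POST "http://localhost:5000/tts/bytes" \
#            <same headers and body as above>)"
#   tts_response_ok "$code" test.mp3 || { echo "TTS request failed ($code):"; cat test.mp3; }
```

On failure the saved file usually contains the error body rather than audio, so printing it is a reasonable first diagnostic.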

Troubleshooting

cd local

# View API and TTS worker logs
docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs api
docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs tts-worker

# Restart everything
docker compose -f docker-compose.base.yaml -f docker-compose.yaml down
docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d
If the API exits with the error "no servers available for connection" (NATS was not ready when the API started), bring the stack up and then restart the API:
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d && docker compose -f docker-compose.base.yaml -f docker-compose.yaml restart api
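Every command in this section repeats the same two -f flags. A shell function can shorten them during a troubleshooting session. This is a convenience sketch only: dc is a hypothetical name and is not defined by cartesia-kube.

```shell
# Hypothetical wrapper, not part of cartesia-kube.
# Runs docker compose with the two Compose files used throughout this guide.
# Call it from the local/ directory.
dc() {
  docker compose -f docker-compose.base.yaml -f docker-compose.yaml "$@"
}

# Usage:
#   cd local
#   dc ps
#   dc logs api
#   dc restart api
```

For MIG deployments the generated file still has to be added explicitly, for example dc -f docker-compose.mig.generated.yaml ps.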

Configuration

Set these environment variables before running the deploy script. You receive IMAGE_REGISTRY, RELEASE_TAG, and MODEL_NAME from Cartesia during onboarding. If you mirror images into your own registry, use your mirror URL for IMAGE_REGISTRY.

Required

| Variable | Description |
|---|---|
| IMAGE_REGISTRY | Container image registry URL (Cartesia registry or your mirror). |
| RELEASE_TAG | Image tag for the release you are deploying (updates per release). |
| MODEL_NAME | TTS model identifier for the worker image. |
| CONTAINER_KEY_FILE | Path to file containing your Cartesia API key. |
| GCS_SA_FILE | Path to GCS service account JSON file. |

Optional

| Variable | Default | Description |
|---|---|---|
| WORKER_REPLICAS | 1 | Number of TTS worker containers. For Compose, set to your GPU count on the host. For Swarm, scale to match your GPU node count. |
| WORKER_CAPACITY | 4 | Max concurrent TTS requests per worker. Lower if you run out of GPU memory. |
| BUCKET_NAME | (empty) | GCS bucket for migrations/LoRAs. Leave empty to disable sync. |
| CLUSTER_NAME | cartesia-compose / cartesia-swarm | Identifier for logs and metrics. |
| GCS_SYNC_INTERVAL | 300 | GCS sync interval in seconds. |
| USE_MIG | 0 | Set to 1 to enable MIG mode. |
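On a Compose host without MIG, WORKER_REPLICAS is meant to match the number of GPUs, so it can be derived from nvidia-smi -L instead of being typed in. This is a sketch: count_gpus is a hypothetical helper, not part of cartesia-kube, and it assumes each physical GPU is listed on a line beginning with "GPU <index>:".

```shell
# Hypothetical helper, not part of cartesia-kube.
# Reads `nvidia-smi -L` text on stdin and prints the number of physical GPUs,
# counting lines that start with "GPU <index>:". Indented MIG device lines
# do not match and are not counted.
count_gpus() {
  grep -cE '^GPU [0-9]+:' || true   # grep -c prints 0 but exits 1 when nothing matches
}

# Usage on a Compose host without MIG:
#   export WORKER_REPLICAS="$(nvidia-smi -L | count_gpus)"
```

A result of 0 suggests the driver is not visible on the host; in that case, revisit the GPU runtime check before deploying.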