Docker

Docker Compose and Docker Swarm deployment are currently in beta. Connect with the Cartesia team for support.

Deploy Cartesia TTS on a single machine with Docker Compose, or across a multi-node cluster with Docker Swarm.

	Docker Compose	Docker Swarm
Nodes	Single host	Multiple hosts (managers + workers)
GPU scaling	Multiple workers via `WORKER_REPLICAS` (one per GPU)	Workers scheduled on labeled GPU nodes
MIG support	Auto-detected via `--mig` flag	Per-node via node labels and `--mig` flag
Networking	Bridge (default)	Overlay (Swarm-managed)

Prerequisites

One or more machines with Docker installed (your user must be in the docker group)
Compose only: Docker Compose V2 (docker compose)
Swarm only: nodes meet Docker’s Swarm networking requirements
At least one NVIDIA GPU with drivers installed. MIG (Multi-Instance GPU) partitioning is supported on compatible NVIDIA GPUs
GPU nodes have the nvidia Docker runtime set as default (see below)
The cartesia-kube repo downloaded as described in Downloading cartesia-kube
A Cartesia API key file (container_key) and a GCS service account JSON file, provided during onboarding

GPU runtime check

On each GPU node, verify the NVIDIA runtime:

nvidia-smi

docker info | grep "Default Runtime"
# Expected: Default Runtime: nvidia

docker run --rm nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi

If nvidia is not the default runtime, install the NVIDIA Container Toolkit and run:

sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
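The default-runtime check can also be scripted, for example as part of node provisioning. The following is a minimal sketch, not part of cartesia-kube: the function name `default_runtime_is_nvidia` is hypothetical, and it assumes `docker info` prints a `Default Runtime:` line in the form shown in the expected output above.

```shell
# Hypothetical helper: reads `docker info` output on stdin and succeeds
# only if the reported default runtime is exactly "nvidia".
default_runtime_is_nvidia() {
  grep -Eq '^[[:space:]]*Default Runtime:[[:space:]]*nvidia[[:space:]]*$'
}

# Usage on a GPU node:
#   docker info 2>/dev/null | default_runtime_is_nvidia \
#     && echo "nvidia is the default runtime" \
#     || echo "nvidia is NOT the default runtime"
```

Reading from stdin keeps the check independent of how the `docker info` output was obtained, so it can be run against saved output from another node.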

If using MIG: After enabling MIG and creating instances on the host, verify they are visible:

nvidia-smi -L
# Each MIG instance appears as a MIG-... UUID line beneath its parent GPU.
# The deploy script reads these UUIDs automatically — no manual configuration required.

MIG must be enabled and instances created on the host before deploying. Recreating MIG instances generates new UUIDs; redeploy the stack if this happens.
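Because recreated MIG instances receive new UUIDs, it can help to record the UUIDs at deploy time and compare them later. The sketch below is illustrative only: `list_mig_uuids` is a hypothetical helper, and it assumes UUIDs appear in the `nvidia-smi -L` output as `MIG-` followed by hexadecimal groups separated by dashes. Older drivers may print a different identifier format.

```shell
# Hypothetical helper: reads `nvidia-smi -L` output on stdin and prints one
# MIG UUID per line. Prints nothing if no MIG instance is listed.
list_mig_uuids() {
  grep -oE 'MIG-[0-9A-Fa-f]+(-[0-9A-Fa-f]+)*' || true
}

# Usage (file names are placeholders):
#   nvidia-smi -L | list_mig_uuids | sort > mig-uuids.deployed
#   # later, to detect recreated instances:
#   nvidia-smi -L | list_mig_uuids | sort | diff - mig-uuids.deployed
```

A non-empty `diff` suggests the instances changed since the last deploy and the stack should be redeployed.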

Step 1 — Prepare secrets

Place these files on the host (Compose) or manager node (Swarm):

container_key — file containing your Cartesia API key
service-account.json — GCS service account JSON with roles/artifactregistry.reader (image pull) and roles/storage.objectViewer (GCS sync)
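Before deploying, a quick sanity check on the two secret files can catch a wrong path or an empty file early. This is a minimal sketch: `check_secret_file` is a hypothetical helper, and it only checks that the file exists, is readable, and is non-empty. It does not validate the key or the IAM roles.

```shell
# Hypothetical helper: succeeds if the given path is a readable, non-empty
# regular file; otherwise prints a message to stderr and fails.
check_secret_file() {
  if [ ! -f "$1" ] || [ ! -r "$1" ] || [ ! -s "$1" ]; then
    echo "missing, unreadable, or empty: $1" >&2
    return 1
  fi
}

# Usage (paths are placeholders):
#   check_secret_file /path/to/container_key &&
#     check_secret_file /path/to/service-account.json
```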

Make the deploy script executable:

Compose:

chmod +x local/scripts/deploy-compose.sh

Swarm:

chmod +x local/scripts/deploy-swarm.sh

Step 2 — Initialize the cluster (Swarm only)

Skip this step if you are using Docker Compose. On the manager node:

docker swarm init --advertise-addr <MANAGER_IP>

Copy the docker swarm join command from the output. On each additional node, run:

docker swarm join --token <TOKEN> <MANAGER_IP>:2377

Label each node from the manager. Use docker node ls to list node IDs:

docker node update --label-add cpu=true <node-id>   # CPU services (API, NATS, etc.)
docker node update --label-add gpu=true <node-id>   # Standard GPU workers

If using MIG: Label MIG-enabled nodes with mig=true and a comma-separated list of their MIG instance UUIDs (obtained from nvidia-smi -L on that node). Do not apply gpu=true to MIG nodes.

docker node update --label-add mig=true <node-id>
docker node update --label-add 'mig.uuids=MIG-<uuid1>,MIG-<uuid2>' <node-id>

Mixed clusters with both standard GPU nodes and MIG nodes are supported — the deploy script handles scheduling for both automatically.
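Typing the comma-separated UUID list by hand is error-prone, so the label commands can be generated instead. The sketch below is an illustration under stated assumptions: `print_mig_label_commands` is a hypothetical helper, it expects MIG UUIDs one per line on stdin, and it only prints the commands so that they can be reviewed and then run on the manager node.

```shell
# Hypothetical helper: reads MIG UUIDs (one per line) on stdin and prints the
# two `docker node update` commands for the given node ID. Nothing is executed.
print_mig_label_commands() {
  node_id=$1
  uuids=$(paste -sd, -)   # join the input lines with commas
  if [ -z "$uuids" ]; then
    echo "no MIG UUIDs supplied" >&2
    return 1
  fi
  printf "docker node update --label-add mig=true %s\n" "$node_id"
  printf "docker node update --label-add 'mig.uuids=%s' %s\n" "$uuids" "$node_id"
}

# Usage on the MIG node (<node-id> comes from `docker node ls` on the manager):
#   nvidia-smi -L | grep -oE 'MIG-[0-9A-Fa-f-]+' | print_mig_label_commands <node-id>
```

Printing rather than executing fits the split between nodes: the UUIDs are read on the MIG node, while `docker node update` must run on a manager.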

Step 3 — Configure environment

Set environment variables before deploying. Use a .env file in local/ (see local/.env.example) or export them in your shell.

export IMAGE_REGISTRY="YOUR_IMAGE_REGISTRY"
export RELEASE_TAG="YOUR_RELEASE_TAG"
export MODEL_NAME="YOUR_MODEL_NAME"

export CONTAINER_KEY_FILE=/path/to/cartesia-api-key
export GCS_SA_FILE=/path/to/service-account.json

# Optional
export WORKER_REPLICAS=1
export WORKER_CAPACITY=4
export BUCKET_NAME=""
export CLUSTER_NAME="cartesia-compose"   # or "cartesia-swarm"
export USE_MIG=0                         # set to 1 to enable MIG mode (or pass --mig to the deploy script)

See Configuration for full details on each variable.
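A missing required variable is easier to diagnose before the deploy script runs than after. The following sketch checks the five required variables named in this guide. `check_required_env` is a hypothetical helper and is not part of the deploy scripts, which may perform their own validation.

```shell
# Hypothetical helper: reports every required variable that is unset or
# empty, and fails if any is missing.
check_required_env() {
  missing=0
  for name in IMAGE_REGISTRY RELEASE_TAG MODEL_NAME CONTAINER_KEY_FILE GCS_SA_FILE; do
    eval "value=\${$name:-}"   # POSIX-compatible indirect lookup
    if [ -z "$value" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_required_env && ./local/scripts/deploy-compose.sh
```

If the values live in `local/.env` rather than in the shell, they can be loaded first with `set -a; . local/.env; set +a`, assuming the file contains only plain `NAME=value` lines that the shell can source.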

Step 4 — Deploy

Compose, from the repo root:

# Standard deployment
./local/scripts/deploy-compose.sh

# With MIG support (auto-detects MIG instances via nvidia-smi)
./local/scripts/deploy-compose.sh --mig

When --mig is used, the script auto-detects MIG instance UUIDs from nvidia-smi, generates a per-slice worker configuration, and scales the standard worker to zero.

Swarm, on the manager node:

# Standard deployment
./local/scripts/deploy-swarm.sh

# With MIG support (reads UUIDs from node labels)
./local/scripts/deploy-swarm.sh --mig

The Swarm deploy script will:

Verify that nodes are labeled (fails with instructions if not).
Create encrypted Swarm secrets from your key and service account files.
Deploy all services. With --mig, one dedicated worker service is created per MIG instance, each pinned to its node.

TTS workers take a few minutes to load the model into GPU memory. During this time, TTS requests will return errors even though containers appear healthy. Wait for the ready signal:

Compose:

cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs -f tts-worker 2>&1 | grep -i "ready"

Swarm:

docker service logs cartesia_tts-worker -f 2>&1 | grep -i "ready"
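In automation, following the logs by eye is not practical, so a polling loop is a common alternative. The sketch below is a generic retry helper, not something shipped with cartesia-kube: `retry_until` is a hypothetical name, and the choice of probe command is left to the caller.

```shell
# Hypothetical helper: runs a command every INTERVAL seconds until it
# succeeds or roughly TIMEOUT seconds have elapsed.
# Usage: retry_until TIMEOUT INTERVAL command [args...]
retry_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1   # gave up
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}

# Usage: probe every 10 seconds for up to 10 minutes.
#   retry_until 600 10 curl -fsS -o /dev/null http://localhost:5000/status
```

Whether `/status` reflects worker readiness is an assumption here. Since containers can appear healthy before the model is loaded, a small TTS request like the one in Step 5, sent with `curl -f`, is likely a more direct probe.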

Step 5 — Verify

Check that services are running:

Compose:

cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml ps

If deployed with MIG, include the generated MIG file and confirm that one worker is running per MIG device:

# List all running services (MIG workers appear as tts-worker-mig-0, tts-worker-mig-1, etc.)
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml -f docker-compose.mig.generated.yaml ps

Swarm:

docker stack services cartesia

If deployed with MIG, verify MIG worker services are scheduled and running:

docker stack ps cartesia --filter 'name=cartesia_tts-worker-mig'
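For Swarm, the replica counts can be checked mechanically instead of by reading the table. This is a minimal sketch: `list_unready_services` is a hypothetical helper that expects one `<name> <running>/<desired>` pair per line, which is what the `--format` template in the usage comment produces.

```shell
# Hypothetical helper: reads "<service name> <running>/<desired>" lines on
# stdin and prints the names of services whose running count differs from
# the desired count. Prints nothing when every service is fully scheduled.
list_unready_services() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage on the manager node:
#   docker stack services cartesia --format '{{.Name}} {{.Replicas}}' | list_unready_services
```

Fully scheduled replicas only mean the containers are running. Workers may still be loading the model, as described in Step 4.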

Test the API:

curl http://localhost:5000/status

Test TTS:

curl -s -X POST "http://localhost:5000/tts/bytes" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Cartesia-Version: 2024-06-10" \
  -d '{
    "model_id": "sonic-3.5",
    "transcript": "Hello from Cartesia.",
    "voice": {"mode": "id", "id": "00510a15-4216-4fdc-a0ab-05d74cd9f795"},
    "language": "en",
    "output_format": {"container": "mp3", "sample_rate": 44100, "bit_rate": 128000}
  }' --output test.mp3
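Because the request above writes whatever the server returns to `test.mp3`, an error response would be saved under the same name. The sketch below is a rough heuristic, not a full validator: `looks_like_mp3` is a hypothetical helper that checks only the first bytes of the file for an ID3 tag or an MPEG frame sync.

```shell
# Hypothetical helper: succeeds if the file starts with an "ID3" tag header
# or with an MPEG audio frame sync (0xFF followed by a byte whose top three
# bits are set). A JSON error body fails this check.
looks_like_mp3() {
  magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
  case "$magic" in
    494433*)   return 0 ;;   # "ID3"
    ffe*|fff*) return 0 ;;   # frame sync
    *)         return 1 ;;
  esac
}

# Usage:
#   looks_like_mp3 test.mp3 && echo "audio received" || cat test.mp3
```

Adding `-f` to the `curl` command is a complementary check, since it makes `curl` exit non-zero on HTTP error statuses.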

Troubleshooting

Compose:

cd local

docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs api
docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs tts-worker

# Restart everything
docker compose -f docker-compose.base.yaml -f docker-compose.yaml down
docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d

If the API exits with the error "no servers available for connection" (NATS was not ready when the API started), restart the API after the stack is up:

cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d && docker compose -f docker-compose.base.yaml -f docker-compose.yaml restart api

Swarm:

docker stack ps cartesia --no-trunc

docker service logs cartesia_api
docker service logs cartesia_tts-worker

# Restart the stack
docker stack rm cartesia
sleep 10
cd local && docker stack deploy --with-registry-auth -c docker-compose.base.yaml -c docker-compose.swarm.yaml cartesia

Configuration

Set these environment variables before running the deploy script. You receive IMAGE_REGISTRY, RELEASE_TAG, and MODEL_NAME from Cartesia during onboarding. If you mirror images into your own registry, use your mirror URL for IMAGE_REGISTRY.

Required

Variable	Description
`IMAGE_REGISTRY`	Container image registry URL (Cartesia registry or your mirror).
`RELEASE_TAG`	Image tag for the release you are deploying (updates per release).
`MODEL_NAME`	TTS model identifier for the worker image.
`CONTAINER_KEY_FILE`	Path to file containing your Cartesia API key.
`GCS_SA_FILE`	Path to GCS service account JSON file.

Optional

Variable	Default	Description
`WORKER_REPLICAS`	`1`	Number of TTS worker containers. For Compose, set to your GPU count on the host. For Swarm, scale to match your GPU node count.
`WORKER_CAPACITY`	`4`	Max concurrent TTS requests per worker. Lower if you run out of GPU memory.
`BUCKET_NAME`	(empty)	GCS bucket for migrations/LoRAs. Leave empty to disable sync.
`CLUSTER_NAME`	`cartesia-compose` / `cartesia-swarm`	Identifier for logs and metrics.
`GCS_SYNC_INTERVAL`	`300`	GCS sync interval in seconds.
`USE_MIG`	`0`	Set to `1` to enable MIG mode.
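For a standard (non-MIG) Compose deployment, `WORKER_REPLICAS` is meant to match the number of GPUs on the host, and that number can be derived rather than typed. The following is a sketch under assumptions: `count_gpus` is a hypothetical helper, and it assumes each physical GPU appears in the `nvidia-smi -L` output as an unindented line starting with `GPU <index>:`.

```shell
# Hypothetical helper: reads `nvidia-smi -L` output on stdin and prints the
# number of physical GPUs. Indented MIG instance lines are not counted.
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:' || true
}

# Usage:
#   export WORKER_REPLICAS=$(nvidia-smi -L | count_gpus)
```

In MIG mode the deploy script generates its own per-instance workers, so this calculation applies only to the standard deployment.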
