

Docker Compose and Docker Swarm deployment are currently in beta. Connect with the Cartesia team for support.
Deploy Cartesia TTS on a single machine with Docker Compose, or across a multi-node cluster with Docker Swarm.
|  | Docker Compose | Docker Swarm |
| --- | --- | --- |
| Nodes | Single host | Multiple hosts (managers + workers) |
| GPU scaling | Multiple workers via `WORKER_REPLICAS` (one per GPU) | Workers scheduled on labeled GPU nodes |
| MIG support | Auto-detected via `--mig` flag | Per-node via node labels and `--mig` flag |
| Networking | Bridge (default) | Overlay (Swarm-managed) |

Prerequisites

  • One or more machines with Docker installed (your user must be in the docker group)
  • Compose only: Docker Compose V2 (docker compose)
  • Swarm only: nodes meet Docker’s Swarm networking requirements
  • At least one NVIDIA GPU with drivers installed. MIG (Multi-Instance GPU) partitioning is supported on compatible NVIDIA GPUs
  • GPU nodes have the nvidia Docker runtime set as default (see below)
  • The cartesia-kube repo downloaded as described in Downloading cartesia-kube
  • A Cartesia API key file (container_key) and a GCS service account JSON file, provided during onboarding

GPU runtime check

On each GPU node, verify the NVIDIA runtime:
```shell
nvidia-smi

docker info | grep "Default Runtime"
# Expected: Default Runtime: nvidia

docker run --rm nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
```
If `nvidia` is not the default runtime, install the NVIDIA Container Toolkit and run:

```shell
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
```
If using MIG: After enabling MIG and creating instances on the host, verify they are visible:

```shell
nvidia-smi -L
# Each MIG instance appears as a MIG-... UUID line beneath its parent GPU.
# The deploy script reads these UUIDs automatically; no manual configuration required.
```

MIG must be enabled and instances created on the host before deploying. Recreating MIG instances generates new UUIDs; redeploy the stack if this happens.

Step 1 — Prepare secrets

Place these files on the host (Compose) or manager node (Swarm):
  • container_key — file containing your Cartesia API key
  • service-account.json — GCS service account JSON with roles/artifactregistry.reader (image pull) and roles/storage.objectViewer (GCS sync)
Make the deploy script executable:

```shell
chmod +x local/scripts/deploy-compose.sh
```

Step 2 — Initialize the cluster (Swarm only)

Skip this step if you are using Docker Compose. On the manager node:

```shell
docker swarm init --advertise-addr <MANAGER_IP>
```

Copy the `docker swarm join` command from the output. On each additional node, run:

```shell
docker swarm join --token <TOKEN> <MANAGER_IP>:2377
```
Label each node from the manager. Use `docker node ls` to list node IDs:

```shell
docker node update --label-add cpu=true <node-id>   # CPU services (API, NATS, etc.)
docker node update --label-add gpu=true <node-id>   # Standard GPU workers
```
If using MIG: Label MIG-enabled nodes with `mig=true` and a comma-separated list of their MIG instance UUIDs (obtained from `nvidia-smi -L` on that node). Do not apply `gpu=true` to MIG nodes.

```shell
docker node update --label-add mig=true <node-id>
docker node update --label-add 'mig.uuids=MIG-<uuid1>,MIG-<uuid2>' <node-id>
```
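To avoid copying UUIDs by hand, the label value can be assembled from `nvidia-smi -L` output. This is a sketch, not part of the official tooling; the `sample` text below stands in for real output so it can be run anywhere (on an actual MIG node, pipe `nvidia-smi -L` in instead):

```shell
#!/bin/sh
# Build the comma-separated value for the mig.uuids node label.
# Sample stands in for: nvidia-smi -L
sample='GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-11111111-2222-3333-4444-555555555555)
  MIG 1g.5gb Device 0: (UUID: MIG-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
  MIG 1g.5gb Device 1: (UUID: MIG-ffffffff-0000-1111-2222-333333333333)'

# Extract each MIG-... UUID and join them with commas.
uuids=$(printf '%s\n' "$sample" | grep -o 'MIG-[0-9a-f-]*' | paste -sd, -)
echo "$uuids"
```

The result can then be applied with `docker node update --label-add "mig.uuids=$uuids" <node-id>`, matching the label format shown above.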
Mixed clusters with both standard GPU nodes and MIG nodes are supported — the deploy script handles scheduling for both automatically.

Step 3 — Configure environment

Set environment variables before deploying. Use a `.env` file in `local/` (see `local/.env.example`) or export them in your shell.

```shell
export IMAGE_REGISTRY="YOUR_IMAGE_REGISTRY"
export RELEASE_TAG="YOUR_RELEASE_TAG"
export MODEL_NAME="YOUR_MODEL_NAME"

export CONTAINER_KEY_FILE=/path/to/cartesia-api-key
export GCS_SA_FILE=/path/to/service-account.json

# Optional
export WORKER_REPLICAS=1
export WORKER_CAPACITY=4
export BUCKET_NAME=""
export CLUSTER_NAME="cartesia-compose"   # or "cartesia-swarm"
export USE_MIG=0                         # set to 1 to enable MIG mode (or pass --mig to the deploy script)
```
See Configuration for full details on each variable.
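A quick preflight check can catch unset variables before the deploy script runs. A minimal sketch, using the required variable names from the Configuration section (the demo deliberately sets only two of the five, so three are reported missing):

```shell
#!/bin/sh
# List any required deploy variables that are unset or empty.
check_required() {
  missing=""
  for var in "$@"; do
    eval "val=\${$var:-}"
    [ -z "$val" ] && missing="$missing $var"
  done
  echo "$missing"
}

# Demo: only two of the five are set.
IMAGE_REGISTRY="YOUR_IMAGE_REGISTRY"
RELEASE_TAG="YOUR_RELEASE_TAG"
missing=$(check_required IMAGE_REGISTRY RELEASE_TAG MODEL_NAME CONTAINER_KEY_FILE GCS_SA_FILE)
echo "missing:$missing"
```

In a real run, export all five first and treat non-empty output as a reason to stop before deploying.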

Step 4 — Deploy

From the repo root:

```shell
# Standard deployment
./local/scripts/deploy-compose.sh

# With MIG support (auto-detects MIG instances via nvidia-smi)
./local/scripts/deploy-compose.sh --mig
```

When `--mig` is used, the script auto-detects MIG instance UUIDs from `nvidia-smi`, generates a per-slice worker configuration, and scales the standard worker to zero.

TTS workers take a few minutes to load the model into GPU memory. During this time, TTS requests will return errors even though containers appear healthy. Wait for the ready signal:

```shell
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs -f tts-worker 2>&1 | grep -i "ready"
```
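If you prefer a bounded wait over tailing logs by hand, a small polling loop can be scripted. This is a sketch, not part of the deploy tooling; the `echo ready` at the bottom simulates the log check so the loop can be demonstrated offline:

```shell
#!/bin/sh
# Poll a command until its output contains "ready", or give up after a timeout.
wait_for_ready() {
  cmd=$1
  timeout=$2
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if $cmd 2>&1 | grep -qi "ready"; then
      echo "ready after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "timed out after ${timeout}s" >&2
  return 1
}

result=$(wait_for_ready "echo ready" 5)
echo "$result"
```

For a real deployment you would swap the simulated command for the compose logs check, e.g. `wait_for_ready "docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs tts-worker" 600`.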

Step 5 — Verify

Check that services are running:

```shell
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml ps
```

If deployed with MIG, verify each worker sees exactly one MIG device:

```shell
# List all running services (MIG workers appear as tts-worker-mig-0, tts-worker-mig-1, etc.)
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml -f docker-compose.mig.generated.yaml ps
```
Test the API:

```shell
curl http://localhost:5000/status
```

Test TTS:

```shell
curl -s -X POST "http://localhost:5000/tts/bytes" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Cartesia-Version: 2024-06-10" \
  -d '{
    "model_id": "sonic-3",
    "transcript": "Hello from Cartesia.",
    "voice": {"mode": "id", "id": "00510a15-4216-4fdc-a0ab-05d74cd9f795"},
    "language": "en",
    "output_format": {"container": "mp3", "sample_rate": 44100, "bit_rate": 128000}
  }' --output test.mp3
```
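If the request fails, the server's JSON error body is written to `test.mp3`, so the file may exist without being audio. A quick sanity check is to inspect the first bytes: MP3 output starts with an ID3 tag or an 0xff frame-sync byte, while an error body starts with `{`. A sketch, using a fabricated file so it can be demonstrated offline:

```shell
#!/bin/sh
# Fabricated stand-in for the curl output; a real check would use the
# test.mp3 written by the request above.
printf 'ID3\004\000\000rest-of-file' > test.mp3

first3=$(head -c 3 test.mp3)
first_hex=$(head -c 1 test.mp3 | od -An -tx1 | tr -d ' \n')
if [ "$first3" = "ID3" ] || [ "$first_hex" = "ff" ]; then
  result="looks like MP3 audio"
else
  result="not MP3; inspect for a JSON error body"
fi
echo "test.mp3: $result"
```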

Troubleshooting

View service logs, or restart the stack:

```shell
cd local

docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs api
docker compose -f docker-compose.base.yaml -f docker-compose.yaml logs tts-worker

# Restart everything
docker compose -f docker-compose.base.yaml -f docker-compose.yaml down
docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d
```
If the API exits with `no servers available for connection` (NATS not ready), restart the API after the stack is up:

```shell
cd local && docker compose -f docker-compose.base.yaml -f docker-compose.yaml up -d && docker compose -f docker-compose.base.yaml -f docker-compose.yaml restart api
```
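The same race can be handled with a small retry wrapper instead of a one-shot restart. A sketch, not part of the official tooling; the `flaky` function simulates a command that succeeds on its third attempt, standing in for the real restart:

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempt budget is spent.
retry() {
  attempts=$1; shift
  i=1
  until "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "giving up after $attempts attempts" >&2
      return 1
    fi
    i=$((i + 1))
    sleep 1
  done
}

# Simulated flaky command: fails twice, then succeeds.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky() {
  n=$(($(cat "$counter_file") + 1))
  echo "$n" > "$counter_file"
  [ "$n" -ge 3 ]
}

retry 5 flaky && echo "succeeded on attempt $(cat "$counter_file")"
```

In practice the flaky command would be the restart itself, e.g. `retry 5 docker compose -f docker-compose.base.yaml -f docker-compose.yaml restart api`.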

Configuration

Set these environment variables before running the deploy script. You receive IMAGE_REGISTRY, RELEASE_TAG, and MODEL_NAME from Cartesia during onboarding. If you mirror images into your own registry, use your mirror URL for IMAGE_REGISTRY.

Required

| Variable | Description |
| --- | --- |
| `IMAGE_REGISTRY` | Container image registry URL (Cartesia registry or your mirror). |
| `RELEASE_TAG` | Image tag for the release you are deploying (updates per release). |
| `MODEL_NAME` | TTS model identifier for the worker image. |
| `CONTAINER_KEY_FILE` | Path to file containing your Cartesia API key. |
| `GCS_SA_FILE` | Path to GCS service account JSON file. |

Optional

| Variable | Default | Description |
| --- | --- | --- |
| `WORKER_REPLICAS` | `1` | Number of TTS worker containers. For Compose, set to your GPU count on the host. For Swarm, scale to match your GPU node count. |
| `WORKER_CAPACITY` | `4` | Max concurrent TTS requests per worker. Lower if you run out of GPU memory. |
| `BUCKET_NAME` | (empty) | GCS bucket for migrations/LoRAs. Leave empty to disable sync. |
| `CLUSTER_NAME` | `cartesia-compose` / `cartesia-swarm` | Identifier for logs and metrics. |
| `GCS_SYNC_INTERVAL` | `300` | GCS sync interval in seconds. |
| `USE_MIG` | `0` | Set to `1` to enable MIG mode. |
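Putting it together, a minimal `local/.env` might look like the following. Values are illustrative only: the registry, tag, and model come from Cartesia onboarding, and the secret paths are placeholders for wherever you stored the files from Step 1.

```shell
# local/.env (example values only)
IMAGE_REGISTRY=YOUR_IMAGE_REGISTRY
RELEASE_TAG=YOUR_RELEASE_TAG
MODEL_NAME=YOUR_MODEL_NAME
CONTAINER_KEY_FILE=/opt/cartesia/container_key     # placeholder path
GCS_SA_FILE=/opt/cartesia/service-account.json     # placeholder path
WORKER_REPLICAS=2        # one worker per GPU on this host
WORKER_CAPACITY=4
USE_MIG=0                # set to 1 (or pass --mig) for MIG slices
```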