

Pod Auto-Scaling (KEDA)

KEDA ScaledObjects use Prometheus-based metrics with the following triggers:
| Trigger | Metric | Threshold | Condition |
| --- | --- | --- | --- |
| Worker Load | inferno_worker_load / inferno_worker_capacity | 0.8 (80%) | Always active |
| Queue-based | api_queue_size / capacity (overflow mode) | 1.0 | Only when minReplicas=0 |
| Queue-based | api_unserviceable_requests_size | 0.9 | Only when minReplicas=0 |
Scaling behavior:
  • Polling interval: 15 seconds
  • Scale-up stabilization: 30 seconds
  • Scale-down stabilization: 900 seconds (15 min)
  • Scale-down policy: Remove 1 pod per 60 seconds
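The trigger and behavior settings above can be sketched as a KEDA ScaledObject. This is a minimal illustration, not the deployed manifest: the resource names and Prometheus server address are assumptions, and only the always-active worker-load trigger is shown.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inferno-worker          # hypothetical name
spec:
  scaleTargetRef:
    name: inferno-worker        # hypothetical Deployment name
  pollingInterval: 15           # poll metrics every 15 seconds
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30    # scale-up stabilization
        scaleDown:
          stabilizationWindowSeconds: 900   # 15-minute scale-down stabilization
          policies:
            - type: Pods
              value: 1                      # remove at most 1 pod...
              periodSeconds: 60             # ...per 60 seconds
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090   # assumed Prometheus endpoint
        query: sum(inferno_worker_load) / sum(inferno_worker_capacity)
        threshold: "0.8"        # scale out above 80% worker load
```

The two queue-based triggers follow the same `prometheus` trigger shape and apply only when minReplicas=0 (scale-from-zero).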

Cluster/Node Auto-Scaling

Uses the Cluster Autoscaler:
  • Scan interval: 10 seconds
  • Scale-down delay: 10 minutes after a node is added
  • Scale-down unneeded time: 10 minutes
  • Expander: least-waste (bin-packing)
  • Metric: Pending pods that can’t be scheduled due to insufficient resources
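The node-level settings above correspond to standard Cluster Autoscaler flags. As a sketch, they might appear in the autoscaler's container args like this (the deployment layout is an assumption; only the flag values come from the list above):

```yaml
# Excerpt from a cluster-autoscaler Deployment spec (illustrative)
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --scan-interval=10s                # check for unschedulable pods every 10s
      - --scale-down-delay-after-add=10m   # wait 10 min after a node is added
      - --scale-down-unneeded-time=10m     # node must be unneeded for 10 min
      - --expander=least-waste             # bin-packing node-group selection
```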

Metrics Used for Scaling

The autoscaling triggers above use Prometheus metrics exposed by the application. See the Metrics and Monitoring page for the full list of available metrics.