# Autoscaling

## Pod Auto-Scaling (KEDA)

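KEDA ScaledObjects scale pods on Prometheus metrics using two trigger types: a worker-load trigger that is always active, and queue-based triggers that apply only when `minReplicas=0`.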

| Trigger     | Metric                                            | Threshold | Condition               |
| ----------- | ------------------------------------------------- | --------- | ----------------------- |
| Worker Load | inferno\_worker\_load / inferno\_worker\_capacity | 0.8 (80%) | Always active           |
| Queue-based | api\_queue\_size / capacity (overflow mode)       | 1.0       | Only when minReplicas=0 |
| Queue-based | api\_unserviceable\_requests\_size                | 0.9       | Only when minReplicas=0 |
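
To make the 0.8 worker-load threshold concrete, the sketch below applies the standard Kubernetes HPA formula, `ceil(currentReplicas * metric / target)`, with the default 10% tolerance. It assumes the trigger is evaluated as a `Value`-type metric, which this page does not state; the function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(current_replicas, load_ratio, threshold=0.8, tolerance=0.1):
    """Estimate the replica count the HPA would request.

    load_ratio is inferno_worker_load / inferno_worker_capacity.
    Assumption: the metric is compared as a Value-type target.
    """
    ratio = load_ratio / threshold
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    for replicas, load in [(4, 1.0), (4, 0.8), (10, 0.4)]:
        print(replicas, load, desired_replicas(replicas, load))
```

Under these assumptions, four fully loaded replicas (ratio 1.0) lead to a request for five, while a ratio at or near 0.8 leaves the count unchanged.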

Scaling behavior:

* Polling interval: 15 seconds
* Scale-up stabilization: 30 seconds
* Scale-down stabilization: 900 seconds (15 min)
* Scale-down policy: Remove 1 pod per 60 seconds
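
The settings above map onto KEDA's `ScaledObject` fields roughly as in the following sketch, which builds the manifest as a Python dictionary and prints it as JSON. It is an illustration, not the manifest shipped with the deployment: the PromQL queries, `metricType`, the default `max_replicas`, and the function name `build_scaled_object` are all assumptions.

```python
import json

# Hypothetical PromQL expressions; the real queries are not shown on this page.
WORKER_LOAD_QUERY = "sum(inferno_worker_load) / sum(inferno_worker_capacity)"
UNSERVICEABLE_QUERY = "sum(api_unserviceable_requests_size)"


def build_scaled_object(name, deployment, prometheus_url,
                        min_replicas=1, max_replicas=10):
    """Return a KEDA ScaledObject manifest as a plain dict."""

    def prometheus_trigger(query, threshold):
        return {
            "type": "prometheus",
            "metricType": "Value",  # assumption: not stated in the docs
            "metadata": {
                "serverAddress": prometheus_url,
                "query": query,
                "threshold": threshold,
            },
        }

    # The worker-load trigger is always active.
    triggers = [prometheus_trigger(WORKER_LOAD_QUERY, "0.8")]
    # Queue-based triggers apply only when scaling from zero is enabled.
    if min_replicas == 0:
        triggers.append(prometheus_trigger(UNSERVICEABLE_QUERY, "0.9"))

    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "pollingInterval": 15,
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "advanced": {
                "horizontalPodAutoscalerConfig": {
                    "behavior": {
                        "scaleUp": {"stabilizationWindowSeconds": 30},
                        "scaleDown": {
                            "stabilizationWindowSeconds": 900,
                            # Remove at most 1 pod per 60 seconds.
                            "policies": [
                                {"type": "Pods", "value": 1, "periodSeconds": 60}
                            ],
                        },
                    }
                }
            },
            "triggers": triggers,
        },
    }


if __name__ == "__main__":
    manifest = build_scaled_object(
        "inferno-worker", "inferno-worker", "http://prometheus:9090", min_replicas=0
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be inspected or compared against the deployed `ScaledObject` to confirm the polling interval and stabilization windows.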

## Cluster/Node Auto-Scaling

<Tabs>
  <Tab title="AWS EKS">
    Uses the Cluster Autoscaler:

    * Scan interval: 10 seconds
    * Scale-down delay after node add: 10 minutes
    * Scale-down unneeded time: 10 minutes
    * Expander: least-waste (bin-packing)
    * Metric: Pending pods that can't be scheduled due to insufficient resources
  </Tab>

  <Tab title="GCP GKE">
    Uses the native GKE cluster autoscaler:

    * Profile: BALANCED
    * Resource limits: CPU (1-128), Memory (1-512GB), nvidia-l4 GPUs (0-8)
    * Metric: Pending pods + resource utilization
  </Tab>
</Tabs>
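
For the AWS EKS settings, the values listed correspond to standard Cluster Autoscaler command-line flags. The sketch below assembles them as container arguments. How the flags are actually passed in this deployment (Helm values, raw container args, or otherwise) is not stated here, and the function name `cluster_autoscaler_args` is hypothetical.

```python
def cluster_autoscaler_args(scan_interval="10s",
                            delay_after_add="10m",
                            unneeded_time="10m",
                            expander="least-waste"):
    """Build Cluster Autoscaler flags matching the EKS settings listed above."""
    return [
        f"--scan-interval={scan_interval}",
        f"--scale-down-delay-after-add={delay_after_add}",
        f"--scale-down-unneeded-time={unneeded_time}",
        f"--expander={expander}",
    ]


if __name__ == "__main__":
    for flag in cluster_autoscaler_args():
        print(flag)
```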

## Metrics Used for Scaling

The autoscaling triggers above use [Prometheus metrics](/self-hosted/metrics) exposed by the application. See the [Metrics and Monitoring](/self-hosted/metrics) page for the full list of available metrics.
