## Pod Auto-Scaling (KEDA)

KEDA ScaledObjects use Prometheus-based metrics with the following triggers:

| Trigger | Metric | Threshold | Condition |
|---|---|---|---|
| Worker Load | inferno_worker_load / inferno_worker_capacity | 0.8 (80%) | Always active |
| Queue-based | api_queue_size / capacity (overflow mode) | 1.0 | Only when minReplicas=0 |
| Queue-based | api_unserviceable_requests_size | 0.9 | Only when minReplicas=0 |
- Polling interval: 15 seconds
- Scale-up stabilization: 30 seconds
- Scale-down stabilization: 900 seconds (15 min)
- Scale-down policy: remove at most 1 pod per 60 seconds
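The trigger and stabilization settings above can be sketched as a single KEDA ScaledObject. This is an illustrative example only: the object name, target Deployment, namespace, and Prometheus address are placeholders, and only the worker-load trigger is shown; the metric names and thresholds come from the table above.

```yaml
# Sketch of a KEDA ScaledObject for the worker-load trigger.
# Names, namespaces, and the Prometheus address are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler            # placeholder name
spec:
  scaleTargetRef:
    name: worker                 # placeholder Deployment name
  pollingInterval: 15            # poll every 15 seconds
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30    # scale-up stabilization
        scaleDown:
          stabilizationWindowSeconds: 900   # 15 min scale-down stabilization
          policies:
            - type: Pods
              value: 1
              periodSeconds: 60             # remove at most 1 pod per 60 s
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder
        query: sum(inferno_worker_load) / sum(inferno_worker_capacity)
        threshold: "0.8"
```

The queue-based triggers would be added as further `prometheus` entries under `triggers`, gated on `minReplicas: 0` deployments as noted in the table.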
## Cluster/Node Auto-Scaling

Node auto-scaling is supported on:
- AWS EKS
- GCP GKE

Both platforms use the Cluster Autoscaler:
- Scan interval: 10 seconds
- Scale-down delay: 10 minutes after node add
- Scale-down unneeded time: 10 minutes
- Expander: least-waste (bin-packing)
- Metric: Pending pods that can’t be scheduled due to insufficient resources
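The settings above map directly onto standard Cluster Autoscaler flags. A minimal sketch of the container args follows; the image tag is a placeholder, and on GKE these flags are managed by the platform rather than set directly.

```yaml
# Sketch of Cluster Autoscaler container args matching the settings above.
# Image tag and container layout are illustrative, not a tested manifest.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # placeholder tag
    command:
      - ./cluster-autoscaler
      - --scan-interval=10s                 # scan for pending pods every 10 s
      - --scale-down-delay-after-add=10m    # wait 10 min after a node is added
      - --scale-down-unneeded-time=10m      # node must be unneeded for 10 min
      - --expander=least-waste              # bin-packing expander
```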