Pod Auto-Scaling (KEDA)

KEDA ScaledObjects scale pods on Prometheus metrics, using two trigger types (worker load and queue-based):
Trigger      | Metric                                         | Threshold  | Condition
Worker Load  | inferno_worker_load / inferno_worker_capacity  | 0.8 (80%)  | Always active
Queue-based  | api_queue_size / capacity (overflow mode)      | 1.0        | Only when minReplicas=0
Queue-based  | api_unserviceable_requests_size                | 0.9        | Only when minReplicas=0
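
The trigger conditions in the table can be sketched in Python as follows. This is an illustrative model only, not the actual ScaledObject configuration: the `Trigger` class, the trigger keys, and the helper functions are hypothetical names, and treating "strictly above the threshold" as the firing condition is a simplifying assumption. How KEDA and the underlying autoscaler turn these values into a replica count is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Trigger:
    name: str                          # hypothetical key, not a real metric name
    threshold: float                   # threshold from the table above
    only_when_min_replicas_zero: bool  # condition column from the table


# Thresholds and conditions copied from the table; names are illustrative.
TRIGGERS = [
    Trigger("worker_load_ratio", 0.8, False),       # inferno_worker_load / inferno_worker_capacity
    Trigger("queue_overflow_ratio", 1.0, True),     # api_queue_size / capacity (overflow mode)
    Trigger("unserviceable_requests", 0.9, True),   # api_unserviceable_requests_size
]


def active_triggers(min_replicas: int) -> List[Trigger]:
    """Return the triggers that apply for a given minReplicas setting."""
    return [
        t for t in TRIGGERS
        if not t.only_when_min_replicas_zero or min_replicas == 0
    ]


def triggers_over_threshold(values: Dict[str, float], min_replicas: int) -> List[str]:
    """Names of active triggers whose current value is above the threshold.

    Assumption: a missing value counts as 0.0 and "above" means strictly greater.
    """
    return [
        t.name
        for t in active_triggers(min_replicas)
        if values.get(t.name, 0.0) > t.threshold
    ]
```

With `minReplicas` set to 1, only the worker-load trigger is considered; with `minReplicas` set to 0, the two queue-based triggers are evaluated as well.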
Scaling behavior:
  • Polling interval: 15 seconds
  • Scale-up stabilization: 30 seconds
  • Scale-down stabilization: 900 seconds (15 min)
  • Scale-down policy: Remove 1 pod per 60 seconds
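
To get a feel for how slowly pods are removed under these settings, here is a rough estimate in Python. It is a sketch under simplifying assumptions: load drops once and stays low, the first pod is removed as soon as the 900-second stabilization window has passed, and one more pod is removed every 60 seconds after that. The function name is hypothetical, and polling and sync delays are ignored.

```python
import math

SCALE_DOWN_STABILIZATION_S = 900  # scale-down stabilization from the list above
SCALE_DOWN_PERIOD_S = 60          # policy period
PODS_PER_PERIOD = 1               # pods removed per period


def estimate_scale_down_seconds(current_replicas: int, target_replicas: int) -> int:
    """Approximate seconds needed to shrink from current to target replicas."""
    if target_replicas >= current_replicas:
        return 0  # nothing to remove
    pods_to_remove = current_replicas - target_replicas
    periods = math.ceil(pods_to_remove / PODS_PER_PERIOD)
    # First removal happens after the stabilization window,
    # each further removal needs one more policy period.
    return SCALE_DOWN_STABILIZATION_S + (periods - 1) * SCALE_DOWN_PERIOD_S


print(estimate_scale_down_seconds(10, 4))  # 1200
```

Under these assumptions, shrinking from 10 pods to 4 takes about 20 minutes: 15 minutes of stabilization plus 5 further one-minute steps.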

Cluster/Node Auto-Scaling

Node scaling uses the Cluster Autoscaler with these settings:
  • Scan interval: 10 seconds
  • Scale-down delay: 10 minutes after node add
  • Scale-down unneeded time: 10 minutes
  • Expander: least-waste (bin-packing)
  • Metric: Pending pods that can’t be scheduled due to insufficient resources
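
The idea behind the least-waste expander can be illustrated with a small Python sketch. It follows the Cluster Autoscaler's documented description of least-waste (prefer the node group that leaves the least idle CPU after scale-up, with unused memory as the tie-breaker), but it is not the autoscaler's implementation. The `NodeGroup` class, the group names, and the sizes are hypothetical, and pending pods are treated as one aggregate resource request rather than being bin-packed pod by pod.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NodeGroup:
    name: str               # hypothetical node group name
    cpu_per_node: float     # allocatable CPU cores per node
    mem_gib_per_node: float # allocatable memory per node, in GiB


def idle_after_scale_up(group: NodeGroup, cpu_needed: float, mem_needed: float) -> Tuple[float, float]:
    """Idle (CPU, memory) left if this group alone absorbs the pending demand."""
    nodes = max(
        math.ceil(cpu_needed / group.cpu_per_node),
        math.ceil(mem_needed / group.mem_gib_per_node),
        1,
    )
    idle_cpu = nodes * group.cpu_per_node - cpu_needed
    idle_mem = nodes * group.mem_gib_per_node - mem_needed
    return (idle_cpu, idle_mem)


def pick_least_waste(groups: List[NodeGroup], cpu_needed: float, mem_needed: float) -> NodeGroup:
    """Pick the group with the least idle CPU; idle memory breaks ties."""
    return min(groups, key=lambda g: idle_after_scale_up(g, cpu_needed, mem_needed))


groups = [NodeGroup("small", 4, 16), NodeGroup("large", 16, 64)]
print(pick_least_waste(groups, cpu_needed=6, mem_needed=8).name)  # small
```

In this example two "small" nodes leave 2 idle cores, while one "large" node would leave 10, so the smaller group is chosen.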

Metrics Used for Scaling

The autoscaling triggers above use Prometheus metrics exposed by the application. See the Metrics and Monitoring page for the full list of available metrics.