Cartesia’s inference cluster supports Prometheus, an open source metrics and monitoring solution. A PodMonitor scrapes all metrics every 5 seconds from port 8080 at the `/metrics` path.
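To spot-check a single pod by hand, you can fetch its `/metrics` endpoint and pull out the Cartesia series. The sketch below uses only the Python standard library. It assumes the endpoint serves the standard Prometheus text exposition format. The `host` argument is a placeholder for a pod IP or hostname, and `parse_metrics` and `scrape` are hypothetical helper names. The parser is deliberately minimal: it skips `# HELP`/`# TYPE` comments and does not unescape label values.

```python
import re
import urllib.request

# Matches "name{labels} value" or "name value", with an optional trailing timestamp.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+\S+)?$'
)
# Matches one label="value" pair; escaped characters inside the value are kept as-is.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text, prefixes=("inferno_", "api_")):
    """Parse text-format exposition into (name, labels, value) tuples,
    keeping only metrics whose names start with one of `prefixes`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE_RE.match(line)
        if not m or not m.group("name").startswith(prefixes):
            continue
        labels = dict(_LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def scrape(host, port=8080, timeout=5.0):
    """Fetch and parse one pod's /metrics endpoint (host is a placeholder)."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For anything beyond a quick check, query Prometheus itself. If you need a complete parser, the official `prometheus_client` package provides `prometheus_client.parser.text_string_to_metric_families`.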
Prometheus Metrics
| Metric Name | Emitted by | Description | Normal Range |
|---|---|---|---|
| `inferno_worker_load` | Worker pods | Number of chunks the worker is currently processing concurrently | < `inferno_worker_capacity` |
| `inferno_worker_capacity` | Worker pods | Maximum number of chunks a worker can process concurrently | Hardware-dependent |
| `inferno_worker_ttfa` | Worker pods (TTS only) | Time to first audio (TTFA) | < 200 ms |
| `inferno_worker_rtf` | Worker pods | Real-time factor (RTF): processing time divided by the duration of the audio produced | < 1 |
| `api_queue_size` | API server pod | Request queue size, per offering | Low |
| `api_unserviceable_requests_size` | API server pod | Number of unserviceable requests | 0 |
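The "Normal Range" column can be encoded as simple checks. The sketch below takes `(name, labels, value)` samples and returns warnings for values outside those ranges. Several details are assumptions that the documentation does not confirm:

* worker metrics carry a `pod` label, which is used to pair load with capacity;
* `inferno_worker_ttfa` is reported in milliseconds;
* `check_ranges` and `TTFA_LIMIT_MS` are hypothetical names.

`api_queue_size` is not checked, because "Low" has no fixed threshold.

```python
# Assumptions (not confirmed by the docs): samples are (name, labels, value)
# tuples, worker metrics carry a "pod" label, and inferno_worker_ttfa is in ms.
TTFA_LIMIT_MS = 200.0


def check_ranges(samples):
    """Return warnings for samples outside the documented normal ranges."""
    warnings = []
    load, capacity = {}, {}
    for name, labels, value in samples:
        pod = labels.get("pod", "?")
        if name == "inferno_worker_load":
            load[pod] = value
        elif name == "inferno_worker_capacity":
            capacity[pod] = value
        elif name == "inferno_worker_ttfa" and value >= TTFA_LIMIT_MS:
            warnings.append(f"{pod}: TTFA {value:g} ms >= {TTFA_LIMIT_MS:g} ms")
        elif name == "inferno_worker_rtf" and value >= 1.0:
            warnings.append(f"{pod}: RTF {value:g} >= 1 (slower than real time)")
        elif name == "api_unserviceable_requests_size" and value > 0:
            warnings.append(f"API: {value:g} unserviceable requests")
    # Load should stay below capacity for each worker pod.
    for pod, cur in load.items():
        cap = capacity.get(pod)
        if cap is not None and cur >= cap:
            warnings.append(f"{pod}: load {cur:g} >= capacity {cap:g}")
    return warnings
```

In production, the same conditions would more naturally live in Prometheus alerting rules. For example, `inferno_worker_load >= on(pod) inferno_worker_capacity` expresses the capacity check, again assuming a shared `pod` label.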