Metrics and Monitoring

Prometheus Metrics

Cartesia’s inference cluster supports Prometheus, an open-source metrics and monitoring system. A PodMonitor scrapes all metrics every 5 seconds from the `/metrics` endpoint on port 8080.
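To inspect what a scrape returns, the endpoint can also be read by hand. The following is a minimal sketch, not part of Cartesia's tooling: it assumes the endpoint serves the standard Prometheus text exposition format, and the helper names `parse_metrics` and `fetch_metrics` as well as the `host` argument are hypothetical.

```python
import re
import urllib.request
from typing import List, Tuple

# One sample line in the Prometheus text format:
#   metric_name{optional="labels"} value [optional_timestamp]
SAMPLE_LINE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)


def parse_metrics(text: str) -> List[Tuple[str, str, float]]:
    """Parse exposition text into (name, raw_labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        name, labels, value = match.groups()
        samples.append((name, labels or "", float(value)))
    return samples


def fetch_metrics(host: str, port: int = 8080, timeout: float = 2.0) -> str:
    """Fetch the raw metrics text from a pod (host is an assumption)."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics("10.0.0.12"))` would list the samples exposed by a pod at that (made-up) address, provided the pod is reachable from where the script runs.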

Metric Name	Emitted by	Description	Normal Range
`inferno_worker_load`	Worker pods	Number of chunks the worker is currently processing concurrently	< `inferno_worker_capacity`
`inferno_worker_capacity`	Worker pods	Maximum number of chunks a worker can process concurrently	Hardware dependent
`inferno_worker_ttfa`	Worker pods (TTS only)	Time to first audio (TTFA)	< 200 ms
`inferno_worker_rtf`	Worker pods	Real-time factor (RTF)	< 1
`api_queue_size`	API server pod	Request queue size per offering	Low
`api_unserviceable_requests_size`	API server pod	Unserviceable requests count	0
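The normal ranges in the table can be turned into a simple health check. The sketch below makes several assumptions that the table does not confirm: each metric has already been reduced to a single number, `inferno_worker_ttfa` is reported in milliseconds, and "Low" queue size is approximated by a hypothetical `queue_limit` parameter. The function name `check_ranges` is also illustrative.

```python
from typing import Dict, List


def check_ranges(sample: Dict[str, float], queue_limit: float = 10.0) -> List[str]:
    """Return warnings for values outside the table's normal ranges.

    `sample` maps metric names to single values. Missing metrics are
    skipped, e.g. TTFA on non-TTS workers. `queue_limit` is a made-up
    stand-in for "Low"; tune it per deployment.
    """
    warnings = []

    load = sample.get("inferno_worker_load")
    capacity = sample.get("inferno_worker_capacity")
    if load is not None and capacity is not None and load >= capacity:
        warnings.append(f"worker load {load} has reached capacity {capacity}")

    ttfa = sample.get("inferno_worker_ttfa")
    if ttfa is not None and ttfa >= 200:  # assumes milliseconds
        warnings.append(f"TTFA {ttfa} ms is not below 200 ms")

    rtf = sample.get("inferno_worker_rtf")
    if rtf is not None and rtf >= 1:
        warnings.append(f"RTF {rtf} is not below 1")

    queue = sample.get("api_queue_size")
    if queue is not None and queue > queue_limit:
        warnings.append(f"queue size {queue} exceeds limit {queue_limit}")

    unserviceable = sample.get("api_unserviceable_requests_size")
    if unserviceable is not None and unserviceable != 0:
        warnings.append(f"{unserviceable} unserviceable requests (expected 0)")

    return warnings
```

An empty list means every supplied value falls inside its normal range. In practice these thresholds would more likely live in Prometheus alerting rules; the function only illustrates how the table's ranges relate to one another.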
