Pricing - Cartesia Docs

Cartesia meters model usage in credits and agent usage in agent dollars. Every subscription plan includes a monthly credit allotment. See cartesia.ai/pricing for current plans and included credits. Only successful requests consume credits; requests that fail with an error are not charged.

At a glance

Capability	Endpoint	Cost
Agents	Line	Billed per minute in USD, not credits
TTS	`/tts/bytes`, `/tts/sse`, `/tts/websocket`	~1 credit per character
PVC / Fine-tune TTS	`/tts/bytes`, `/tts/sse`, `/tts/websocket`	~1.5 credits per character
STT	`/stt`, `/stt/websocket`, `/stt/turns/websocket`	Depends on endpoint, model, and audio duration
PVC Fine-Tuning	`/fine-tunes/create`	1 million credits per fine-tune
Infill	`/infill/bytes`	300 credits + ~1 credit per character
Voice changer	`/voice-changer/bytes`, `/voice-changer/sse`	15 credits per second

Agents

Cartesia’s hosted Line voice agents are billed per minute in US dollars. This does not affect your credit balance.

Feature	Price per minute	Notes
Agent calling	$0.06	Base rate for all voice agent calls
Telephony (add-on)	+$0.014	Added to the base rate when the call uses a Cartesia-provided phone number
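As a rough illustration, the per-minute rates above can be turned into a call-cost estimate. This is a sketch, not an official calculator: it assumes partial minutes are billed proportionally (this page does not say how they are rounded), and the function name is hypothetical, not part of any Cartesia SDK. `Decimal` is used so the dollar amounts are not distorted by float rounding.

```python
from decimal import Decimal

# Per-minute rates in USD, taken from the agent pricing table.
AGENT_RATE_PER_MIN = Decimal("0.06")
TELEPHONY_RATE_PER_MIN = Decimal("0.014")


def agent_call_cost_usd(minutes: Decimal, uses_cartesia_number: bool = False) -> Decimal:
    """Estimate the USD cost of one agent call (hypothetical helper).

    Assumption: fractional minutes are billed proportionally; actual
    rounding of partial minutes is not documented here.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    rate = AGENT_RATE_PER_MIN
    if uses_cartesia_number:
        # Telephony add-on applies only with a Cartesia-provided number.
        rate = rate + TELEPHONY_RATE_PER_MIN
    return rate * minutes


if __name__ == "__main__":
    # 10 minutes on a Cartesia-provided number: 10 * (0.06 + 0.014)
    print(agent_call_cost_usd(Decimal(10), uses_cartesia_number=True))  # 0.740
```

Agent charges are tracked in USD and do not draw down your credit balance.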

Text-to-speech

Standard TTS costs approximately 1 credit per character. The exact number of credits can vary slightly due to transcript pre-processing. This applies to every TTS endpoint: /tts/bytes, /tts/sse, and /tts/websocket.

TTS with a Pro Voice Clone

Generating speech with a Pro Voice Clone costs approximately 1.5 credits per character, 50% more than standard TTS, because it runs on a bespoke model fine-tuned to your data. This does not apply to Instant Voice Clones, which are billed at the standard rate.
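The two TTS rates can be compared with a quick estimate like the one below. It is only an approximation. It assumes a "character" is a Unicode code point as counted by Python's `len()`, and it ignores transcript pre-processing, which can shift the real charge slightly. The function name is hypothetical.

```python
# Approximate per-character rates from this page.
STANDARD_TTS_CREDITS_PER_CHAR = 1.0  # standard voices and Instant Voice Clones
PVC_TTS_CREDITS_PER_CHAR = 1.5       # Pro Voice Clones (50% more)


def estimate_tts_credits(transcript: str, pro_voice_clone: bool = False) -> float:
    """Rough credit estimate for a single TTS request (hypothetical helper).

    Assumption: characters are counted as Unicode code points via len();
    real billing may differ slightly due to transcript pre-processing.
    """
    rate = PVC_TTS_CREDITS_PER_CHAR if pro_voice_clone else STANDARD_TTS_CREDITS_PER_CHAR
    return len(transcript) * rate


if __name__ == "__main__":
    text = "Hello, world!"  # 13 characters
    print(estimate_tts_credits(text))                        # 13.0
    print(estimate_tts_credits(text, pro_voice_clone=True))  # 19.5
```

The same estimate applies to /tts/bytes, /tts/sse, and /tts/websocket, because all three bill at the same per-character rate.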

Speech-to-text

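STT pricing depends on the model and on whether you use the batch endpoint (/stt) or a realtime endpoint (/stt/websocket, /stt/turns/websocket). You are billed for the full duration of the audio, including silence, even when no transcript is produced.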

Endpoint	`ink-2`	`ink-whisper`
`/stt/websocket`	3 credits per second of audio	1 credit per second of audio
`/stt/turns/websocket`	3 credits per second of audio	1 credit per second of audio
`/stt`	Not available yet	1 credit per 2 seconds of audio
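The table can be encoded as a lookup to estimate STT credits from audio duration. This sketch assumes duration is billed proportionally with no per-request rounding, which the page does not state. The function name is hypothetical. Silence counts toward the duration, so pass the full length of the audio.

```python
from typing import Dict, Optional, Tuple

# Credits per second of audio from the STT table; None means "not available yet".
STT_CREDITS_PER_SECOND: Dict[Tuple[str, str], Optional[float]] = {
    ("/stt/websocket", "ink-2"): 3.0,
    ("/stt/websocket", "ink-whisper"): 1.0,
    ("/stt/turns/websocket", "ink-2"): 3.0,
    ("/stt/turns/websocket", "ink-whisper"): 1.0,
    ("/stt", "ink-2"): None,
    ("/stt", "ink-whisper"): 0.5,  # 1 credit per 2 seconds
}


def estimate_stt_credits(endpoint: str, model: str, audio_seconds: float) -> float:
    """Estimate STT credits for a clip, silence included (hypothetical helper).

    Assumption: billing is proportional to duration with no rounding.
    """
    key = (endpoint, model)
    if key not in STT_CREDITS_PER_SECOND:
        raise KeyError(f"unknown endpoint/model pair: {key}")
    rate = STT_CREDITS_PER_SECOND[key]
    if rate is None:
        raise ValueError(f"{model} is not yet available on {endpoint}")
    return audio_seconds * rate


if __name__ == "__main__":
    print(estimate_stt_credits("/stt/websocket", "ink-2", 60))  # 180.0
    print(estimate_stt_credits("/stt", "ink-whisper", 60))      # 30.0
```

For the same minute of audio, batch `ink-whisper` costs one sixth as much as realtime `ink-2`.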

Pro Voice Clone Fine-Tuning

Creating a Pro Voice Clone fine-tunes a model on your data via /fine-tunes/create and costs 1,000,000 credits. You’re only charged when training succeeds. Pro Voice Clones are pinned to the base model they were trained on, so retraining on a new base model or new data costs another 1,000,000 credits.

Infill

Infill generates audio that bridges two existing clips. Each request costs a fixed 300 credits, plus the standard TTS rate applied to the infill transcript.
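Because infill combines a fixed fee with the per-character TTS rate, a request's cost can be sketched as follows. This assumes characters are counted as Unicode code points via `len()` and uses the approximate standard rate of 1 credit per character. The function name is hypothetical.

```python
INFILL_BASE_CREDITS = 300    # fixed per-request fee
TTS_CREDITS_PER_CHAR = 1.0   # approximate standard TTS rate


def estimate_infill_credits(infill_transcript: str) -> float:
    """Fixed fee plus standard TTS rate on the infill transcript (hypothetical helper).

    Assumption: characters counted as Unicode code points via len().
    """
    return INFILL_BASE_CREDITS + len(infill_transcript) * TTS_CREDITS_PER_CHAR


if __name__ == "__main__":
    print(estimate_infill_credits("and then we left"))  # 316.0 (16 characters)
```

For short bridging phrases, the 300-credit fixed fee makes up most of the cost.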

Voice changer

Voice changer converts input audio into a target voice. It costs 15 credits per second of input audio on both /voice-changer/bytes and /voice-changer/sse.
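Since the voice changer bills by input duration, you can estimate its cost before sending a request by reading the length of the source audio. This sketch assumes the input is an uncompressed WAV file readable by Python's standard `wave` module and that partial seconds are billed proportionally. Both function names are hypothetical.

```python
import wave

VOICE_CHANGER_CREDITS_PER_SECOND = 15


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def estimate_voice_changer_credits(input_audio_seconds: float) -> float:
    """15 credits per second of input audio (hypothetical helper).

    Assumption: partial seconds are billed proportionally.
    """
    return input_audio_seconds * VOICE_CHANGER_CREDITS_PER_SECOND
```

For example, `estimate_voice_changer_credits(wav_duration_seconds("input.wav"))` estimates the cost of converting `input.wav` (a placeholder filename). A 4-second clip works out to 60 credits.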

Check your usage

You can check your usage on the usage page and view your current balance on the subscription page. You can also query usage programmatically with the credit usage and agent usage APIs; these require an admin API key.
