Cartesia meters model usage in credits and agent usage in agent dollars. Every subscription plan includes a monthly credit allotment. See cartesia.ai/pricing for current plans and included credits. Credits are only used by successful requests; errors will not consume credits.

At a glance

| Capability | Endpoint | Cost |
| --- | --- | --- |
| Agents | Line | Billed per minute in USD, not credits |
| TTS | /tts/bytes, /tts/sse, /tts/websocket | ~1 credit per character |
| PVC / Fine-tune TTS | /tts/bytes, /tts/sse, /tts/websocket | ~1.5 credits per character |
| STT | /stt, /stt/websocket, /stt/turns/websocket | Depends on endpoint, model, and audio duration |
| PVC Fine-Tuning | /fine-tunes/create | 1 million credits per fine-tune |
| Infill | /infill/bytes | 300 credits + ~1 credit per character |
| Voice changer | /voice-changer/bytes, /voice-changer/sse | 15 credits per second |

Agents

Cartesia’s hosted Line voice agents are billed per minute in US dollars. This does not affect your credit balance.
| Feature | Price per minute | Notes |
| --- | --- | --- |
| Agent calling | $0.06 | Base rate for all voice agent calls |
| Telephony (add-on) | +$0.014 | Additional when using a Cartesia-provided number |
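
As a rough illustration of the per-minute rates above, the following sketch estimates the charge for a batch of agent calls. The function and constant names are hypothetical, and how partial minutes are rounded is not stated here, so the sketch simply multiplies the minutes by the rate without rounding.

```python
from decimal import Decimal

# Rates taken from the table above (USD per minute).
AGENT_RATE_PER_MIN = Decimal("0.06")
TELEPHONY_RATE_PER_MIN = Decimal("0.014")  # add-on for a Cartesia-provided number


def estimate_agent_cost_usd(minutes, use_cartesia_number=False):
    """Estimate agent charges in USD for the given call minutes.

    Assumption: no rounding of partial minutes is applied.
    """
    rate = AGENT_RATE_PER_MIN
    if use_cartesia_number:
        rate += TELEPHONY_RATE_PER_MIN
    # Convert through str so float inputs such as 12.5 stay exact.
    return Decimal(str(minutes)) * rate
```

Under these assumptions, 100 minutes of calls would come to about $6.00 without telephony and about $7.40 with a Cartesia-provided number.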

Text-to-speech

Standard TTS costs approximately 1 credit per character. The exact number of credits can vary slightly due to transcript pre-processing. This applies to every TTS endpoint: /tts/bytes, /tts/sse, and /tts/websocket.

TTS with a Pro Voice Clone

Generating speech with a Pro Voice Clone costs approximately 1.5 credits per character, 50% more than standard TTS, because it runs on a bespoke model fine-tuned to your data. This does not apply to Instant Voice Clones, which are billed at the standard rate.
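
To compare the standard and Pro Voice Clone rates before sending a request, a minimal estimator might look like the sketch below. The names are hypothetical, and it assumes characters are counted the way Python's `len()` counts them. Because billing is approximate (transcript pre-processing can shift the count slightly), treat the result as an estimate only.

```python
# Approximate per-character rates described in this section.
STANDARD_CREDITS_PER_CHAR = 1.0
PVC_CREDITS_PER_CHAR = 1.5  # Pro Voice Clone; Instant Voice Clones use the standard rate


def estimate_tts_credits(transcript: str, pro_voice_clone: bool = False) -> float:
    """Return an approximate credit cost for synthesizing `transcript`."""
    rate = PVC_CREDITS_PER_CHAR if pro_voice_clone else STANDARD_CREDITS_PER_CHAR
    return len(transcript) * rate
```

For example, a 13-character transcript such as "Hello, world!" would be estimated at about 13 credits with a standard voice and about 19.5 credits with a Pro Voice Clone.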

Speech-to-text

STT pricing depends on the model and on whether you use the batch or realtime endpoint. You are billed for the full duration of the audio, including silence, even if no transcript is produced.
| Endpoint | ink-2 | ink-whisper |
| --- | --- | --- |
| /stt/websocket | 3 credits per second of audio | 1 credit per second of audio |
| /stt/turns/websocket | 3 credits per second of audio | 1 credit per second of audio |
| /stt | Not available yet | 1 credit per 2 seconds of audio |
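
The rate table above can be expressed as a small lookup, sketched here with hypothetical names. It assumes credits scale linearly with audio duration and does not model any rounding, which this page does not specify.

```python
# Credits per second of audio, keyed by (endpoint, model), from the table above.
STT_CREDITS_PER_SECOND = {
    ("/stt/websocket", "ink-2"): 3.0,
    ("/stt/websocket", "ink-whisper"): 1.0,
    ("/stt/turns/websocket", "ink-2"): 3.0,
    ("/stt/turns/websocket", "ink-whisper"): 1.0,
    ("/stt", "ink-whisper"): 0.5,  # 1 credit per 2 seconds of audio
    # ("/stt", "ink-2") is intentionally absent: not available yet.
}


def estimate_stt_credits(endpoint: str, model: str, audio_seconds: float) -> float:
    """Estimate credits for transcribing `audio_seconds` of audio, silence included."""
    try:
        rate = STT_CREDITS_PER_SECOND[(endpoint, model)]
    except KeyError:
        raise ValueError(f"no listed rate for {model} on {endpoint}") from None
    return rate * audio_seconds
```

Under these assumptions, one minute of audio would cost about 180 credits with ink-2 on /stt/websocket and about 30 credits with ink-whisper on /stt.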

Pro Voice Clone Fine-Tuning

Creating a Pro Voice Clone fine-tunes a model on your data via /fine-tunes/create and costs 1,000,000 credits. You’re only charged when training succeeds. Pro Voice Clones are pinned to the base model they were trained on, so retraining on a new base model or new data costs another 1,000,000 credits.

Infill

Infill generates audio that bridges two existing clips. Each request costs a fixed 300 credits, plus the standard TTS rate applied to the infill transcript.
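
The fixed-plus-variable pricing above works out as in the following sketch. The names are hypothetical, and the per-character part is approximate for the same reason as standard TTS.

```python
INFILL_BASE_CREDITS = 300      # fixed cost per infill request
TTS_CREDITS_PER_CHAR = 1       # approximate standard TTS rate


def estimate_infill_credits(infill_transcript: str) -> int:
    """Estimate credits for one infill request with the given transcript."""
    return INFILL_BASE_CREDITS + len(infill_transcript) * TTS_CREDITS_PER_CHAR
```

A 13-character infill transcript would therefore be estimated at roughly 313 credits, so the fixed fee dominates for short transcripts.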

Voice changer

Voice changer converts input audio into a target voice. It costs 15 credits per second of input audio on both /voice-changer/bytes and /voice-changer/sse.
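
Since the cost depends on the length of the input audio, it can help to measure a clip before uploading it. The sketch below uses Python's standard `wave` module and assumes the input is an uncompressed WAV file. The function names are hypothetical, and any rounding of partial seconds is not modeled.

```python
import wave

VOICE_CHANGER_CREDITS_PER_SECOND = 15


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_voice_changer_credits(input_seconds: float) -> float:
    """Estimate credits for converting `input_seconds` of input audio."""
    return input_seconds * VOICE_CHANGER_CREDITS_PER_SECOND
```

For instance, `estimate_voice_changer_credits(wav_duration_seconds("input.wav"))` would estimate the cost of a local file, where `input.wav` is a placeholder path.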

Check your usage

You can check your usage on the usage page and view your current balance on the subscription page. To check usage programmatically, use the credit usage and agent usage APIs, which require you to create an admin API key.