Older Models - Cartesia Docs

Try out Ink 2, which provides improved turn detection for English voice agents.

`ink-whisper`

Ink Whisper is our most affordable speech-to-text model. It delivers higher accuracy and lower latency than baseline Whisper. Additional capabilities compared to baseline Whisper:

- Handles variable-length audio chunks and interruptions gracefully using dynamic chunking.
- Reliably transcribes speech with background noise.
- Accurately transcribes audio with telephony artifacts, accents, and disfluencies.
- Excels at transcribing proper nouns and domain-specific terminology.

Snapshot	Release Date	Languages	Status
`ink-whisper-2025-06-04`	June 4, 2025	`en`, `zh`, `de`, `es`, `ru`, `ko`, `fr`, `ja`, `pt`, `tr`, `pl`, `ca`, `nl`, `ar`, `sv`, `it`, `id`, `hi`, `fi`, `vi`, `he`, `uk`, `el`, `ms`, `cs`, `ro`, `da`, `hu`, `ta`, `no`, `th`, `ur`, `hr`, `bg`, `lt`, `la`, `mi`, `ml`, `cy`, `sk`, `te`, `fa`, `lv`, `bn`, `sr`, `az`, `sl`, `kn`, `et`, `mk`, `br`, `eu`, `is`, `hy`, `ne`, `mn`, `bs`, `kk`, `sq`, `sw`, `gl`, `mr`, `pa`, `si`, `km`, `sn`, `yo`, `so`, `af`, `oc`, `ka`, `be`, `tg`, `sd`, `gu`, `am`, `yi`, `lo`, `uz`, `fo`, `ht`, `ps`, `tk`, `nn`, `mt`, `sa`, `lb`, `my`, `bo`, `tl`, `mg`, `as`, `tt`, `haw`, `ln`, `ha`, `ba`, `jw`, `su`, `yue`	Stable
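
As a rough illustration of how a client might use the snapshot table, the sketch below checks a language code against the list for `ink-whisper-2025-06-04` before any audio is sent. It is not part of any Cartesia SDK: the names `SNAPSHOT_ID`, `SUPPORTED_LANGUAGES`, `is_supported`, and `snapshot_release_date` are hypothetical, and reading the release date from the snapshot ID assumes that dated snapshots end in `YYYY-MM-DD`, as this one does.

```python
from datetime import date

# Snapshot ID and language codes copied from the snapshot table above.
SNAPSHOT_ID = "ink-whisper-2025-06-04"
SUPPORTED_LANGUAGES = frozenset(
    """
    en zh de es ru ko fr ja pt tr pl ca nl ar sv it id hi fi vi
    he uk el ms cs ro da hu ta no th ur hr bg lt la mi ml cy sk
    te fa lv bn sr az sl kn et mk br eu is hy ne mn bs kk sq sw
    gl mr pa si km sn yo so af oc ka be tg sd gu am yi lo uz fo
    ht ps tk nn mt sa lb my bo tl mg as tt haw ln ha ba jw su yue
    """.split()
)


def is_supported(language: str) -> bool:
    """Return True if the code appears in the snapshot's language list."""
    return language.strip().lower() in SUPPORTED_LANGUAGES


def snapshot_release_date(snapshot_id: str) -> date:
    """Parse the trailing YYYY-MM-DD of a dated snapshot ID (assumed format)."""
    return date.fromisoformat(snapshot_id[-10:])


if __name__ == "__main__":
    print(is_supported("ja"))                  # True
    print(is_supported("xx"))                  # False
    print(snapshot_release_date(SNAPSHOT_ID))  # 2025-06-04
```

A check like this lets a client reject an unsupported language locally instead of waiting for the request to fail.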

To learn how to use the Ink STT family, see Compare Endpoints.
