STT | Cartesia

Ink is a new family of streaming speech-to-text (STT) models for developers building real-time voice applications.

● marks the latest stable snapshot of the model

To use the stable version of the model, we recommend using the base model name (e.g. ink-whisper). In many cases the stable and preview snapshots are the same, but in some cases the preview snapshot may have additional features or improvements.

`ink-whisper`

Ink Whisper is the fastest, most affordable speech-to-text model — engineered for enterprise deployment in production-grade voice agents. It delivers higher accuracy than baseline Whisper and is optimized for real-time performance in a wide variety of real-world conditions.

Additional Capabilities:

Handles variable-length audio chunks and interruptions gracefully using dynamic chunking.
Reliably transcribes speech with background noise.
Accurately transcribes audio with telephony artifacts, accents, and disfluencies.
Excels at transcribing proper nouns and domain-specific terminology.

Snapshot	Release Date	Languages	Status
● `ink-whisper`	June 10, 2025	`en`, `zh`, `de`, `es`, `ru`, `ko`, `fr`, `ja`, `pt`, `tr`, `pl`, `ca`, `nl`, `ar`, `sv`, `it`, `id`, `hi`, `fi`, `vi`, `he`, `uk`, `el`, `ms`, `cs`, `ro`, `da`, `hu`, `ta`, `no`, `th`, `ur`, `hr`, `bg`, `lt`, `la`, `mi`, `ml`, `cy`, `sk`, `te`, `fa`, `lv`, `bn`, `sr`, `az`, `sl`, `kn`, `et`, `mk`, `br`, `eu`, `is`, `hy`, `ne`, `mn`, `bs`, `kk`, `sq`, `sw`, `gl`, `mr`, `pa`, `si`, `km`, `sn`, `yo`, `so`, `af`, `oc`, `ka`, `be`, `tg`, `sd`, `gu`, `am`, `yi`, `lo`, `uz`, `fo`, `ht`, `ps`, `tk`, `nn`, `mt`, `sa`, `lb`, `my`, `bo`, `tl`, `mg`, `as`, `tt`, `haw`, `ln`, `ha`, `ba`, `jw`, `su`, `yue`	Stable
`ink-whisper-2025-06-04`	June 4, 2025	`en`, `zh`, `de`, `es`, `ru`, `ko`, `fr`, `ja`, `pt`, `tr`, `pl`, `ca`, `nl`, `ar`, `sv`, `it`, `id`, `hi`, `fi`, `vi`, `he`, `uk`, `el`, `ms`, `cs`, `ro`, `da`, `hu`, `ta`, `no`, `th`, `ur`, `hr`, `bg`, `lt`, `la`, `mi`, `ml`, `cy`, `sk`, `te`, `fa`, `lv`, `bn`, `sr`, `az`, `sl`, `kn`, `et`, `mk`, `br`, `eu`, `is`, `hy`, `ne`, `mn`, `bs`, `kk`, `sq`, `sw`, `gl`, `mr`, `pa`, `si`, `km`, `sn`, `yo`, `so`, `af`, `oc`, `ka`, `be`, `tg`, `sd`, `gu`, `am`, `yi`, `lo`, `uz`, `fo`, `ht`, `ps`, `tk`, `nn`, `mt`, `sa`, `lb`, `my`, `bo`, `tl`, `mg`, `as`, `tt`, `haw`, `ln`, `ha`, `ba`, `jw`, `su`, `yue`	Stable

To learn how to use the Ink STT family, see the Speech-to-Text API Reference. For a detailed mapping of codes to languages, see the STT supported languages list.
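As a minimal sketch of how a client might reject unsupported language codes before sending any audio, the snippet below checks a code against a local set. The set is only an excerpt (the first ten codes from the table above), and `SUPPORTED_LANGUAGES` and `validate_language` are hypothetical names for illustration, not part of the Cartesia API.

```python
# Excerpt only: the first ten codes from the table above.
# A real client would load the full list from the supported languages page.
SUPPORTED_LANGUAGES = frozenset(
    {"en", "zh", "de", "es", "ru", "ko", "fr", "ja", "pt", "tr"}
)


def validate_language(code: str) -> str:
    """Return the normalized language code, or raise ValueError if unknown.

    Hypothetical helper: normalizes case and whitespace, then checks
    membership in the local excerpt above.
    """
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```

Validating locally like this only catches typos early; the API remains the authority on which codes a given snapshot accepts.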

Selecting a Model

When making API calls, you can specify either:

// Use the base model (automatically routes to the latest snapshot)
{
  model = "ink-whisper",
  ...
}

// Or specify a particular snapshot for consistency
{
  model = "ink-whisper-2025-06-04",
  ...
}
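The two identifier forms above follow a visible naming pattern: a base name, optionally followed by a `YYYY-MM-DD` date suffix. The sketch below shows one way to tell them apart in client code, for example to log whether a deployment is pinned. It assumes the pattern seen on this page holds for all snapshots; `split_model_id` is a hypothetical helper, not a Cartesia function.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumption: snapshot IDs are "<base>-YYYY-MM-DD", as in "ink-whisper-2025-06-04".
_SNAPSHOT_RE = re.compile(r"^(?P<base>.+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$")


def split_model_id(model_id: str) -> Tuple[str, Optional[date]]:
    """Split a model ID into (base name, snapshot date or None)."""
    match = _SNAPSHOT_RE.match(model_id)
    if match is None:
        # No date suffix: this is a base model name.
        return model_id, None
    snapshot_date = date(int(match["y"]), int(match["m"]), int(match["d"]))
    return match["base"], snapshot_date


base, pinned = split_model_id("ink-whisper-2025-06-04")
print(base, pinned)  # ink-whisper 2025-06-04
```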

Continuous updates

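Every model has a base model name (e.g. ink-whisper) that automatically routes to the latest snapshot. We recommend using base model names for prototyping and development, then switching to a date-versioned snapshot (e.g. ink-whisper-2025-06-04) for production use cases, so that behavior does not change when a new snapshot is released.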
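One way to apply this recommendation is to pick the model identifier from the deployment environment. The following is a minimal sketch under stated assumptions: the `APP_ENV` variable, the `"production"` value, and the `select_model` helper are all hypothetical, and only the two model names come from this page.

```python
import os

BASE_MODEL = "ink-whisper"                   # routes to the latest snapshot
PINNED_SNAPSHOT = "ink-whisper-2025-06-04"   # fixed, date-versioned snapshot


def select_model(environment: str) -> str:
    """Return the pinned snapshot in production, the base name elsewhere."""
    if environment == "production":
        return PINNED_SNAPSHOT
    return BASE_MODEL


# APP_ENV is a hypothetical variable name; default to development behavior.
model = select_model(os.environ.get("APP_ENV", "development"))
```

Keeping the pinned snapshot in a single constant makes upgrading a deliberate, reviewable change instead of something that happens when the base name is rerouted.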

Future Updates

New snapshots are released periodically with improvements in performance, additional language support, and new capabilities. Check back regularly for updates.