
ink-whisper

Ink Whisper is our most affordable speech-to-text model. It delivers higher accuracy and lower latency than baseline Whisper, and adds the following capabilities:
  • Handles variable-length audio chunks and interruptions gracefully using dynamic chunking.
  • Reliably transcribes speech with background noise.
  • Accurately transcribes audio with telephony artifacts, accents, and disfluencies.
  • Excels at transcribing proper nouns and domain-specific terminology.
Snapshot: ink-whisper-2025-06-04
Release Date: June 4, 2025
Languages: en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, haw, ln, ha, ba, jw, su, yue
Status: Stable
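
Because the language list above is long, it can help to check a language code locally before sending audio. The following is a minimal Python sketch, not part of any official SDK: `SUPPORTED_LANGUAGES` and `is_supported` are hypothetical names introduced here, and the codes are copied from the ink-whisper-2025-06-04 snapshot entry. It assumes an exact match after trimming whitespace and lowercasing, so region-tagged values such as `en-US` are not handled.

```python
# Language codes copied from the ink-whisper-2025-06-04 snapshot entry.
# The names below are illustrative only; they are not an official API.
SUPPORTED_LANGUAGES = frozenset(
    """
    en zh de es ru ko fr ja pt tr pl ca nl ar sv it id hi fi vi
    he uk el ms cs ro da hu ta no th ur hr bg lt la mi ml cy sk
    te fa lv bn sr az sl kn et mk br eu is hy ne mn bs kk sq sw
    gl mr pa si km sn yo so af oc ka be tg sd gu am yi lo uz fo
    ht ps tk nn mt sa lb my bo tl mg as tt haw ln ha ba jw su yue
    """.split()
)


def is_supported(language: str) -> bool:
    """Return True if the code appears in the snapshot's language list.

    Matching is exact after trimming whitespace and lowercasing.
    """
    return language.strip().lower() in SUPPORTED_LANGUAGES
```

A caller could run `is_supported("pt")` before opening a transcription request and fall back to a default language, or surface an error, when it returns `False`.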
To learn how to use the Ink STT family, see Compare Endpoints.