Authorizations
Headers
API version header. Must be set to the API version, e.g. '2024-06-10'.
2024-06-10, 2024-11-13, 2025-04-16 "2024-06-10"
Query Parameters
The encoding format to process the audio as. If not specified, the audio file will be decoded automatically.
Supported formats:
pcm_s16le- 16-bit signed integer PCM, little-endian (recommended for best performance)pcm_s32le- 32-bit signed integer PCM, little-endianpcm_f16le- 16-bit floating point PCM, little-endianpcm_f32le- 32-bit floating point PCM, little-endianpcm_mulaw- 8-bit μ-law encoded PCMpcm_alaw- 8-bit A-law encoded PCM
pcm_s16le, pcm_s32le, pcm_f16le, pcm_f32le, pcm_mulaw, pcm_alaw The sample rate of the audio in Hz.
Body
ID of the model to use for transcription. Use ink-whisper for the latest Cartesia Whisper model.
The language of the input audio in ISO-639-1 format. Defaults to en.
en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, ha, ln, ha, ba, jw, su, yu The timestamp granularities to populate for this transcription. Currently only word level timestamps are supported.
Response
The transcribed text.
The specified language of the input audio.
The duration of the input audio in seconds.
Word-level timestamps showing the start and end time of each word. Only included when [word] is passed into timestamp_granularities[].