Speech-to-Text (Batch)
Headers
Query parameters
Request
The audio file to transcribe. The maximum supported file size is 2 GB. Supported formats: flac, m4a, mp3, mp4, mpeg, mpga, oga, ogg, wav, webm.
ID of the model to use for transcription. Use ink-whisper for the latest Cartesia Whisper model.
The timestamp granularities to populate for this transcription. Currently only word level timestamps are supported.
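
The request constraints above can be checked on the client before uploading. The following is a minimal sketch, not the official client: the helper names `check_audio_file` and `build_form_fields` are hypothetical, the form field name `model` is an assumption (this section does not show the parameter names), and "2GB" is read as 2 GiB. Only `timestamp_granularities[]`, `ink-whisper`, `word`, and the format list come from the text.

```python
import os

# Formats and size limit as listed in the request description.
SUPPORTED_FORMATS = {
    "flac", "m4a", "mp3", "mp4", "mpeg", "mpga", "oga", "ogg", "wav", "webm",
}
MAX_FILE_BYTES = 2 * 1024**3  # assumption: "2GB" interpreted as 2 GiB


def check_audio_file(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file fails the documented format or size limits."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")


def build_form_fields(
    model: str = "ink-whisper", word_timestamps: bool = False
) -> list[tuple[str, str]]:
    """Build the non-file form fields as (name, value) pairs.

    The field name "model" is an assumption; "timestamp_granularities[]"
    is the name used in the response description.
    """
    fields = [("model", model)]
    if word_timestamps:
        # "word" is currently the only supported granularity.
        fields.append(("timestamp_granularities[]", "word"))
    return fields
```

The resulting pairs can be sent as multipart form data alongside the audio file by any HTTP client. The endpoint URL, the authentication headers, and the name of the file field are not given in this section, so take them from the Headers documentation and the API reference.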
Response
Word-level timestamps showing the start and end time of each word. Only included when [word] is passed into timestamp_granularities[].
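
Because the word-level timestamps are optional, client code should tolerate their absence. The sketch below illustrates one way to do that. The response key names (`words`, `word`, `start`, `end`) and the unit (seconds) are assumptions, since this section does not show the response schema; `format_word_timestamps` is a hypothetical helper.

```python
def format_word_timestamps(response: dict) -> list[str]:
    """Return one "start-end word" line per word, or [] if none were returned."""
    # Assumed schema: response["words"] is a list of objects with
    # "word", "start", and "end" keys. The key is absent when
    # timestamp_granularities[] was not set to include "word".
    words = response.get("words") or []
    return [f"{w['start']:.2f}-{w['end']:.2f} {w['word']}" for w in words]
```

A response without timestamps yields an empty list instead of raising an error, so the same code path handles both request variants.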