API Reference
The encoding of the input audio. Available options: pcm_s16le, pcm_s32le, pcm_f16le, pcm_f32le, pcm_mulaw, pcm_alaw.
The sample rate of the input audio in Hz. Must match the actual sample rate of the audio you send.
Unlike realtime endpoints, batch STT also accepts containerized audio (e.g. wav, mp3).
You should only supply the encoding and sample_rate query parameters when using raw PCM audio.
Cheat sheet
When sending raw audio, the encoding and sample rate must match what your upstream source produces. Here’s a quick rule of thumb to get started:

| Encoding | Bit depth | Common sources | Pair with sample rate (Hz) |
|---|---|---|---|
| pcm_s16le | 16-bit int | Voice agent platforms, WAV files, most audio capture libraries | 8000–48000 |
| pcm_s32le | 32-bit int | Professional audio interfaces and DAWs | 44100–48000 |
| pcm_f16le | 16-bit float | Uncommon; some half-precision ML pipelines | 16000–48000 |
| pcm_f32le | 32-bit float | Browsers (Web Audio API), ML models (PyTorch, NumPy/SciPy) | 16000–48000 |
| pcm_mulaw | 8-bit compressed | North American / Japanese telephony (G.711μ), Twilio | 8000 |
| pcm_alaw | 8-bit compressed | European / international telephony (G.711A) | 8000 |
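
The table can also be expressed as data. The following is a minimal Python sketch, not part of any official SDK: `TYPICAL_RATES`, `is_typical_pairing`, and `build_query` are hypothetical names introduced for illustration. It assumes only what this page states: the parameters are called `encoding` and `sample_rate`, and they are sent only for raw PCM audio. The ranges are the rule-of-thumb values from the table, not limits enforced by the API.

```python
from urllib.parse import urlencode

# Rule-of-thumb sample-rate ranges in Hz, copied from the cheat sheet table.
# These are typical pairings, not limits enforced by the server.
TYPICAL_RATES = {
    "pcm_s16le": (8000, 48000),
    "pcm_s32le": (44100, 48000),
    "pcm_f16le": (16000, 48000),
    "pcm_f32le": (16000, 48000),
    "pcm_mulaw": (8000, 8000),
    "pcm_alaw": (8000, 8000),
}


def is_typical_pairing(encoding, sample_rate):
    """Return True if sample_rate falls in the cheat-sheet range for encoding."""
    low, high = TYPICAL_RATES[encoding]
    return low <= sample_rate <= high


def build_query(encoding=None, sample_rate=None):
    """Build the query string for a request (hypothetical helper).

    Containerized audio (wav, mp3): call with no arguments, returns "".
    Raw PCM: both encoding and sample_rate must be given.
    """
    if encoding is None and sample_rate is None:
        return ""
    if encoding is None or sample_rate is None:
        raise ValueError("raw PCM needs both encoding and sample_rate")
    if encoding not in TYPICAL_RATES:
        raise ValueError(f"unknown encoding: {encoding}")
    return urlencode({"encoding": encoding, "sample_rate": int(sample_rate)})
```

For example, `build_query("pcm_mulaw", 8000)` yields `encoding=pcm_mulaw&sample_rate=8000`, while `is_typical_pairing("pcm_mulaw", 16000)` returns `False`, which is a hint to re-check the source rather than proof of an error.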
Telephony
North America and Japan
Many customers send their audio output over Twilio. All audio sent over Twilio is transcoded to µ-law encoding with an 8 kHz sample rate.
Europe, India, and others
The standard for European and international telephone networks (G.711A) is 8-bit A-law compressed PCM with an 8 kHz sample rate.
Voice agent platforms
Many voice agent platforms use pcm_s16le at a 16 kHz sample rate in their pipeline. You should double-check with your specific platform.
Web browsers
When capturing microphone audio through the Web Audio API, the samples are pcm_f32le. An AudioContext, and the AudioWorklet nodes you read frames from, always produces 32-bit float.
The capture sample rate defaults to whatever the user’s input hardware reports, commonly 48 kHz but sometimes 44.1 kHz. Read it from AudioContext.sampleRate and send the same value as sample_rate.
Converting to pcm_s16le at 16 kHz before you send cuts bandwidth with negligible impact on accuracy.
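
The sample-format half of that conversion is simple arithmetic. The sketch below shows it in Python for illustration; `float_to_s16le` is a hypothetical helper, and in a browser the same clamp-and-scale step would typically run on the frames read from the AudioWorklet. It assumes float samples nominally in [-1.0, 1.0] and does not resample: going from 48 kHz to 16 kHz also needs a low-pass filter and decimation, which is not shown here.

```python
import struct


def float_to_s16le(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes.

    Sketch only: converts the sample format, does not change the sample rate.
    """
    ints = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clamp out-of-range values
        ints.append(int(round(x * 32767)))  # scale to the signed 16-bit range
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)
```

Each output sample takes 2 bytes instead of 4, so the format change alone halves the payload before any resampling.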
Double check your parameters
The model decodes your bytes using the encoding and sample_rate you declared in the connection. Our server might not error if these parameters are incorrect.
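
One quick sanity check that needs no playback is to compute the duration your declared parameters imply and compare it with how long the recording actually was. This is a minimal sketch under stated assumptions: `implied_duration_seconds` is a hypothetical helper, the audio is assumed to be mono unless `channels` says otherwise (this page does not specify a channel count), and the byte widths follow from the bit depths in the cheat sheet.

```python
# Bytes per sample for each raw encoding (bit depth / 8).
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def implied_duration_seconds(num_bytes, encoding, sample_rate, channels=1):
    """Duration implied by the declared parameters (channels is an assumption)."""
    frame_size = BYTES_PER_SAMPLE[encoding] * channels
    if num_bytes % frame_size:
        # A partial frame suggests the wrong encoding or a truncated buffer.
        raise ValueError("byte count is not a whole number of frames")
    return num_bytes / (frame_size * sample_rate)
```

If a ten-second recording comes out as five or twenty seconds, either the encoding width or the sample rate is likely off by a factor of two.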
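You can validate your parameters by saving your audio data and playing it back with ffplay, using the same encoding and sample rate you plan to declare.
If playback sounds distorted or runs at the wrong speed, the encoding or sample_rate doesn’t match the data. Correct it so your audio plays back cleanly, then send those same values to the API.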
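
If ffplay is not at hand, a similar check is possible with Python's standard `wave` module: wrap the raw bytes in a WAV header using the parameters you intend to declare, then open the result in any audio player. This is a sketch for pcm_s16le only; `wrap_s16le_as_wav` is a hypothetical helper and mono is assumed by default. Other encodings such as pcm_mulaw or pcm_f32le would need a different approach.

```python
import io
import wave


def wrap_s16le_as_wav(raw, sample_rate, channels=1):
    """Wrap raw pcm_s16le bytes in a WAV container and return the WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)     # assumption: mono unless told otherwise
        w.setsampwidth(2)            # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)  # the sample_rate you plan to declare
        w.writeframes(raw)
    return buf.getvalue()
```

Write the returned bytes to a file such as `check.wav` and listen to it. Clean playback at normal speed suggests the declared values match the data.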