pcm_s16le.
TTS output encodings
Used in the output_format.encoding field when generating audio.
| Encoding | Bit depth | Best for | Pair with sample rate |
|---|---|---|---|
| pcm_s16le | 16-bit int | General-purpose playback, browsers, audio players, most devices | 44100 (CD quality) or 16000–48000 |
| pcm_f32le | 32-bit float | ML post-processing, high-fidelity recording, audio analysis | 48000 |
| pcm_mulaw | 8-bit compressed | North American / Japanese telephony (G.711μ), Twilio | 8000 |
| pcm_alaw | 8-bit compressed | European / international telephony (G.711A) | 8000 |
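The bit depths in the table translate directly into raw data rates, which can help when sizing buffers or estimating clip duration. The following is a minimal sketch that assumes headerless, mono PCM; the helper names are illustrative and not part of any API.

```python
# Bytes per sample, derived from the bit depths in the table above.
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,  # 16-bit int
    "pcm_f32le": 4,  # 32-bit float
    "pcm_mulaw": 1,  # 8-bit compressed
    "pcm_alaw": 1,   # 8-bit compressed
}


def bytes_per_second(encoding: str, sample_rate: int, channels: int = 1) -> int:
    """Raw data rate of a headerless PCM stream (mono by assumption)."""
    return BYTES_PER_SAMPLE[encoding] * sample_rate * channels


def duration_seconds(num_bytes: int, encoding: str, sample_rate: int,
                     channels: int = 1) -> float:
    """Playback duration of a raw PCM buffer of the given size."""
    return num_bytes / bytes_per_second(encoding, sample_rate, channels)
```

For example, `bytes_per_second("pcm_s16le", 44100)` gives 88200 bytes per second, while μ-law at 8000 Hz needs only 8000.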
pcm_s16le
16-bit signed integer PCM, little-endian. Matches the standard audio CD format and is the most widely supported encoding across audio players, browsers, and hardware. Use this as your default unless you have a specific reason to choose another format.
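Raw pcm_s16le bytes carry no header, so most players need a container before they can open the audio. As a sketch, assuming the generated audio arrives as headerless mono PCM, Python's standard `wave` module can wrap it in a WAV file. The function name and default sample rate are illustrative.

```python
import io
import wave


def pcm_s16le_to_wav(pcm: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Wrap headerless 16-bit little-endian PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)  # must match the rate the audio was generated at
        wav.writeframes(pcm)
    return buf.getvalue()
```

The sample rate passed here has to match the rate requested at generation time; a mismatch plays back at the wrong speed and pitch.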
pcm_f32le
32-bit floating-point PCM, little-endian. Provides the highest precision and dynamic range of the supported encodings. Use it when your pipeline handles float audio end to end, for example when feeding generated audio into an ML model, performing signal processing with NumPy/SciPy, or recording to a lossless format for later mastering.
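To illustrate what handling float audio involves, here is a small standard-library sketch that decodes pcm_f32le bytes into Python floats and, when needed, converts them down to pcm_s16le. It assumes samples are normalized to the range [-1.0, 1.0], which is the usual convention but is not stated in this document. The function names are hypothetical.

```python
import struct


def f32le_to_floats(raw: bytes) -> list[float]:
    """Decode little-endian 32-bit float PCM into a list of samples."""
    count = len(raw) // 4  # 4 bytes per sample; ignore a trailing partial sample
    return list(struct.unpack(f"<{count}f", raw[: count * 4]))


def floats_to_s16le(samples) -> bytes:
    """Convert float samples (assumed to lie in [-1.0, 1.0]) to 16-bit PCM."""
    # Clip first so out-of-range values saturate instead of wrapping around.
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```

With NumPy available, `numpy.frombuffer(raw, dtype="<f4")` performs the same decoding without copying.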
pcm_mulaw
8-bit μ-law compressed PCM. The standard encoding for North American and Japanese telephone networks (G.711μ). Use this when sending audio to Twilio or any telephony provider that expects μ-law. Always pair with an 8000 Hz sample rate to match the telephony standard.
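As a rough sketch of the Twilio case: at 8000 Hz with one byte per sample, 20 ms of μ-law audio is 160 bytes. The example below splits a μ-law buffer into frames and wraps each one in a JSON message shaped like a Twilio Media Streams `media` event with a base64 payload. The 20 ms frame size, the function name, and the stream SID value are assumptions; check Twilio's documentation for the exact message contract.

```python
import base64
import json

SAMPLE_RATE = 8000                             # Hz, the telephony standard
FRAME_MS = 20                                  # assumed frame duration
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000   # 160 bytes: mu-law is 1 byte per sample


def mulaw_to_media_messages(mulaw: bytes, stream_sid: str):
    """Yield one JSON 'media' message per frame of mu-law audio."""
    for start in range(0, len(mulaw), FRAME_BYTES):
        frame = mulaw[start:start + FRAME_BYTES]  # the last frame may be shorter
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(frame).decode("ascii")},
        })
```

Each yielded string would then be sent over the WebSocket connection that Twilio opened for the call.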
pcm_alaw
8-bit A-law compressed PCM. The standard encoding for European and international telephone networks (G.711A). Use when your telephony infrastructure expects A-law rather than μ-law. Always pair with an 8000 Hz sample rate.
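When A-law audio needs to be inspected or played locally, it can be expanded back to 16-bit linear PCM. The sketch below follows the standard G.711 A-law expansion, written in pure Python for clarity rather than speed. The function names are illustrative.

```python
import struct


def alaw_to_linear(byte: int) -> int:
    """Expand one G.711 A-law byte to a 16-bit linear sample."""
    byte ^= 0x55                      # undo the even-bit inversion applied by the encoder
    sign = byte & 0x80                # in A-law, a set sign bit means positive
    exponent = (byte >> 4) & 0x07     # segment number
    mantissa = byte & 0x0F            # position within the segment
    if exponent == 0:
        sample = (mantissa << 4) + 8
    else:
        sample = ((mantissa << 4) + 0x108) << (exponent - 1)
    return sample if sign else -sample


def alaw_to_s16le(alaw: bytes) -> bytes:
    """Convert an A-law buffer to pcm_s16le; the output is twice as long."""
    return struct.pack(f"<{len(alaw)}h", *(alaw_to_linear(b) for b in alaw))
```

The sample rate is unchanged by the conversion, so the result is pcm_s16le at 8000 Hz.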
STT input encodings
Used in the encoding parameter when sending audio for transcription. Must match the actual encoding of your audio source.
| Encoding | Bit depth | Common sources |
|---|---|---|
| pcm_s16le | 16-bit int | Microphones, browsers (Web Audio API), most audio capture libraries |
| pcm_s32le | 32-bit int | Professional audio interfaces |
| pcm_f16le | 16-bit float | Half-precision ML pipelines |
| pcm_f32le | 32-bit float | ML models, Web Audio API AudioWorklet nodes, NumPy/SciPy |
| pcm_mulaw | 8-bit compressed | North American telephony, Twilio streams |
| pcm_alaw | 8-bit compressed | European telephony systems |
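Because the declared encoding must match the source, it can help to read the format from the audio itself rather than guess. As a minimal sketch for WAV sources, Python's `wave` module (which reads integer PCM only) exposes the sample width and rate. The `encoding` name comes from this page; the `sample_rate` key and the function name are assumptions for illustration.

```python
import wave

# Integer PCM widths that map onto the encodings in the table above.
WIDTH_TO_ENCODING = {2: "pcm_s16le", 4: "pcm_s32le"}


def stt_params_from_wav(path_or_file) -> dict:
    """Derive the encoding name and sample rate from a WAV header."""
    with wave.open(path_or_file, "rb") as wav:
        width = wav.getsampwidth()
        if width not in WIDTH_TO_ENCODING:
            raise ValueError(f"unsupported sample width: {width} bytes")
        return {
            "encoding": WIDTH_TO_ENCODING[width],
            "sample_rate": wav.getframerate(),  # hypothetical parameter name
        }
```

Remember to send only the audio frames (for example via `readframes`) if the endpoint expects raw PCM rather than a WAV container.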
pcm_s16le at 16000 Hz before sending.
How to choose
Identify your output destination
Where does the audio end up? A browser, a phone call, an ML pipeline, a file on disk?
Match the encoding to the destination
- Browser or device playback → pcm_s16le
- ML or audio processing pipeline → pcm_f32le
- Twilio or NA/JP telephony → pcm_mulaw at 8 kHz
- European telephony → pcm_alaw at 8 kHz
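The mapping above can be captured in a small lookup so the choice lives in one place. This is only a sketch: the destination keys, the function name, and the `sample_rate` key are hypothetical, and the sample rates are taken from the pairing suggestions in the TTS table.

```python
# Destination -> (encoding, sample rate in Hz), following the list above.
RECOMMENDED = {
    "browser": ("pcm_s16le", 44100),
    "ml_pipeline": ("pcm_f32le", 48000),
    "telephony_na_jp": ("pcm_mulaw", 8000),
    "telephony_eu": ("pcm_alaw", 8000),
}


def output_format_for(destination: str) -> dict:
    """Return an output_format-style dict for a known destination."""
    try:
        encoding, sample_rate = RECOMMENDED[destination]
    except KeyError:
        raise ValueError(f"unknown destination: {destination!r}") from None
    # "encoding" is the field named on this page; "sample_rate" is an assumed key.
    return {"encoding": encoding, "sample_rate": sample_rate}
```

Calling `output_format_for("telephony_na_jp")` yields μ-law at 8000 Hz, matching the Twilio recommendation.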