Pick the encoding that matches your downstream pipeline. If unsure, start with pcm_s16le.

TTS output encodings

Used in the output_format.encoding field when generating audio.
| Encoding | Bit depth | Best for | Pair with sample rate |
|---|---|---|---|
| pcm_s16le | 16-bit int | General-purpose playback, browsers, audio players, most devices | 44100 (CD quality) or 16000–48000 |
| pcm_f32le | 32-bit float | ML post-processing, high-fidelity recording, audio analysis | 48000 |
| pcm_mulaw | 8-bit compressed | North American / Japanese telephony (G.711μ), Twilio | 8000 |
| pcm_alaw | 8-bit compressed | European / international telephony (G.711A) | 8000 |

pcm_s16le

16-bit signed integer PCM, little-endian. This is the sample format used on audio CDs and the most widely supported encoding across audio players, browsers, and hardware. Use it as your default unless you have a specific reason to choose another format.
{
  "container": "raw",
  "encoding": "pcm_s16le",
  "sample_rate": 44100
}
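
Raw pcm_s16le carries no header, so most players need it wrapped in a container before they can open it. The sketch below uses Python's standard wave module to do that. The function name and the mono default are assumptions made for illustration; they are not part of the API described here.

```python
import io
import wave


def raw_s16le_to_wav(pcm: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Wrap headerless pcm_s16le bytes in an in-memory WAV container."""
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("buffer does not contain a whole number of 16-bit frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)    # assumption: mono unless the caller says otherwise
        wav.setsampwidth(2)           # 16 bits = 2 bytes per sample
        wav.setframerate(sample_rate) # must equal the sample_rate you requested
        wav.writeframes(pcm)          # WAV stores 16-bit samples little-endian already
    return buf.getvalue()
```

Writing the returned bytes to a file ending in .wav should give something a typical audio player can open, provided the sample rate passed in matches the one in the request.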

pcm_f32le

32-bit floating-point PCM, little-endian. Provides the highest precision and dynamic range of the output encodings listed here. Use it when your pipeline handles float audio end-to-end, for example when feeding generated audio into an ML model, performing signal processing with NumPy/SciPy, or recording to a lossless format for later mastering.
{
  "container": "raw",
  "encoding": "pcm_f32le",
  "sample_rate": 48000
}
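
If a later stage of the pipeline needs 16-bit integers after all, float samples can be converted after the fact. The sketch below assumes the float samples are normalized to the range [-1.0, 1.0], which is the usual convention but is not stated on this page; the function name is hypothetical. It uses only the standard struct module so that it runs without NumPy.

```python
import struct


def f32le_to_s16le(pcm: bytes) -> bytes:
    """Convert pcm_f32le bytes to pcm_s16le bytes, clipping out-of-range samples."""
    if len(pcm) % 4 != 0:
        raise ValueError("buffer does not contain a whole number of 32-bit samples")
    count = len(pcm) // 4
    floats = struct.unpack(f"<{count}f", pcm)  # "<" = little-endian, "f" = float32
    # Assumption: samples are normalized to [-1.0, 1.0]; scale and clamp to int16.
    ints = [max(-32768, min(32767, round(x * 32767))) for x in floats]
    return struct.pack(f"<{count}h", *ints)    # "h" = signed 16-bit integer
```

For long buffers a NumPy version would be considerably faster; the logic (scale, clamp, cast) stays the same.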

pcm_mulaw

8-bit μ-law compressed PCM. The standard encoding for North American and Japanese telephone networks (G.711μ). Use this when sending audio to Twilio or any telephony provider that expects μ-law. Always pair with an 8000 Hz sample rate to match the telephony standard.
{
  "container": "raw",
  "encoding": "pcm_mulaw",
  "sample_rate": 8000
}
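
μ-law stores one byte per sample, so one second of audio at 8000 Hz is 8000 bytes. To inspect or play the audio outside a telephony stack, it has to be expanded back to linear PCM. The sketch below follows the widely used G.711 μ-law expansion (bias of 0x84, 3-bit segment, 4-bit mantissa); the function names are hypothetical, and a production system would more likely rely on its telephony library's own codec.

```python
import struct


def mulaw_byte_to_s16(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                        # mu-law bytes are stored bit-inverted
    mantissa = (u & 0x0F) << 3
    segment = (u & 0x70) >> 4
    magnitude = (mantissa + 0x84) << segment
    # After inversion, a set top bit marks a negative sample.
    return 0x84 - magnitude if u & 0x80 else magnitude - 0x84


# Precompute all 256 values once; decoding is then a table lookup per byte.
_MULAW_TABLE = [mulaw_byte_to_s16(b) for b in range(256)]


def mulaw_to_s16le(data: bytes) -> bytes:
    """Decode pcm_mulaw bytes to pcm_s16le bytes (output is twice as long)."""
    return struct.pack(f"<{len(data)}h", *(_MULAW_TABLE[b] for b in data))
```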

pcm_alaw

8-bit A-law compressed PCM. The standard encoding for European and international telephone networks (G.711A). Use when your telephony infrastructure expects A-law rather than μ-law. Always pair with an 8000 Hz sample rate.
{
  "container": "raw",
  "encoding": "pcm_alaw",
  "sample_rate": 8000
}
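
A-law is also one byte per sample but uses a different companding curve, so a μ-law decoder will not produce correct audio from it. The sketch below shows the commonly used G.711 A-law expansion (XOR with 0x55, then segment and mantissa) to make the difference concrete; the function name is hypothetical.

```python
def alaw_byte_to_s16(a: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    a ^= 0x55                            # A-law bytes have alternate bits inverted
    segment = (a & 0x70) >> 4
    magnitude = (a & 0x0F) << 4
    if segment == 0:
        magnitude += 8                   # lowest segment is linear, no implicit leading bit
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # After the XOR, a set top bit marks a positive sample.
    return magnitude if a & 0x80 else -magnitude
```

Note that the sign convention is the opposite of the μ-law expansion, which is one reason mixing up the two encodings tends to produce loud, distorted output rather than silence.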

STT input encodings

Used in the encoding parameter when sending audio for transcription. Must match the actual encoding of your audio source.
| Encoding | Bit depth | Common sources |
|---|---|---|
| pcm_s16le | 16-bit int | Microphones, browsers (Web Audio API), most audio capture libraries |
| pcm_s32le | 32-bit int | Professional audio interfaces |
| pcm_f16le | 16-bit float | Half-precision ML pipelines |
| pcm_f32le | 32-bit float | ML models, Web Audio API AudioWorklet nodes, NumPy/SciPy |
| pcm_mulaw | 8-bit compressed | North American telephony, Twilio streams |
| pcm_alaw | 8-bit compressed | European telephony systems |
For best STT performance, convert your audio to pcm_s16le and resample it to 16000 Hz before sending.
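
A declared encoding that does not match the actual bytes is hard to spot by ear but easy to catch with arithmetic: each encoding has a fixed number of bytes per sample, so the buffer length and the expected duration must agree. The sketch below derives the byte widths from the bit depths in the table above; the function name and the mono default are assumptions for illustration.

```python
# Bytes per sample, derived from the bit depths listed in the table above.
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def audio_duration_seconds(num_bytes: int, encoding: str,
                           sample_rate: int, channels: int = 1) -> float:
    """Return how long a raw buffer plays for under the declared encoding."""
    frame_size = BYTES_PER_SAMPLE[encoding] * channels
    if num_bytes % frame_size != 0:
        # A partial frame usually means the declared encoding is wrong.
        raise ValueError(f"{num_bytes} bytes is not a whole number of {encoding} frames")
    return num_bytes / frame_size / sample_rate
```

If a clip you know to be about one second long reports two seconds as pcm_mulaw, it is probably 16-bit audio that was mislabeled.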

How to choose

1. Identify your output destination

Where does the audio end up? A browser, a phone call, an ML pipeline, a file on disk?

2. Match the encoding to the destination

  • Browser or device playback: pcm_s16le
  • ML or audio processing pipeline: pcm_f32le
  • Twilio or NA/JP telephony: pcm_mulaw at 8 kHz
  • European telephony: pcm_alaw at 8 kHz

3. Pick the highest sample rate your pipeline supports

A higher sample rate captures higher frequencies (up to half the sample rate), so it preserves more audio detail. Use 44100 or 48000 for general playback, 16000 for Bluetooth HFP (wideband speech), and 8000 for telephony.
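
The three steps above can be folded into a small lookup that returns a ready-to-send output_format object. This is only a sketch: the destination labels ("browser", "ml", "telephony_na_jp", "telephony_eu") and the function name are invented for the example, and the sample rates are the ones suggested on this page.

```python
def choose_output_format(destination: str) -> dict:
    """Map a destination label (hypothetical) to an output_format dict."""
    presets = {
        "browser": ("pcm_s16le", 44100),         # browser or device playback
        "ml": ("pcm_f32le", 48000),              # ML or audio processing pipeline
        "telephony_na_jp": ("pcm_mulaw", 8000),  # Twilio or NA/JP telephony
        "telephony_eu": ("pcm_alaw", 8000),      # European telephony
    }
    if destination not in presets:
        raise ValueError(f"unknown destination: {destination!r}")
    encoding, sample_rate = presets[destination]
    return {"container": "raw", "encoding": encoding, "sample_rate": sample_rate}
```

For example, choose_output_format("telephony_na_jp") yields the same object as the pcm_mulaw snippet earlier on this page. When unsure, the "browser" preset matches the recommended pcm_s16le default.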