# STT Input Audio Encodings

When sending raw (PCM) audio for speech-to-text transcription, specify the encoding and sample rate as query parameters since they cannot be detected from the audio data itself.

In general, you should match the encoding and sample rate to whatever your upstream pipeline already produces (microphone capture, telephony stream, ML model output) to avoid an extra resampling step.

If your audio source is flexible, we recommend `pcm_s16le` at 16 kHz for streaming STT.

## Reference

<ParamField query="encoding" type="string">
  The encoding of the input audio. Available options: `pcm_s16le`, `pcm_s32le`,
  `pcm_f16le`, `pcm_f32le`, `pcm_mulaw`, `pcm_alaw`.
</ParamField>

<ParamField query="sample_rate" type="number">
  The sample rate of the input audio in Hz. Must match the actual sample rate of
  the audio you send.
</ParamField>
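
As a minimal sketch, the two parameters can be assembled into a query string as shown here. The helper name `build_stt_query` is hypothetical and not part of any SDK; it only checks the encoding against the list above and leaves the endpoint URL to the caller.

```python
from urllib.parse import urlencode

# Encodings listed in the reference above.
SUPPORTED_ENCODINGS = {
    "pcm_s16le", "pcm_s32le", "pcm_f16le",
    "pcm_f32le", "pcm_mulaw", "pcm_alaw",
}


def build_stt_query(encoding: str, sample_rate: int) -> str:
    """Return a query string describing the raw audio being sent.

    Hypothetical helper for illustration only.
    """
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number of Hz")
    return "?" + urlencode({"encoding": encoding, "sample_rate": sample_rate})


print(build_stt_query("pcm_s16le", 16000))
```

Validating on the client side catches a misspelled encoding before any audio is sent.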

## Raw (PCM) Audio

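When sending raw audio, set `encoding` and `sample_rate` to describe the bytes you actually send. The table below lists each supported encoding, its common sources, and the sample rates it is usually paired with.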

| Encoding    | Bit depth        | Common sources                                                      | Pair with sample rate (Hz) |
| ----------- | ---------------- | ------------------------------------------------------------------- | -------------------------- |
| `pcm_s16le` | 16-bit int       | Microphones, browsers (Web Audio API), most audio capture libraries | 16000–44100                |
| `pcm_s32le` | 32-bit int       | Professional audio interfaces                                       | 16000–48000                |
| `pcm_f16le` | 16-bit float     | Half-precision ML pipelines                                         | 16000–48000                |
| `pcm_f32le` | 32-bit float     | ML models, Web Audio API `AudioWorklet` nodes, NumPy/SciPy          | 16000–48000                |
| `pcm_mulaw` | 8-bit compressed | North American / Japanese telephony (G.711 μ-law), Twilio           | 8000                       |
| `pcm_alaw`  | 8-bit compressed | European / international telephony (G.711 A-law)                    | 8000                       |
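
The bit depths in the table determine how many bytes each second of audio occupies, which helps when sizing streaming chunks. The sketch below assumes single-channel (mono) audio. `chunk_size_bytes` is a hypothetical helper, and the chunk durations are arbitrary illustrations rather than recommended values.

```python
# Bytes per sample, derived from the bit depths in the table above.
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def chunk_size_bytes(encoding: str, sample_rate: int, duration_ms: int) -> int:
    """Bytes needed to hold `duration_ms` of mono audio (assumption: 1 channel)."""
    bytes_per_second = sample_rate * BYTES_PER_SAMPLE[encoding]
    return bytes_per_second * duration_ms // 1000


print(chunk_size_bytes("pcm_s16le", 16000, 100))  # 100 ms of 16 kHz 16-bit audio
print(chunk_size_bytes("pcm_mulaw", 8000, 20))    # 20 ms of 8 kHz mu-law audio
```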

### Most applications

For most applications, send 16-bit signed PCM at 16 kHz.

```
?encoding=pcm_s16le&sample_rate=16000
```
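
If your source produces floating-point samples and you prefer to send `pcm_s16le` rather than `pcm_f32le`, the conversion could look roughly like this. The sketch assumes samples are normalized to the range [-1.0, 1.0] and scales by 32767, which is one common convention among several. `float_to_pcm_s16le` is a hypothetical name.

```python
import struct
from typing import Iterable


def float_to_pcm_s16le(samples: Iterable[float]) -> bytes:
    """Convert normalized float samples to 16-bit signed little-endian PCM.

    Assumption: input samples are in [-1.0, 1.0]; values outside are clipped.
    """
    ints = []
    for x in samples:
        clipped = max(-1.0, min(1.0, x))        # clip to avoid int16 overflow
        ints.append(int(round(clipped * 32767)))
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = float_to_pcm_s16le([0.0, 1.0, -1.0])
print(pcm.hex())
```

Sending the float samples directly as `pcm_f32le` avoids this step entirely when the source already produces that format.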

### Telephony

#### North America and Japan

Many customers stream call audio through Twilio. All audio sent over Twilio is
transcoded to μ-law encoding (G.711 μ-law) with an 8 kHz sample rate.

```
?encoding=pcm_mulaw&sample_rate=8000
```
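
As a rough sketch, the raw μ-law bytes can be pulled out of a Twilio Media Streams WebSocket message before forwarding them for transcription. This assumes the message shape Twilio documents for `media` events (a JSON object with a base64-encoded `media.payload`); verify it against Twilio's current documentation. `mulaw_bytes_from_twilio` is a hypothetical helper name.

```python
import base64
import json
from typing import Optional


def mulaw_bytes_from_twilio(message: str) -> Optional[bytes]:
    """Return raw mu-law audio from a Twilio media event, or None otherwise.

    Assumption: `message` is one JSON text frame from a Twilio Media Stream.
    """
    event = json.loads(message)
    if event.get("event") != "media":
        return None  # e.g. "connected", "start", or "stop" events carry no audio
    return base64.b64decode(event["media"]["payload"])


example = json.dumps({
    "event": "media",
    "media": {"payload": base64.b64encode(b"\xff\x7f\x00").decode("ascii")},
})
print(mulaw_bytes_from_twilio(example))
```

The decoded bytes are already 8-bit μ-law at 8 kHz, so they can be sent unchanged with the query parameters above.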

#### Europe, India, and others

The standard for European and international telephone networks (G.711A) is 8-bit A-law compressed PCM with an 8 kHz sample rate.

```
?encoding=pcm_alaw&sample_rate=8000
```
