# Choosing TTS parameters

Our Text-to-Speech API includes many parameters that can be bewildering to developers who have not
worked with audio before.

In general, you should try to use the same encoding and sample rate across your entire audio
pipeline, including telephony and device outputs.

If you're saving audio samples, we recommend using the [Text-to-Speech (Bytes)](/api-reference/tts/bytes) API
with `output_format.container: "wav"` or `output_format.container: "mp3"` so audio players can automatically detect the encoding and sample rate.

## Reference

<ParamField path="output_format.container" type="string">
  The container format for the audio output.

  Available options: `raw`, `wav`, `mp3`. Only the Bytes endpoint supports all three container formats;
  the other endpoints (SSE, WebSocket) support only `raw`.
</ParamField>
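
The container rule above can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: the endpoint labels (`"bytes"`, `"sse"`, `"websocket"`) and the helper name `check_container` are hypothetical names chosen for this example.

```python
# Hypothetical lookup built from the rule above: only the Bytes endpoint
# accepts every container; SSE and WebSocket accept only "raw".
SUPPORTED_CONTAINERS = {
    "bytes": {"raw", "wav", "mp3"},
    "sse": {"raw"},
    "websocket": {"raw"},
}


def check_container(endpoint: str, container: str) -> None:
    """Raise ValueError if `container` is not usable on `endpoint`."""
    allowed = SUPPORTED_CONTAINERS[endpoint]
    if container not in allowed:
        raise ValueError(
            f"container {container!r} is not supported on the {endpoint} endpoint; "
            f"use one of {sorted(allowed)}"
        )


check_container("bytes", "mp3")      # passes silently
check_container("websocket", "raw")  # passes silently
```

Calling `check_container("sse", "wav")` raises a `ValueError`, which surfaces the mismatch before any audio is requested.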

<ParamField path="output_format.encoding" type="string">
  The encoding of the output audio. Available options: `pcm_f32le`, `pcm_s16le`,
  `pcm_mulaw`, `pcm_alaw`.
</ParamField>

<ParamField path="output_format.sample_rate" type="number">
  The sample rate of the output audio. Remember that to represent a given signal, the sample rate
  must be at least twice the highest frequency component of the signal (Nyquist theorem).

  Available options: `8000`, `16000`, `22050`, `24000`, `44100`, `48000`.
</ParamField>
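
To make the Nyquist relationship concrete, this short sketch lists the highest frequency each available sample rate can represent. The function name `nyquist_limit_hz` is illustrative only.

```python
# Sample rates listed as available options above.
SAMPLE_RATES = [8000, 16000, 22050, 24000, 44100, 48000]


def nyquist_limit_hz(sample_rate: int) -> float:
    """Highest frequency (in Hz) representable at the given sample rate."""
    return sample_rate / 2


for rate in SAMPLE_RATES:
    limit = nyquist_limit_hz(rate)
    print(f"{rate:>5} Hz sampling -> content up to {limit:.0f} Hz")
```

For example, 8000 Hz sampling covers content up to 4000 Hz, which is why it is paired with telephony encodings further down this page.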

## `output_format` for RAW (PCM) Audio

When using raw audio, set `output_format` so that the encoding and sample rate match what your output device or downstream pipeline expects. Raw audio carries no header, so a mismatch cannot be detected automatically.

| Encoding    | Bit depth        | Commonly used for                                               | Pair with sample rate |
| ----------- | ---------------- | --------------------------------------------------------------- | --------------------- |
| `pcm_s16le` | 16-bit int       | General-purpose playback, browsers, audio players, most devices | 16000-44100           |
| `pcm_f32le` | 32-bit float     | ML post-processing, high-fidelity recording, audio analysis     | 48000                 |
| `pcm_mulaw` | 8-bit compressed | North American / Japanese telephony (G.711 μ-law), Twilio      | 8000                  |
| `pcm_alaw`  | 8-bit compressed | European / international telephony (G.711 A-law)               | 8000                  |
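
The bit depths in the table translate directly into data rates for raw audio. The sketch below assumes single-channel (mono) output, which this page does not state explicitly; pass a different `channels` value if that assumption does not hold. The helper names are illustrative.

```python
# Bytes used by one sample in each raw encoding (from the bit depths above).
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,  # 16-bit signed integer
    "pcm_f32le": 4,  # 32-bit float
    "pcm_mulaw": 1,  # 8-bit companded
    "pcm_alaw": 1,   # 8-bit companded
}


def bytes_per_second(encoding: str, sample_rate: int, channels: int = 1) -> int:
    """Raw data rate; `channels=1` (mono) is an assumption of this sketch."""
    return BYTES_PER_SAMPLE[encoding] * sample_rate * channels


def duration_seconds(num_bytes: int, encoding: str, sample_rate: int,
                     channels: int = 1) -> float:
    """Playback length of a raw buffer of `num_bytes` bytes."""
    return num_bytes / bytes_per_second(encoding, sample_rate, channels)


print(bytes_per_second("pcm_s16le", 44100))  # 88200
print(bytes_per_second("pcm_mulaw", 8000))   # 8000
```

This kind of calculation is useful for sizing buffers or for working out how much audio a received chunk contains.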

### Audio CD quality

Standard audio CDs use 16-bit signed PCM (`pcm_s16le`) at a 44.1 kHz sample rate.

```json theme={null}
{
  "container": "raw",
  "encoding": "pcm_s16le",
  "sample_rate": 44100
}
```

This performs well for consumer digital audio setups.
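
If you receive raw `pcm_s16le` audio and later want a file that audio players can open, you can wrap it in a WAV header yourself. The sketch below uses only Python's standard `wave` module. It assumes mono audio, and it generates a short test tone as a stand-in for real API output.

```python
import io
import math
import struct
import wave


def pcm_s16le_to_wav(pcm: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container (mono is assumed)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Stand-in for API output: 0.1 s of a 440 Hz tone at 44.1 kHz.
tone = b"".join(
    struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / 44100)))
    for n in range(4410)
)

wav_bytes = pcm_s16le_to_wav(tone)
# To save it: open("tone.wav", "wb").write(wav_bytes)
```

The sample rate passed to `pcm_s16le_to_wav` must match the `sample_rate` you requested, otherwise playback will be too fast or too slow.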

### Telephony

#### North America and Japan

Many customers send their audio output over Twilio. Twilio transcodes all audio to μ-law encoding
with an 8 kHz sample rate, so requesting that format directly avoids an extra conversion step.

```json theme={null}
{
  "container": "raw",
  "encoding": "pcm_mulaw",
  "sample_rate": 8000
}
```
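
Telephony integrations often expect audio in small fixed-size frames rather than one large buffer. The sketch below splits raw μ-law audio into 20 ms frames and base64-encodes each one. The 20 ms frame size and the base64 step are assumptions for illustration; check your telephony provider's documentation for the framing it actually requires.

```python
import base64
from typing import Iterator, List

SAMPLE_RATE = 8000  # Hz, matching the output_format above
FRAME_MS = 20       # assumed frame duration, not taken from this page
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000  # mu-law is 1 byte per sample -> 160

MULAW_SILENCE = b"\xff"  # G.711 mu-law code for a zero-amplitude sample


def mulaw_frames(audio: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield fixed-size frames, padding the last one with silence."""
    for start in range(0, len(audio), frame_bytes):
        frame = audio[start:start + frame_bytes]
        yield frame.ljust(frame_bytes, MULAW_SILENCE)


def to_base64_payloads(audio: bytes) -> List[str]:
    """Base64-encode each frame, e.g. for embedding in a JSON message."""
    return [base64.b64encode(f).decode("ascii") for f in mulaw_frames(audio)]


# Stand-in for API output: 50 ms of mu-law silence (400 bytes).
payloads = to_base64_payloads(MULAW_SILENCE * 400)
print(len(payloads))  # 3 frames; the last one is padded to 160 bytes
```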

#### Europe, India, and others

The standard for European and international telephone networks (G.711A) is 8-bit A-law compressed PCM with an 8 kHz sample rate.

```json theme={null}
{
  "container": "raw",
  "encoding": "pcm_alaw",
  "sample_rate": 8000
}
```

### Bluetooth headsets

If you know that the user is on a Bluetooth headset (such as AirPods) that handles both microphone
input and headphone output, the headset will be using the Bluetooth Hands-Free Profile (HFP), which
limits the sample rate to 16 kHz. (In practice, it's difficult to programmatically determine the
end user's microphone and speaker devices, so this example is somewhat contrived.)

```json theme={null}
{
  "container": "raw",
  "encoding": "pcm_s16le",
  "sample_rate": 16000
}
```
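
The configurations on this page can be collected into a single lookup so that application code selects one by use case. This is a minimal sketch: the preset keys and the helper `output_format_for` are hypothetical names, and only the values come from the examples above.

```python
import json

# Values copied from the examples on this page; the keys are illustrative.
OUTPUT_FORMAT_PRESETS = {
    "cd_quality": {"container": "raw", "encoding": "pcm_s16le", "sample_rate": 44100},
    "telephony_mulaw": {"container": "raw", "encoding": "pcm_mulaw", "sample_rate": 8000},
    "telephony_alaw": {"container": "raw", "encoding": "pcm_alaw", "sample_rate": 8000},
    "bluetooth_hfp": {"container": "raw", "encoding": "pcm_s16le", "sample_rate": 16000},
}


def output_format_for(use_case: str) -> dict:
    """Return a copy of the preset so callers can modify it safely."""
    return dict(OUTPUT_FORMAT_PRESETS[use_case])


print(json.dumps(output_format_for("bluetooth_hfp"), indent=2))
```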
