When sending raw (PCM) audio for speech-to-text transcription, specify the encoding and sample rate as query parameters since they cannot be detected from the audio data itself. In general, you should match the encoding and sample rate to whatever your upstream pipeline already produces (microphone capture, telephony stream, ML model output) to avoid an extra resampling step. If your audio source is flexible, we recommend pcm_s16le at 16 kHz for streaming STT.

Reference

encoding (string)
The encoding of the input audio. Available options: pcm_s16le, pcm_s32le, pcm_f16le, pcm_f32le, pcm_mulaw, pcm_alaw.

sample_rate (number)
The sample rate of the input audio in Hz. Must match the actual sample rate of the audio you send.
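
As an illustration of how the two parameters fit together, here is a minimal sketch that validates them and builds the query string. The helper name `stt_query_string` is hypothetical and not part of any SDK; the list of encodings is copied from the reference above, and the endpoint the query string is appended to is deliberately left out.

```python
from urllib.parse import urlencode

# Encodings listed in the reference above.
SUPPORTED_ENCODINGS = {
    "pcm_s16le", "pcm_s32le", "pcm_f16le",
    "pcm_f32le", "pcm_mulaw", "pcm_alaw",
}


def stt_query_string(encoding: str, sample_rate: int) -> str:
    """Return the query string carrying the raw-audio parameters.

    Hypothetical helper for illustration only.
    """
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number of Hz")
    # dicts preserve insertion order, so encoding comes first.
    return "?" + urlencode({"encoding": encoding, "sample_rate": sample_rate})


print(stt_query_string("pcm_s16le", 16000))
```

Validating on the client side catches a misspelled encoding before any audio is sent, but it cannot detect a mismatch between the declared and the actual sample rate; that still has to come from your capture pipeline.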

Raw (PCM) audio

When sending raw audio, the encoding and sample rate must match what your upstream source produces.
| Encoding | Bit depth | Common sources | Pair with sample rate (Hz) |
| --- | --- | --- | --- |
| pcm_s16le | 16-bit int | Microphones, browsers (Web Audio API), most audio capture libraries | 16000–44100 |
| pcm_s32le | 32-bit int | Professional audio interfaces | 16000–48000 |
| pcm_f16le | 16-bit float | Half-precision ML pipelines | 16000–48000 |
| pcm_f32le | 32-bit float | ML models, Web Audio API AudioWorklet nodes, NumPy/SciPy | 16000–48000 |
| pcm_mulaw | 8-bit compressed | North American / Japanese telephony (G.711 μ-law), Twilio | 8000 |
| pcm_alaw | 8-bit compressed | European / international telephony (G.711 A-law) | 8000 |
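
The bit depths in the table translate directly into how many bytes a given stretch of audio occupies, which is useful when sizing streaming chunks. The sketch below assumes single-channel (mono) audio, which this page does not state explicitly; `chunk_size_bytes` is a hypothetical helper name.

```python
# Bytes per sample, derived from the bit depths in the table above.
BYTES_PER_SAMPLE = {
    "pcm_s16le": 2,
    "pcm_s32le": 4,
    "pcm_f16le": 2,
    "pcm_f32le": 4,
    "pcm_mulaw": 1,
    "pcm_alaw": 1,
}


def chunk_size_bytes(encoding: str, sample_rate: int, chunk_ms: int) -> int:
    """Size in bytes of chunk_ms milliseconds of mono raw audio."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE[encoding]


# 100 ms of 16-bit PCM at 16 kHz: 1600 samples * 2 bytes.
print(chunk_size_bytes("pcm_s16le", 16000, 100))
# 20 ms of 8-bit mu-law at 8 kHz: 160 samples * 1 byte.
print(chunk_size_bytes("pcm_mulaw", 8000, 20))
```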

Most applications

For most applications, send 16-bit signed PCM at 16 kHz.
?encoding=pcm_s16le&sample_rate=16000
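
If your pipeline produces floating-point samples but you want to send pcm_s16le, the conversion is a clamp, a scale, and a little-endian pack. The following is a minimal standard-library sketch, assuming samples are floats nominally in the range [-1.0, 1.0] and already at the sample rate you will declare; it does not resample. The function name is hypothetical.

```python
import struct


def float_to_pcm_s16le(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    ints = []
    for x in samples:
        # Clamp out-of-range values so they do not overflow 16 bits.
        x = max(-1.0, min(1.0, x))
        ints.append(int(round(x * 32767)))
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = float_to_pcm_s16le([0.0, 1.0, -1.0])
print(pcm.hex())  # 0000ff7f0180
```

Scaling by 32767 rather than 32768 keeps +1.0 representable without overflow; other conventions exist, and the difference is inaudible for speech.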

Telephony

North America and Japan

Many customers route call audio through Twilio. All audio carried over Twilio is transcoded to µ-law encoding with an 8 kHz sample rate, so it can be sent for transcription as-is.
?encoding=pcm_mulaw&sample_rate=8000
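
Because the audio is already µ-law at 8 kHz, no transcoding is needed before transcription; the bytes only have to be unwrapped from whatever envelope they arrive in. The sketch below assumes the audio arrives as Twilio Media Streams WebSocket messages, where each `media` event is JSON carrying a base64-encoded payload. That message shape is an assumption about your integration, and the function name is hypothetical.

```python
import base64
import json


def mulaw_bytes_from_twilio_message(message: str):
    """Return raw mu-law bytes from a Twilio "media" event, else None.

    Assumes the JSON shape {"event": "media", "media": {"payload": "<base64>"}}.
    """
    event = json.loads(message)
    if event.get("event") != "media":
        # Other event types carry no audio.
        return None
    return base64.b64decode(event["media"]["payload"])


# Example with a synthetic message: 160 bytes of 0xFF (mu-law silence).
silence = bytes([0xFF]) * 160
msg = json.dumps({
    "event": "media",
    "media": {"payload": base64.b64encode(silence).decode("ascii")},
})
audio = mulaw_bytes_from_twilio_message(msg)
print(len(audio))  # 160
```

The returned bytes can be forwarded unchanged with the pcm_mulaw and 8000 Hz parameters shown above.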

Europe, India, and others

The standard for European and international telephone networks (G.711A) is 8-bit A-law compressed PCM with an 8 kHz sample rate.
?encoding=pcm_alaw&sample_rate=8000