Compare TTS Endpoints
Learn which TTS endpoint to use for your use case.
If you want to generate speech in real-time
We recommend using our WebSocket endpoint for real-time applications for a few reasons:
- Latency: You can establish a WebSocket connection in advance, so you do not incur any connection latency when you start generating speech. (This usually saves about 200 ms.)
- Input Streaming: You can stream in inputs while maintaining the prosody of the generated speech, which is useful when the input text is itself generated in real time, such as by an LLM.
- Timestamps: You can get timestamped transcripts for the generated speech to build features like subtitles or live transcripts. (For the sonic model, timestamps are only supported for the languages en, de, es, and fr. For sonic-preview, timestamps are supported for all languages!)
- Multiplexing: You can multiplex multiple conversations over a single connection.
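To make the multiplexing point concrete, here is a minimal sketch of how a client might separate interleaved audio chunks that arrive over one shared connection. It runs offline on simulated messages and does not call the real API. The message shape is an assumption for illustration only: the `context_id` and `data` field names and the base64 encoding are hypothetical, so check the WebSocket API reference for the actual format.

```python
import base64
import json
from collections import defaultdict


def demultiplex(messages):
    """Group audio chunks from one shared connection by conversation.

    Each message is a JSON string with a hypothetical `context_id` field
    (the conversation it belongs to) and a hypothetical base64-encoded
    `data` field (one audio chunk). Both names are assumptions.
    """
    audio_by_conversation = defaultdict(bytearray)
    for raw in messages:
        message = json.loads(raw)
        chunk = base64.b64decode(message["data"])
        # Append in arrival order so each conversation's audio stays in sequence.
        audio_by_conversation[message["context_id"]].extend(chunk)
    return {cid: bytes(buf) for cid, buf in audio_by_conversation.items()}


def encode(context_id, chunk):
    """Build a simulated server message in the assumed shape."""
    payload = {
        "context_id": context_id,
        "data": base64.b64encode(chunk).decode("ascii"),
    }
    return json.dumps(payload)


# Simulated frames from two conversations, interleaved on one connection.
frames = [
    encode("call-a", b"\x01\x02"),
    encode("call-b", b"\x10"),
    encode("call-a", b"\x03"),
]
streams = demultiplex(frames)
```

In a real client the same routing would sit inside the receive loop of the WebSocket connection, with each conversation's bytes forwarded to its own audio player rather than collected in memory.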
If you want to generate speech ahead of time
We recommend using our raw bytes (i.e. audio file) output endpoint, which can give you outputs in a variety of formats, such as WAV and MP3 (in addition to raw PCM audio).
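The difference between raw PCM and WAV output is essentially a header that describes the samples. As a rough illustration, the sketch below wraps raw PCM bytes in a WAV container using Python's standard `wave` module. The sample rate, channel count, and 16-bit sample width are assumptions chosen for the example; use whatever values you actually requested. If you ask the endpoint for WAV directly, this step is unnecessary.

```python
import io
import math
import struct
import wave


def pcm_to_wav(pcm_bytes, sample_rate=44100, channels=1, sample_width=2):
    """Wrap raw little-endian PCM samples in a WAV container.

    The defaults (44.1 kHz, mono, 16-bit) are assumptions for this sketch.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# Stand-in for audio returned as raw PCM: 0.1 s of a 440 Hz tone.
samples = [
    int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / 44100))
    for n in range(4410)
]
pcm = struct.pack("<%dh" % len(samples), *samples)
wav_bytes = pcm_to_wav(pcm)

# Read the result back to confirm the container describes the samples.
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    frame_count = check.getnframes()
    frame_rate = check.getframerate()
```

Writing `wav_bytes` to a file with a `.wav` extension yields something most audio players can open, whereas the raw PCM bytes alone carry no information about sample rate or bit depth.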