Text-to-Speech (Bytes)
Stream audio from a complete transcript
Authorizations
A short-lived access token to make API requests from a client.
Headers
API version header.
2026-03-01
Body
- RAWOutputFormat
- WAVOutputFormat
- MP3OutputFormat
The language that the given voice should speak the transcript in. This may depend on the model you're using. See Models for details.
en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, pa
The ID of a pronunciation dictionary to use for the generation. Pronunciation dictionaries are supported by sonic-3 models and newer.
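Because the accepted language codes form a fixed list, a client can check a code locally before sending a request. The following is a minimal sketch in Python; the helper name normalize_language is hypothetical and not part of any SDK, and since language support may depend on the model, passing this check does not guarantee that a given model accepts the code.

```python
# Language codes copied from the list in this reference.
# Actual support may be narrower for a given model.
SUPPORTED_LANGUAGES = frozenset(
    "en fr de es pt zh ja hi it ko nl pl ru sv tr tl bg ro ar cs el fi hr "
    "ms sk da ta uk hu no vi bn th he ka id te gu kn ml mr pa".split()
)


def normalize_language(code: str) -> str:
    """Hypothetical helper: trim and lowercase a code, then check it locally."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```

Calling normalize_language(" EN ") returns "en", while an unknown code such as "xx" raises ValueError before any network call is made.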
Configure the various attributes of the generated speech. Available on sonic-3 and sonic-3.5; not available on earlier models.
See Volume, Speed, and Emotion for a guide on this option.
This property is deprecated and may not work for all voices. Use generation_config.speed instead.
Influences the speed of the generated speech.
slow, normal, fast
Response
Audio bytes
The response is of type file.
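The request and response described above can be sketched with the Python standard library. This page does not state the endpoint URL, the header names, or the JSON field names, so everything of that kind below is an assumption to verify against the full API reference: the URL https://api.cartesia.ai/tts/bytes, the Cartesia-Version and Authorization headers, and the body fields model_id, transcript, voice, output_format, and language. The function names are hypothetical as well.

```python
import json
import urllib.request

# Assumed endpoint URL; not stated on this page.
API_URL = "https://api.cartesia.ai/tts/bytes"
# API version value taken from the Headers section above.
API_VERSION = "2026-03-01"


def build_tts_request(token, transcript, voice_id,
                      model_id="sonic-3", language="en", output_format=None):
    """Build (but do not send) a POST request for the bytes endpoint.

    All JSON field names and header names here are assumptions.
    """
    if output_format is None:
        # Assumed shape of an MP3 output format object.
        output_format = {"container": "mp3", "sample_rate": 44100, "bit_rate": 128000}
    payload = {
        "model_id": model_id,
        "transcript": transcript,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": output_format,
        "language": language,
    }
    headers = {
        "Authorization": f"Bearer {token}",   # short-lived access token
        "Cartesia-Version": API_VERSION,      # assumed header name
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def save_audio(request, path, chunk_size=8192):
    """Send the request and write the returned audio bytes to a file in chunks."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)
```

A caller would build the request with a real token and voice ID, then pass it to save_audio along with a target path such as "speech.mp3". Reading the response in chunks avoids holding the whole audio file in memory.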