Text to Speech (SSE)
Headers
Request
The ID of the model to use for the generation. See Models for available models.
The language that the given voice should speak the transcript in.
Options: English (en), French (fr), German (de), Spanish (es), Portuguese (pt), Chinese (zh), Japanese (ja), Hindi (hi), Italian (it), Korean (ko), Dutch (nl), Polish (pl), Russian (ru), Swedish (sv), Turkish (tr).
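A client may want to check the language code before sending a request. The following is a minimal sketch that uses only the codes listed above; the names SUPPORTED_LANGUAGES and check_language are hypothetical helpers, not part of the API.

```python
# Language codes copied from the options listed above.
SUPPORTED_LANGUAGES = {
    "en", "fr", "de", "es", "pt", "zh", "ja", "hi",
    "it", "ko", "nl", "pl", "ru", "sv", "tr",
}


def check_language(code: str) -> str:
    """Return the normalized language code, or raise if it is not listed.

    Hypothetical client-side helper; the server performs its own validation.
    """
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```

For example, check_language(" EN ") returns "en", while an unlisted code such as "xx" raises ValueError.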
The maximum duration of the audio in seconds. You do not usually need to specify this. If the duration is not appropriate for the length of the transcript, the output audio may be truncated.
The text classifier-free guidance value for the request.
Higher values cause the model to attend more to the text but speed up the generation. Lower values reduce the speaking rate but can increase the risk of hallucinations. The default value is 3.0. For a slower speaking rate, we recommend values between 2.0 and 3.0. Values are supported between 1.5 and 3.0.
This parameter is only supported for sonic-2 models.
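The range, default, and model restriction described above can be enforced on the client before a request is built. This is a sketch under stated assumptions: resolve_text_cfg is a hypothetical helper, and treating any model ID that starts with "sonic-2" as a sonic-2 model is a guess about naming, not something this page confirms.

```python
from typing import Optional

# Values taken from the parameter description above.
TEXT_CFG_MIN = 1.5
TEXT_CFG_MAX = 3.0
TEXT_CFG_DEFAULT = 3.0


def resolve_text_cfg(model_id: str, value: Optional[float] = None) -> Optional[float]:
    """Return the guidance value to send, or None if it should be omitted.

    Assumption: a model ID beginning with "sonic-2" identifies a sonic-2 model.
    """
    if not model_id.startswith("sonic-2"):
        if value is not None:
            raise ValueError("guidance value is only supported for sonic-2 models")
        return None
    if value is None:
        return TEXT_CFG_DEFAULT
    if not TEXT_CFG_MIN <= value <= TEXT_CFG_MAX:
        raise ValueError(
            f"guidance value must be between {TEXT_CFG_MIN} and {TEXT_CFG_MAX}"
        )
    return float(value)
```

A caller aiming for a slower speaking rate might pass a value such as 2.0, which falls in the recommended 2.0 to 3.0 band. Raising on out-of-range input rather than clamping keeps a typo from silently changing the speaking rate.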