Control Speed and Emotion
Learn how to control the speed and emotion of generated speech.
Speed and emotion controls are available in the playground and, through the API, in the Text to Speech endpoints (Bytes, SSE, and WebSocket).
The effects of controls vary by voice and transcript. If you find that the controls cause artifacts in the generated speech, try reducing their strength or reducing the number of controls you have applied.
Playground
In the playground, you can access speed and emotion controls by clicking the “Speed/Emotion” button in the Text to Speech tab.
API
This feature is currently experimental and is subject to breaking changes.
To use controls in the API, add the __experimental_controls dictionary to the voice object in your API request.
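As a minimal sketch, the Python snippet below assembles a voice object that carries __experimental_controls and serializes it as JSON. Only __experimental_controls, speed, and emotion come from this page; the mode and id fields and the placeholder voice ID are assumptions about how your request identifies the voice, so check them against the API reference.

```python
import json

# Hypothetical placeholder; substitute a real voice ID.
VOICE_ID = "your-voice-id"

voice = {
    # Assumed fields for selecting a voice by ID.
    "mode": "id",
    "id": VOICE_ID,
    # The experimental controls described on this page.
    "__experimental_controls": {
        "speed": "fast",
        "emotion": ["positivity:high", "curiosity"],
    },
}

# The voice object is embedded in the Text to Speech request body.
print(json.dumps({"voice": voice}, indent=2))
```

The other request fields (transcript, model, output format) are omitted here because they are not covered on this page.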
Speed Options
"slowest": Very slow speech
"slow": Slower than normal speech
"normal": Default speech rate
"fast": Faster than normal speech
"fastest": Very fast speech
For more granular control, you can define speed as a number within the range . A value of 0 represents the default speed, while negative values slow down the speech and positive values speed it up.
Using a label
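The sketch below checks a speed label against the five labels listed above before placing it in the controls dictionary. The helper name speed_from_label is hypothetical, and the check only catches typos on the client side; the API remains the authority on accepted values.

```python
# Speed labels listed on this page, ordered from slowest to fastest.
SPEED_LABELS = ("slowest", "slow", "normal", "fast", "fastest")


def speed_from_label(label: str) -> dict:
    """Return an __experimental_controls fragment that sets speed by label."""
    if label not in SPEED_LABELS:
        raise ValueError(f"unknown speed label {label!r}; expected one of {SPEED_LABELS}")
    return {"speed": label}


controls = speed_from_label("slow")
print(controls)  # {'speed': 'slow'}
```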
Using a number
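A numeric speed follows the rule above: 0 is the default rate, negative values slow speech down, and positive values speed it up. The hypothetical helpers below build the controls fragment and describe the direction of a value. They deliberately do not enforce bounds, since the permitted range is defined by the API; the value -0.5 is only illustrative.

```python
import math


def speed_from_number(value: float) -> dict:
    """Return an __experimental_controls fragment that sets speed numerically.

    Bounds are left to the API; this only rejects NaN and infinities.
    """
    value = float(value)
    if not math.isfinite(value):
        raise ValueError("speed must be a finite number")
    return {"speed": value}


def describe_speed(value: float) -> str:
    """Summarize the effect of a numeric speed value relative to the default."""
    if value == 0:
        return "default"
    return "slower" if value < 0 else "faster"


print(speed_from_number(-0.5), describe_speed(-0.5))
```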
Emotion Options
The emotion parameter is an array of “tags” in the form emotion_name:level, for example positivity:high or curiosity. A tag without a level, such as curiosity, adds a moderate amount of that emotion.
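To make the tag format concrete, the sketch below builds tags from a name and an optional level, and splits them back apart. The function names are hypothetical; the only rule they encode is the emotion_name:level form, with the level omitted for a moderate addition.

```python
from typing import Optional, Tuple


def emotion_tag(name: str, level: Optional[str] = None) -> str:
    """Build a tag: "name:level", or just "name" when the level is omitted."""
    return name if level is None else f"{name}:{level}"


def parse_emotion_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a tag into (name, level); level is None when it was omitted."""
    name, sep, level = tag.partition(":")
    return name, (level if sep else None)


emotion = [emotion_tag("positivity", "high"), emotion_tag("curiosity")]
print(emotion)  # ['positivity:high', 'curiosity']
```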
Emotion Names
anger
positivity
surprise
sadness
curiosity
Emotion Levels
Emotion controls are purely additive: they cannot reduce or remove emotions. For example, anger:low adds a small amount of anger to the voice; it does not make the voice less angry.
lowest
low
(no level): omit the level for a moderate addition of the emotion
high
highest
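The emotion names and levels above can be checked on the client before a request is sent. The sketch below is a hypothetical validator, not part of the API: it rejects unknown names and levels, accepts tags with no level, and has no "none" or negative level because the controls are additive only.

```python
from typing import Iterable, List

# Names and levels listed on this page.
EMOTION_NAMES = {"anger", "positivity", "surprise", "sadness", "curiosity"}
# Explicit levels in increasing strength; an omitted level means a moderate addition.
EMOTION_LEVELS = ("lowest", "low", "high", "highest")


def validate_emotion_tags(tags: Iterable[str]) -> List[str]:
    """Return the tags unchanged if every name and level is recognized."""
    checked = []
    for tag in tags:
        name, sep, level = tag.partition(":")
        if name not in EMOTION_NAMES:
            raise ValueError(f"unknown emotion name {name!r} in {tag!r}")
        # A colon with an unknown (or empty) level is rejected.
        if sep and level not in EMOTION_LEVELS:
            raise ValueError(f"unknown emotion level {level!r} in {tag!r}")
        checked.append(tag)
    return checked


print(validate_emotion_tags(["anger:low", "curiosity"]))
```

If a validated combination still produces artifacts, the advice at the top of this page applies: lower the levels or drop some tags.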