Control Speed and Emotion

Learn how to control the speed and emotion of generated speech.

Speed and emotion controls are available in the playground and through the API's Text to Speech endpoints (Bytes, SSE, and WebSocket).

The effects of controls vary by voice and transcript. If you find that the controls cause artifacts in the generated speech, try reducing their strength or reducing the number of controls you have applied.

Playground

In the playground, you can access speed and emotion controls by clicking the “Speed/Emotion” button in the Text to Speech tab.

API

This feature is currently experimental and is subject to breaking changes.

To use controls in the API, add the __experimental_controls dictionary to the voice object in your API request:

"voice": {
  "mode": "id",
  "id": "VOICE_ID",
  "__experimental_controls": {
    "speed": "normal",
    "emotion": [
      "positivity:high",
      "curiosity"
    ]
  }
}
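As an illustration, the voice object above could be assembled in Python before being placed in a request body. This is a minimal sketch: the helper name build_voice is hypothetical, and the rest of the request (endpoint, headers, transcript, and other fields) is deliberately left out because it is not covered in this section.

```python
import json


def build_voice(voice_id, speed="normal", emotion=None):
    """Return a voice object carrying the experimental controls.

    build_voice is a hypothetical helper, not part of any SDK.
    """
    controls = {"speed": speed}
    # Only include the emotion array when at least one tag was given.
    if emotion:
        controls["emotion"] = list(emotion)
    return {
        "mode": "id",
        "id": voice_id,
        "__experimental_controls": controls,
    }


voice = build_voice(
    "VOICE_ID",
    speed="normal",
    emotion=["positivity:high", "curiosity"],
)

# Print the fragment as it would appear inside the request JSON.
print(json.dumps({"voice": voice}, indent=2))
```

Keeping the controls in one helper makes it easier to reduce or remove them in a single place if they cause artifacts for a given voice.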

Speed Options

  • "slowest": Very slow speech
  • "slow": Slower than normal speech
  • "normal": Default speech rate
  • "fast": Faster than normal speech
  • "fastest": Very fast speech

For more granular control, you can define speed as a number within the range [-1.0, 1.0]. A value of 0 represents the default speed, while negative values slow down the speech and positive values speed it up.

"__experimental_controls": {
  "speed": "fast"
}
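Because speed accepts either a named option or a number, it may help to check the value on the client before sending it. The following is a sketch only: validate_speed is a hypothetical helper, and it assumes both ends of the [-1.0, 1.0] range are allowed, as the bracket notation suggests. How the API itself treats out-of-range values is not described here.

```python
# Named speed options listed in this section.
SPEED_NAMES = {"slowest", "slow", "normal", "fast", "fastest"}


def validate_speed(speed):
    """Return speed unchanged if it is a named option, or as a float
    if it is a number in [-1.0, 1.0]. Raise otherwise.

    Hypothetical client-side check, not part of the API.
    """
    if isinstance(speed, str):
        if speed not in SPEED_NAMES:
            raise ValueError(f"unknown speed name: {speed!r}")
        return speed
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        raise TypeError("speed must be a string or a number")
    if not -1.0 <= speed <= 1.0:
        raise ValueError("numeric speed must be within [-1.0, 1.0]")
    return float(speed)


# A slightly slower than default rate, expressed numerically.
controls = {"speed": validate_speed(-0.25)}
print(controls)
```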

Emotion Options

The emotion parameter is an array of “tags” in the form emotion_name:level, where the level is optional. For example, positivity:high or curiosity.

Emotion Names

  • anger
  • positivity
  • surprise
  • sadness
  • curiosity

Emotion Levels

Emotion controls are purely additive; they cannot reduce or remove emotions. For example, anger:low will add a small amount of anger to the voice, not make the voice less angry.

  • lowest
  • low
  • (omit level for moderate addition of the emotion)
  • high
  • highest
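The tag format described above can be sketched as a small Python helper that joins an emotion name with an optional level. The name emotion_tag is hypothetical, and the validation sets simply mirror the lists in this section; since the feature is experimental, the accepted values may change.

```python
# Emotion names and levels as listed in this section.
EMOTION_NAMES = {"anger", "positivity", "surprise", "sadness", "curiosity"}
EMOTION_LEVELS = {"lowest", "low", "high", "highest"}


def emotion_tag(name, level=None):
    """Build a tag such as "positivity:high" or "curiosity".

    Omitting level yields the bare name, which requests a moderate
    addition of the emotion. Hypothetical helper, not part of the API.
    """
    if name not in EMOTION_NAMES:
        raise ValueError(f"unknown emotion name: {name!r}")
    if level is None:
        return name
    if level not in EMOTION_LEVELS:
        raise ValueError(f"unknown emotion level: {level!r}")
    return f"{name}:{level}"


emotion = [emotion_tag("positivity", "high"), emotion_tag("curiosity")]
print(emotion)
```

If a combination produces artifacts, lowering a level (for example from high to low) or dropping a tag from the list is a straightforward change with this structure.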