Controlling Speed & Emotion

How to control the speed and emotion of voices while generating audio with Sonic.

Controlling Voices through Speed & Emotion

The __experimental_controls parameter lets you fine-tune voice output by adjusting speed and emotion. This feature is experimental and may change in future API versions.

Usage

Add the __experimental_controls dictionary to the voice object in your API request:

    "voice": {
      "mode": "id",
      "id": "VOICE_ID",
      "__experimental_controls": {
        "speed": "normal",
        "emotion": [
          "positivity:high",
          "curiosity"
        ]
      }
    }
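
For readers building the request in code, here is a minimal Python sketch of the same voice object. It only constructs and serializes the "voice" fragment. The VOICE_ID placeholder and the request_fragment name are illustrative, and the other fields of a real request are omitted.

```python
import json

# Placeholder: replace with a real voice ID.
VOICE_ID = "VOICE_ID"

voice = {
    "mode": "id",
    "id": VOICE_ID,
    "__experimental_controls": {
        "speed": "normal",
        "emotion": ["positivity:high", "curiosity"],
    },
}

# Only the "voice" key is shown; the rest of the request body is left out.
request_fragment = json.dumps({"voice": voice}, indent=2)
print(request_fragment)
```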

Speed Control

The speed parameter adjusts the pace of speech. Available options are:

  • "slowest": Very slow speech
  • "slow": Slower than normal speech
  • "normal": Default speech rate
  • "fast": Faster than normal speech
  • "fastest": Very fast speech

For more granular control, you can define speed as a float within the range [-1.0, 1.0]. A value of 0.0 represents the default speed, while negative values slow down the speech and positive values speed it up.

Example:

    "__experimental_controls": {
      "speed": "fast"
    }

Example with float value:

    "__experimental_controls": {
      "speed": -0.1
    }
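
Because speed accepts either a named option or a float, it can help to check the value on the client before sending it. The sketch below is one possible check based only on the values listed above. The helper name validate_speed is hypothetical and not part of the API, and the API may apply its own validation.

```python
NAMED_SPEEDS = ("slowest", "slow", "normal", "fast", "fastest")

def validate_speed(speed):
    """Return the speed if it is a documented value, otherwise raise."""
    if isinstance(speed, str):
        if speed not in NAMED_SPEEDS:
            raise ValueError(f"unknown speed name: {speed!r}")
        return speed
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(speed, (int, float)) and not isinstance(speed, bool):
        if not -1.0 <= speed <= 1.0:
            raise ValueError(f"speed must be within [-1.0, 1.0], got {speed}")
        return float(speed)
    raise TypeError("speed must be a named option or a float")

print(validate_speed("fast"))
print(validate_speed(-0.1))
```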

Emotion Control

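The emotion parameter is an array of emotion tags that add emotion to the voice's tone. Each tag follows the format "emotion_name:level". The level is optional: a tag with only the emotion name (for example, "curiosity") adds a moderate amount of that emotion.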

Emotion Names:

  • anger
  • positivity
  • surprise
  • sadness
  • curiosity

Emotion Levels:

  • lowest: Slight addition of the emotion
  • low: Mild addition of the emotion
  • (omit level for moderate addition of the emotion)
  • high: Strong addition of the emotion
  • highest: Maximum addition of the emotion

Example:

    "__experimental_controls": {
      "emotion": [
        "curiosity",
        "positivity:high"
      ]
    }

In this example, the voice will express:

  • High positivity
  • Moderate curiosity (level omitted)
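
The tag format can be assembled programmatically. The following sketch builds tags from the emotion names and levels listed above. The helper name emotion_tag is hypothetical and not part of the API.

```python
EMOTIONS = ("anger", "positivity", "surprise", "sadness", "curiosity")
LEVELS = ("lowest", "low", "high", "highest")

def emotion_tag(name, level=None):
    """Build an emotion tag; omitting the level yields the moderate form."""
    if name not in EMOTIONS:
        raise ValueError(f"unknown emotion: {name!r}")
    if level is None:
        return name
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return f"{name}:{level}"

tags = [emotion_tag("curiosity"), emotion_tag("positivity", "high")]
print(tags)
```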

Combining Speed and Emotion

You can combine both speed and emotion controls:

    "__experimental_controls": {
      "speed": "fast",
      "emotion": [
        "positivity:high",
        "curiosity"
      ]
    }

This configuration produces fast speech with high positivity and moderate curiosity.
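
As a sketch of how the combined dictionary might be assembled in code, the hypothetical helper below (build_controls is not part of the API) includes only the keys that are set. This mirrors the speed-only and emotion-only examples earlier on this page.

```python
import json

def build_controls(speed=None, emotion=None):
    """Return an __experimental_controls dict containing only the set keys."""
    controls = {}
    if speed is not None:
        controls["speed"] = speed
    if emotion:
        controls["emotion"] = list(emotion)
    return controls

controls = build_controls(speed="fast", emotion=["positivity:high", "curiosity"])
print(json.dumps({"__experimental_controls": controls}, indent=2))
```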

Notes

  • All emotion tag levels add the specified emotion to the voice. They don’t reduce or remove emotions, but rather add them in varying intensities.
  • The effects of these controls may vary depending on the chosen voice and the content being spoken.
  • Experiment with different combinations to achieve the desired output for your use case.
  • If the voice sounds unstable, reduce the magnitude of the modifications (for example, a lower emotion level or a speed closer to the default) and apply fewer modifications at once.
  • As this is an experimental feature, be prepared for potential changes or updates in future API versions.
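
The advice above about unstable voices can be sketched as a small helper that weakens a set of controls by one step. This is an illustrative heuristic, not documented API behavior. The function name soften, the halving of float speeds, and the max_emotions cap are all assumptions made for this example.

```python
# Emotion levels from weakest to strongest; None stands for an omitted level.
LEVEL_ORDER = ["lowest", "low", None, "high", "highest"]
# Named speeds moved one step toward "normal".
SPEED_TOWARD_NORMAL = {
    "slowest": "slow", "slow": "normal", "normal": "normal",
    "fast": "normal", "fastest": "fast",
}

def soften(controls, max_emotions=2):
    """Return a copy of controls with weaker and fewer modifications."""
    softened = dict(controls)
    speed = softened.get("speed")
    if isinstance(speed, str):
        softened["speed"] = SPEED_TOWARD_NORMAL[speed]
    elif isinstance(speed, (int, float)) and not isinstance(speed, bool):
        softened["speed"] = speed / 2  # move halfway toward the default 0.0
    tags = []
    for tag in controls.get("emotion", [])[:max_emotions]:
        name, _, level = tag.partition(":")
        index = LEVEL_ORDER.index(level or None)
        lower = LEVEL_ORDER[max(index - 1, 0)]  # "lowest" stays at "lowest"
        tags.append(name if lower is None else f"{name}:{lower}")
    if tags:
        softened["emotion"] = tags
    return softened

print(soften({"speed": 0.8, "emotion": ["positivity:highest", "curiosity", "surprise:low"]}))
```

Calling soften repeatedly moves the controls further toward the unmodified voice, which can help find the strongest setting that still sounds stable.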