
# SSML Tags

<Warning>
  Tags for volume, speed, and emotion are in beta and subject to change in the
  future.
</Warning>

Sonic supports tags modeled on SSML (Speech Synthesis Markup Language) to control generated speech.

## Speed

*Available on `sonic-3` only. Temporarily disabled on `sonic-3.5` — will be re-enabled soon.*

<Warning>
  If you're streaming input token by token, buffer the whole value of a speed or volume tag before sending it.
  Passing in `1`, `.`, and `0` as separate inputs, for example, causes the tag to be read aloud instead of applied.
</Warning>

You can guide the speed of a TTS generation with a `speed` tag, whose `ratio` attribute takes a scalar between `0.6` and `1.5`.
This value is roughly a multiplier on the default speed. For example, `1.5` generates audio at roughly 1.5x the
default speed.

```xml theme={null}
<speed ratio="1.5"/> I like to speak quickly because it makes me sound smart.
```
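The buffering requirement described in the warning above can be handled on the client before text is sent. The following is a minimal sketch, not part of any Cartesia SDK: `buffer_tags` is a hypothetical helper that passes plain text through immediately and holds back anything between `<` and `>` until the tag is complete. It assumes the transcript contains no literal `<` characters outside of tags.

```python
from typing import Iterable, Iterator


def buffer_tags(tokens: Iterable[str]) -> Iterator[str]:
    """Yield text chunks, holding back any partially received tag.

    Hypothetical helper: text outside angle brackets is emitted as soon as
    it arrives; once a "<" is seen, output is held until the matching ">"
    arrives, so a tag is always emitted in one piece.
    """
    pending = ""
    for token in tokens:
        pending += token
        while pending:
            start = pending.find("<")
            if start == -1:
                # No tag in progress: emit everything buffered so far.
                yield pending
                pending = ""
            elif start > 0:
                # Emit the plain text that precedes the tag.
                yield pending[:start]
                pending = pending[start:]
            else:
                end = pending.find(">")
                if end == -1:
                    # The tag is incomplete: wait for more tokens.
                    break
                yield pending[: end + 1]
                pending = pending[end + 1:]
    if pending:
        # Flush leftovers (an unterminated "<" is passed through as-is).
        yield pending


# The ratio value arrives split across several tokens.
tokens = ['<speed ratio="', "1", ".", "0", '"/>', " Hello", " there"]
chunks = list(buffer_tags(tokens))
```

Each yielded chunk is either plain text or one complete tag, so the chunks can be forwarded to the streaming input in order.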

## Volume

*Available on `sonic-3` only. Temporarily disabled on `sonic-3.5` — will be re-enabled soon.*

You can guide the volume of a TTS generation with a `volume` tag, whose `ratio` attribute takes a number between `0.5`
and `2.0`. The default volume is `1`.

```xml theme={null}
<volume ratio="0.5"/> I will speak softly.
```
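If tags are generated programmatically, the documented ranges can be checked before a transcript is sent. The sketch below uses hypothetical helper names (`speed_tag`, `volume_tag`) and assumes both bounds are inclusive, which the ranges above do not state explicitly.

```python
def _ratio_tag(name: str, ratio: float, low: float, high: float) -> str:
    """Build a self-closing tag with a `ratio` attribute after a range check."""
    # Assumption: the documented bounds are inclusive.
    if not low <= ratio <= high:
        raise ValueError(f"{name} ratio must be between {low} and {high}, got {ratio}")
    return f'<{name} ratio="{ratio}"/>'


def speed_tag(ratio: float) -> str:
    # Documented range for speed: 0.6 to 1.5.
    return _ratio_tag("speed", ratio, 0.6, 1.5)


def volume_tag(ratio: float) -> str:
    # Documented range for volume: 0.5 to 2.0.
    return _ratio_tag("volume", ratio, 0.5, 2.0)


transcript = f"{volume_tag(0.5)} I will speak softly."
```

Rejecting out-of-range values on the client is a design choice for this sketch; how the service itself treats such values is not covered here.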

## Emotion <span class="beta-tag">Beta</span>

<Warning>
  Emotion control is highly experimental, particularly when emotion shifts occur
  mid-generation. If you need to change the emotion in a transcript, we recommend
  using separate generation contexts for each emotion. For best results, use [Voices
  tagged as "Emotive"](https://play.cartesia.ai/voices?tags=Emotive), as emotions may not work reliably with other Voices.
</Warning>

```xml theme={null}
<emotion value="angry"/> I will not allow you to continue this! <emotion value="sad"/> I was hoping for a peaceful resolution.
```
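Since separate generation contexts are recommended for each emotion, a transcript that contains several emotion tags can be split into one segment per emotion before generation. The following is a minimal sketch: `split_by_emotion` is a hypothetical helper, and the regular expression assumes tags are written exactly in the `<emotion value="..."/>` form shown above. How each segment is then submitted in its own context is outside the scope of this example.

```python
import re
from typing import List, Optional, Tuple

# Assumption: emotion tags always use the self-closing form shown above.
EMOTION_TAG = re.compile(r'<emotion value="([^"]+)"\s*/>')


def split_by_emotion(transcript: str) -> List[Tuple[Optional[str], str]]:
    """Split a transcript into (emotion, text) segments.

    Text before the first emotion tag is paired with None.
    """
    segments: List[Tuple[Optional[str], str]] = []
    emotion: Optional[str] = None
    cursor = 0
    for match in EMOTION_TAG.finditer(transcript):
        text = transcript[cursor:match.start()].strip()
        if text:
            segments.append((emotion, text))
        emotion = match.group(1)
        cursor = match.end()
    tail = transcript[cursor:].strip()
    if tail:
        segments.append((emotion, tail))
    return segments


segments = split_by_emotion(
    '<emotion value="angry"/> I will not allow you to continue this! '
    '<emotion value="sad"/> I was hoping for a peaceful resolution.'
)
```

Each segment can then be prefixed with its own emotion tag and generated on its own, so that no emotion shift occurs mid-generation.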

## Pauses and breaks

To insert breaks (or pauses) in generated speech, use a `break` tag with one attribute, `time`. For
example, `<break time="1s"/>`. You can specify the time in seconds (`s`) or milliseconds (`ms`).
For accounting purposes, these tags are considered 1 character and do not need to be separated from adjacent text by a
space. To save credits, you can remove spaces around break tags.

```xml theme={null}
Hello, my name is Sonic.<break time="1s"/>Nice to meet you.
```
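When transcripts are assembled in code, break tags can be generated and joined to the surrounding text without spaces. This is a small sketch with hypothetical helper names (`break_tag`, `join_with_breaks`). It emits whole seconds where possible and milliseconds otherwise, since only those two forms appear above.

```python
from typing import Iterable


def break_tag(milliseconds: int) -> str:
    """Return a break tag, using seconds when the duration is a whole second."""
    if milliseconds <= 0:
        raise ValueError("break duration must be positive")
    if milliseconds % 1000 == 0:
        return f'<break time="{milliseconds // 1000}s"/>'
    return f'<break time="{milliseconds}ms"/>'


def join_with_breaks(parts: Iterable[str], milliseconds: int) -> str:
    """Join text parts with a break tag and no surrounding spaces."""
    return break_tag(milliseconds).join(part.strip() for part in parts)


transcript = join_with_breaks(
    ["Hello, my name is Sonic.", "Nice to meet you."], 1000
)
```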

## Spelling out numbers and letters

To spell out input text, you can wrap it in `<spell>` tags.

This is particularly useful for pronouncing long numbers or identifiers, such as credit card numbers, phone numbers, or unique IDs.

```xml theme={null}
My name is Bob, spelled <spell>Bob</spell>, my account number is <spell>ABC-123</spell>, my phone number is <spell>(123) 456-7890</spell>, and my credit card is <spell>1234-5678-9012-3456</spell>.
```

If you want to spell out numbers or identifiers with planned pauses between groups (for example, a pause between the area code of a phone number and the rest of the number), you can combine `<break>` and `<spell>` tags. These tags are considered 1 character and do not need to be separated from adjacent text by a space. To save credits, you can remove spaces around break and spell tags.

```xml theme={null}
My phone number is <spell>(123)</spell><break time="200ms"/><spell>4712177</spell> and my credit card number is <spell>1234</spell><break time="200ms"/><spell>5678</spell><break time="200ms"/><spell>6347</spell><break time="200ms"/><spell>4537</spell>.
```
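The pattern above can be generated from raw digits instead of being written by hand. In this minimal sketch, `chunk` and `spell_groups` are hypothetical helpers, and the 200 ms default pause simply mirrors the example rather than reflecting a recommended value.

```python
from typing import Iterable, List


def chunk(text: str, size: int) -> List[str]:
    """Split text into consecutive groups of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def spell_groups(groups: Iterable[str], pause_ms: int = 200) -> str:
    """Wrap each group in <spell> tags, separated by break tags with no spaces."""
    pause = f'<break time="{pause_ms}ms"/>'
    return pause.join(f"<spell>{group}</spell>" for group in groups)


phone = spell_groups(["(123)", "4712177"])
card = spell_groups(chunk("1234567863474537", 4))
transcript = f"My phone number is {phone} and my credit card number is {card}."
```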
