We recommend Sonic 3.5 for the best results, the widest language
coverage, and the most natural-sounding speech. We continue to serve the
older models below for compatibility.
Some models and snapshots are being discontinued on June 1, 2026 — see API Changes for details.
the latest stable snapshot of the model
to be discontinued June 1, 2026
All models have a base model name (e.g. sonic-2, sonic-turbo) and date-versioned model names
(e.g. sonic-2-2025-06-11).
We recommend using base model names for prototyping and development, then switching to a date-versioned model for production use cases to ensure stability.
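The naming recommendation above can be sketched in a few lines of Python. This is a minimal illustration, not part of any SDK: the helper names (`is_date_versioned`, `base_name`, `choose_model_id`) and the `"production"` environment label are hypothetical, and the regex assumes date-versioned names always end in `-YYYY-MM-DD`, as in the `sonic-2-2025-06-11` example.

```python
import re

# Assumption: a date-versioned name is "<base>-YYYY-MM-DD",
# e.g. "sonic-2-2025-06-11" for the base name "sonic-2".
_DATED = re.compile(r"^(?P<base>.+)-(?P<date>\d{4}-\d{2}-\d{2})$")


def is_date_versioned(model_id: str) -> bool:
    """Return True if model_id ends in a YYYY-MM-DD snapshot suffix."""
    return _DATED.match(model_id) is not None


def base_name(model_id: str) -> str:
    """Strip the snapshot date, if any, and return the base model name."""
    match = _DATED.match(model_id)
    return match.group("base") if match else model_id


def choose_model_id(environment: str, base: str, pinned: str) -> str:
    """Pick the base name for development and a pinned snapshot for production.

    Hypothetical helper: `environment` is whatever label your deployment uses.
    """
    if environment != "production":
        return base
    # In production, insist on a dated snapshot of the same base model.
    if not is_date_versioned(pinned) or base_name(pinned) != base:
        raise ValueError(f"{pinned!r} is not a date-versioned snapshot of {base!r}")
    return pinned


if __name__ == "__main__":
    print(choose_model_id("development", "sonic-2", "sonic-2-2025-06-11"))
    print(choose_model_id("production", "sonic-2", "sonic-2-2025-06-11"))
```

Failing loudly when production is configured with an unpinned name is a design choice in this sketch; it keeps a base name from silently reaching production, where the snapshot behind it may change.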
Sonic 3 is a streaming TTS model with high naturalness, accurate transcript following, and industry-leading latency. It supports fine-grained control over volume, speed, and emotion via API parameters and SSML tags, which is useful when those controls matter for your use case (they are temporarily disabled on sonic-3.5). Key features:
Sonic-2 provides ultra-realistic speech with accurate transcript following, minimal hallucinations, and excellent voice cloning. It is latency-optimized and achieves 90 ms model latency. Additional capabilities:
The first version of our flagship text-to-speech model. It produces high-accuracy, expressive speech, and is optimized for efficiency to achieve low latency.