Infill (Bytes)
Generate audio that smoothly connects two existing audio segments.
Authorizations
Headers
API version header.
2024-11-13
Body
Audio clip that comes before the infill transcript:
left_audio -> transcript -> right_audio
For best results, target natural pauses in the audio and clip tightly.
At least one of left_audio or right_audio must be provided.
Supported audio formats: flac, mp3, mpeg, mpga, oga, ogg, wav, webm
Audio clip that comes after the infill transcript:
left_audio -> transcript -> right_audio
For best results, target natural pauses in the audio and clip tightly.
At least one of left_audio or right_audio must be provided.
Supported audio formats: flac, mp3, mpeg, mpga, oga, ogg, wav, webm
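The two constraints above (at least one clip, and a supported format) can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the helper name `check_infill_audio` is hypothetical, and it assumes the format can be judged from the file extension, which the service itself may not do.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the supported-formats list above.
SUPPORTED_AUDIO_FORMATS = {"flac", "mp3", "mpeg", "mpga", "oga", "ogg", "wav", "webm"}


def check_infill_audio(left_audio: Optional[str], right_audio: Optional[str]) -> list[str]:
    """Return a list of problems; an empty list means the inputs look valid.

    Hypothetical client-side helper: it only inspects file names, not content.
    """
    problems = []
    if left_audio is None and right_audio is None:
        problems.append("at least one of left_audio or right_audio must be provided")
    for name, path in (("left_audio", left_audio), ("right_audio", right_audio)):
        if path is None:
            continue
        # Assumption: the extension reflects the actual audio format.
        ext = Path(path).suffix.lstrip(".").lower()
        if ext not in SUPPORTED_AUDIO_FORMATS:
            problems.append(f"{name}: unsupported audio format {ext!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.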
The ID of the model to use for generating audio
sonic-3, sonic-3-2026-01-12, sonic-3-2025-10-27
The language of the transcript
en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr
The infill text to generate. For best results, use longer transcripts to give the model more flexibility to adapt to the rest of the audio.
The ID of the voice to use for generating audio
The format of the output audio
raw, wav, mp3
The sample rate of the output audio in Hz. Supported sample rates are 8000, 16000, 22050, 24000, 44100, 48000.
Required for raw and wav containers.
pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw
Required for mp3 containers.
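The output-format rules above depend on the container, so a small pre-flight check may help. This is a sketch under assumptions: the parameter names `container`, `sample_rate`, and `encoding` are illustrative labels for the values listed above, and `mp3_option` is a placeholder for the mp3-only field, whose name is not shown in this section.

```python
from typing import Optional

# Values copied from the lists above.
CONTAINERS = {"raw", "wav", "mp3"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
ENCODINGS = {"pcm_f32le", "pcm_s16le", "pcm_mulaw", "pcm_alaw"}


def check_output_format(container: str, sample_rate: int,
                        encoding: Optional[str] = None,
                        mp3_option: Optional[object] = None) -> list[str]:
    """Return a list of problems; an empty list means the combination looks valid.

    Hypothetical helper; `mp3_option` stands in for the unnamed mp3-only field.
    """
    problems = []
    if container not in CONTAINERS:
        problems.append(f"unsupported container {container!r}")
    if sample_rate not in SAMPLE_RATES:
        problems.append(f"unsupported sample rate {sample_rate}")
    if container in ("raw", "wav"):
        # The PCM setting is required for raw and wav containers.
        if encoding is None:
            problems.append(f"encoding is required for {container} containers")
        elif encoding not in ENCODINGS:
            problems.append(f"unsupported encoding {encoding!r}")
    if container == "mp3" and mp3_option is None:
        problems.append("the mp3-only option is required for mp3 containers")
    return problems
```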
Either a number between -1.0 and 1.0 or a natural language description of speed.
If you specify a number, 0.0 is the default speed, -1.0 is the slowest speed, and 1.0 is the fastest speed.
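Because speed accepts either a number or a description, a caller may want to validate it before sending. The helper below is a hypothetical sketch: it enforces the numeric range stated above and passes any non-empty string through unchanged, since the accepted descriptions are not listed here.

```python
from typing import Union


def check_speed(speed: Union[int, float, str]) -> Union[float, str]:
    """Return a validated speed value or raise ValueError.

    Hypothetical helper: numbers must lie in [-1.0, 1.0]; strings are treated
    as natural language descriptions and are not interpreted.
    """
    if isinstance(speed, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError("speed must be a number or a description")
    if isinstance(speed, (int, float)):
        if not -1.0 <= speed <= 1.0:
            raise ValueError("numeric speed must be between -1.0 and 1.0")
        return float(speed)
    if isinstance(speed, str) and speed.strip():
        return speed
    raise ValueError("speed must be a number or a non-empty description")
```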
An array of emotion:level tags.
Supported emotions are: anger, positivity, surprise, sadness, and curiosity.
Supported levels are: lowest, low, (omit), high, highest.
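The emotion:level format described above can be built programmatically. This is a minimal sketch with a hypothetical helper name, `emotion_tag`; omitting the level yields the bare emotion name, matching the "(omit)" entry in the level list.

```python
from typing import Optional

# Values copied from the supported emotions and levels above.
EMOTIONS = ("anger", "positivity", "surprise", "sadness", "curiosity")
LEVELS = ("lowest", "low", "high", "highest")


def emotion_tag(emotion: str, level: Optional[str] = None) -> str:
    """Build one emotion tag, e.g. 'anger' or 'anger:high' (hypothetical helper)."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion {emotion!r}")
    if level is None:
        # Omitting the level selects the middle intensity.
        return emotion
    if level not in LEVELS:
        raise ValueError(f"unsupported level {level!r}")
    return f"{emotion}:{level}"


# An array of tags, as the field expects.
tags = [emotion_tag("positivity", "high"), emotion_tag("curiosity")]
```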
anger:lowest, anger:low, anger, anger:high, anger:highest, positivity:lowest, positivity:low, positivity, positivity:high, positivity:highest, surprise:lowest, surprise:low, surprise, surprise:high, surprise:highest, sadness:lowest, sadness:low, sadness, sadness:high, sadness:highest, curiosity:lowest, curiosity:low, curiosity, curiosity:high, curiosity:highest
Response
Audio bytes
The response is of type file.
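Since the response body is the audio itself, a client can write the bytes straight to disk. The sketch below assumes the bytes have already been received; the helper name `save_audio` and the extension mapping (in particular `.pcm` for raw output) are illustrative choices, not something the API prescribes.

```python
from pathlib import Path

# Hypothetical mapping from output container to a file extension.
EXTENSIONS = {"raw": ".pcm", "wav": ".wav", "mp3": ".mp3"}


def save_audio(audio: bytes, stem: str, container: str) -> Path:
    """Write response bytes to '<stem><ext>' and return the resulting path."""
    path = Path(stem).with_suffix(EXTENSIONS[container])
    path.write_bytes(audio)
    return path
```

Raw output has no header, so the sample rate and encoding used in the request need to be kept alongside the file in order to play it back.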