Infill (Bytes) | Cartesia

Generate audio that smoothly connects two existing audio segments. This is useful for inserting new speech between existing speech segments while maintaining natural transitions.

The cost is 1 credit per character of the infill text plus a fixed cost of 300 credits.
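
As a rough illustration of the pricing rule above, the following sketch estimates the credits for one request. The function and constant names are hypothetical, and it assumes that a "character" corresponds to one unit of Python's `len()`; actual billing may count characters differently.

```python
# Hypothetical helper, not part of any Cartesia SDK.
FIXED_INFILL_CREDITS = 300   # fixed cost per request, as stated above
CREDITS_PER_CHARACTER = 1    # cost per character of infill text, as stated above


def estimate_infill_credits(transcript: str) -> int:
    """Estimate credits for one infill request (assumes len() matches billed characters)."""
    return FIXED_INFILL_CREDITS + CREDITS_PER_CHARACTER * len(transcript)


print(estimate_infill_credits("middle segment"))  # 314 (300 fixed + 14 characters)
```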

Infilling is only available on sonic-2 at this time.

At least one of left_audio or right_audio must be provided.
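
A minimal sketch of enforcing this rule on the client side before sending a request. The helper name `build_infill_files` is hypothetical; only the field names `left_audio` and `right_audio` come from this page.

```python
import io
from typing import BinaryIO, Dict, Optional


def build_infill_files(
    left_audio: Optional[BinaryIO] = None,
    right_audio: Optional[BinaryIO] = None,
) -> Dict[str, BinaryIO]:
    """Collect whichever audio sides were supplied; reject a request with neither."""
    files: Dict[str, BinaryIO] = {}
    if left_audio is not None:
        files["left_audio"] = left_audio
    if right_audio is not None:
        files["right_audio"] = right_audio
    if not files:
        raise ValueError("Provide at least one of left_audio or right_audio")
    return files


# Example: infill appended after existing speech, so only left_audio is given.
files = build_infill_files(left_audio=io.BytesIO(b"placeholder audio bytes"))
print(sorted(files))  # ['left_audio']
```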

As with all generative models, there is some inherent variability, but here are some tips we recommend for getting the best results from infill:

- Use longer infill transcripts.
  - This gives the model more flexibility to adapt to the rest of the audio.
- Target natural pauses in the audio when deciding where to clip.
  - This means you don't need word-level timestamps to be as precise.
- Clip right up to the start and end of the audio segment you want infilled, keeping as much silence in the left/right audio segments as possible.
  - This helps the model generate more natural transitions.
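
The clipping step can be sketched with Python's standard `wave` module. This is an illustration only: the function name `split_wav_around_gap` is hypothetical, it assumes the source recording is an uncompressed PCM WAV file, and this page does not state which upload formats the endpoint accepts. The gap boundaries would typically be chosen at natural pauses, per the tips above.

```python
import io
import wave
from typing import Tuple


def split_wav_around_gap(
    wav_bytes: bytes, gap_start_s: float, gap_end_s: float
) -> Tuple[bytes, bytes]:
    """Return (left, right) WAV files surrounding the span [gap_start_s, gap_end_s)."""
    if not 0 <= gap_start_s <= gap_end_s:
        raise ValueError("Expected 0 <= gap_start_s <= gap_end_s")

    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    # Convert timestamps to byte offsets aligned on whole frames.
    frame_size = params.sampwidth * params.nchannels
    start = int(gap_start_s * params.framerate) * frame_size
    end = int(gap_end_s * params.framerate) * frame_size

    def to_wav(chunk: bytes) -> bytes:
        # Re-wrap raw frames in a WAV header matching the source format.
        buf = io.BytesIO()
        with wave.open(buf, "wb") as dst:
            dst.setnchannels(params.nchannels)
            dst.setsampwidth(params.sampwidth)
            dst.setframerate(params.framerate)
            dst.writeframes(chunk)
        return buf.getvalue()

    # Everything before the gap becomes left_audio, everything after it right_audio.
    return to_wav(frames[:start]), to_wav(frames[end:])
```

The two returned byte strings could then be wrapped in `io.BytesIO` and sent as the `left_audio` and `right_audio` form files.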

Request

This endpoint expects a multipart form with multiple files.

left_audio (file, Required)

right_audio (file, Required)

model_id (string, Required)

The ID of the model to use for generating audio

language (string, Required)

The language of the transcript

transcript (string, Required)

The infill text to generate

voice_id (string, Required)

The ID of the voice to use for generating audio

output_format[container] (enum, Required)

The format of the output audio

Allowed values: raw, wav, mp3

output_format[sample_rate] (integer, Required)

The sample rate of the output audio

output_format[encoding] (enum, Optional)

Required for raw and wav containers.

Allowed values:

output_format[bit_rate] (integer, Optional)

Required for mp3 containers.
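
The conditional requirements on the output_format fields can be checked before sending a request. The sketch below is a hypothetical client-side helper (`build_output_format` is not part of any Cartesia SDK); it only encodes the two rules stated above and leaves validation of the enum values themselves to the server.

```python
from typing import Dict, Optional


def build_output_format(
    container: str,
    sample_rate: int,
    encoding: Optional[str] = None,
    bit_rate: Optional[int] = None,
) -> Dict[str, str]:
    """Build the output_format[...] form fields, enforcing the per-container rules."""
    fields = {
        "output_format[container]": container,
        "output_format[sample_rate]": str(sample_rate),
    }
    if container in ("raw", "wav"):
        # encoding is required for raw and wav containers.
        if encoding is None:
            raise ValueError(f"encoding is required for the {container} container")
        fields["output_format[encoding]"] = encoding
    elif container == "mp3":
        # bit_rate is required for mp3 containers.
        if bit_rate is None:
            raise ValueError("bit_rate is required for the mp3 container")
        fields["output_format[bit_rate]"] = str(bit_rate)
    return fields


print(build_output_format("mp3", 44100, bit_rate=128000))
```

The returned dictionary can be merged into the form payload alongside model_id, language, transcript, and voice_id.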

Response

This endpoint returns a file.

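import requests

url = "https://api.cartesia.ai/infill/bytes"

payload = {
    "model_id": "sonic-2",
    "language": "en",
    "transcript": "middle segment",
    "voice_id": "694f9389-aac1-45b6-b726-9d9369183238",
    "output_format[container]": "mp3",
    "output_format[sample_rate]": "44100",
    # output_format[encoding] is omitted: it is only required for raw and wav containers.
    "output_format[bit_rate]": "128000",
}
headers = {
    "Cartesia-Version": "2025-04-16",
    "Authorization": "Bearer <token>",
}

# Open the audio on each side of the gap in binary mode so requests
# can upload them as multipart form files.
with open("<file1>", "rb") as left_audio, open("<file2>", "rb") as right_audio:
    files = {"left_audio": left_audio, "right_audio": right_audio}
    response = requests.post(url, data=payload, files=files, headers=headers)

# Fail loudly on an HTTP error instead of saving an error body as audio.
response.raise_for_status()

# The endpoint returns the generated audio as a file, not JSON, so write the raw bytes.
with open("infill.mp3", "wb") as out:
    out.write(response.content)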
