Voice Changer

POST https://api.cartesia.ai/voice-changer/bytes
curl -X POST https://api.cartesia.ai/voice-changer/bytes \
  -H "Cartesia-Version: 2025-04-16" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: multipart/form-data" \
  -F clip=@<file1> \
  -F voice[id]="694f9389-aac1-45b6-b726-9d9369183238" \
  -F output_format[container]="mp3" \
  -F output_format[sample_rate]='44100' \
  -F output_format[bit_rate]='128000'
Takes an audio file of speech and returns an audio file of the same speech, with the same intonation, spoken in a different voice. This endpoint is priced at 15 characters per second of input audio.
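As a rough illustration of the pricing rule above, the helper below multiplies a clip's duration by the stated rate of 15 characters per second. The function name `voice_changer_cost` is hypothetical, and the sketch assumes the duration is given in whole seconds; how the service rounds partial seconds is not stated on this page.

```shell
# Hypothetical helper: estimate the characters billed for one clip.
# Assumes a whole-second duration; rounding of partial seconds by the
# service is not documented here.
voice_changer_cost() {
  local seconds="$1"
  local rate=15  # characters per second of input audio, per this page
  echo $(( seconds * rate ))
}

voice_changer_cost 60   # a one-minute clip; prints 900
```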

Headers

Authorization: string, required

Bearer authentication of the form Bearer <token>, where token is your auth token.

Cartesia-Version: "2025-04-16", required. Defaults to 2025-04-16.

Request

This endpoint expects a multipart form containing a file.
clip: file, required

voice[id]: string, required

output_format[container]: enum, required
Allowed values:

output_format[sample_rate]: integer, required

output_format[encoding]: enum, optional
Required for raw and wav containers.
Allowed values:

output_format[bit_rate]: integer, optional
Required for mp3 containers.
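The conditional requirements above (encoding for raw and wav, bit rate for mp3) can be checked before a request is sent. The sketch below is one possible way to do that; `output_format_args` is a hypothetical helper, not part of any Cartesia tooling, and it only prints the form fields you would pass to `curl -F`.

```shell
# Hypothetical helper: print the output_format form fields, one per line,
# enforcing the conditional requirements listed on this page.
# Usage: output_format_args <container> <sample_rate> [encoding] [bit_rate]
output_format_args() {
  local container="${1:-}" sample_rate="${2:-}"
  local encoding="${3:-}" bit_rate="${4:-}"

  case "$container" in
    raw|wav)
      if [ -z "$encoding" ]; then
        echo "error: output_format[encoding] is required for $container" >&2
        return 1
      fi
      ;;
    mp3)
      if [ -z "$bit_rate" ]; then
        echo "error: output_format[bit_rate] is required for mp3" >&2
        return 1
      fi
      ;;
  esac

  printf '%s\n' "output_format[container]=$container"
  printf '%s\n' "output_format[sample_rate]=$sample_rate"
  if [ -n "$encoding" ]; then
    printf '%s\n' "output_format[encoding]=$encoding"
  fi
  if [ -n "$bit_rate" ]; then
    printf '%s\n' "output_format[bit_rate]=$bit_rate"
  fi
  return 0
}

# Same values as the request example: mp3 at 44100 Hz and 128000 bit/s.
output_format_args mp3 44100 "" 128000
```

Each printed line can be passed to curl as a separate `-F` argument. The helper does not validate the container or encoding names themselves, only the presence of the conditionally required fields.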

Response

This endpoint returns a file.
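Because the response body is the audio file itself, a client normally writes it straight to disk. The wrapper below is a minimal sketch of that, reusing the headers and fields from the request example; the function name `voice_changer_to_file` and the `CARTESIA_API_KEY` environment variable are assumptions made for illustration.

```shell
# Hypothetical wrapper: send a clip and save the returned audio to a file.
# CARTESIA_API_KEY is an assumed environment variable holding your token.
# Usage: voice_changer_to_file <input clip> <voice id> <output file>
voice_changer_to_file() {
  local clip="$1" voice_id="$2" out="$3"
  # --fail makes curl exit non-zero on an HTTP error instead of saving
  # the error body as if it were audio.
  curl --fail --silent --show-error \
    -X POST "https://api.cartesia.ai/voice-changer/bytes" \
    -H "Cartesia-Version: 2025-04-16" \
    -H "Authorization: Bearer ${CARTESIA_API_KEY:-}" \
    -F "clip=@${clip}" \
    -F "voice[id]=${voice_id}" \
    -F "output_format[container]=mp3" \
    -F "output_format[sample_rate]=44100" \
    -F "output_format[bit_rate]=128000" \
    --output "$out"
}
```

A call such as `voice_changer_to_file input.wav 694f9389-aac1-45b6-b726-9d9369183238 output.mp3` would write the converted speech to `output.mp3`; the file names here are placeholders.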