Generate audio from a transcript using a given voice and model. The audio is streamed out as raw bytes.
The version of the Cartesia API to use.
A transcript for the generation. Must not be empty and must not consist only of punctuation.
The voice to use for the speech. Can be either an ID or an embedding, as specified by the mode field.
The maximum duration of the audio in seconds.
Language of the generation. Options are: en (English), de (German), es (Spanish), fr (French), ja (Japanese), pt (Portuguese), zh (Chinese), hi (Hindi), it (Italian), ko (Korean), nl (Dutch), pl (Polish), ru (Russian), sv (Swedish), tr (Turkish).
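
The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not the official client: the helper name build_tts_payload and the JSON key names (transcript, voice, mode, id, embedding, language, duration) are assumptions chosen to mirror the parameter descriptions, and the code makes no network call. Check the key names against the API reference before relying on them.

```python
import unicodedata

# Language codes listed in the parameter description above.
SUPPORTED_LANGUAGES = {
    "en", "de", "es", "fr", "ja", "pt", "zh", "hi",
    "it", "ko", "nl", "pl", "ru", "sv", "tr",
}


def has_speakable_text(transcript):
    """Return True if the transcript has a character that is neither whitespace nor punctuation."""
    return any(
        not ch.isspace() and not unicodedata.category(ch).startswith("P")
        for ch in transcript
    )


def build_tts_payload(transcript, voice_id=None, voice_embedding=None,
                      language="en", duration=None):
    """Validate the inputs and return a request body as a dict.

    Hypothetical helper: the key names are assumptions, not confirmed API fields.
    """
    if not has_speakable_text(transcript):
        raise ValueError("transcript must not be empty or punctuation only")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    # The voice is given either by ID or by embedding, never both.
    if (voice_id is None) == (voice_embedding is None):
        raise ValueError("provide exactly one of voice_id or voice_embedding")

    if voice_id is not None:
        voice = {"mode": "id", "id": voice_id}
    else:
        voice = {"mode": "embedding", "embedding": list(voice_embedding)}

    payload = {"transcript": transcript, "voice": voice, "language": language}
    if duration is not None:
        payload["duration"] = duration  # maximum audio duration in seconds
    return payload
```

For example, build_tts_payload("Bonjour.", voice_id="my-voice", language="fr") returns a dict with the voice in ID mode, while a transcript such as "?!..." raises ValueError before any request is made.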