The voice to use for the speech. Can be either an ID or an embedding, as specified by the mode field.
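As a rough illustration of how a mode field can tell the two voice forms apart, the sketch below builds a voice specifier from either a string ID or a list of floats. The key names ("mode", "id", "embedding"), the mode values, and the helper name make_voice are assumptions made for this example and are not confirmed by this reference.

```python
from typing import Sequence, Union


def make_voice(voice: Union[str, Sequence[float]]) -> dict:
    """Build a voice specifier whose "mode" says how to read the value.

    Assumption: the key names and mode values here are illustrative only.
    """
    if isinstance(voice, str):
        if not voice:
            raise ValueError("voice ID must be non-empty")
        return {"mode": "id", "id": voice}

    # Anything that is not a string is treated as an embedding vector.
    embedding = [float(x) for x in voice]
    if not embedding:
        raise ValueError("voice embedding must be non-empty")
    return {"mode": "embedding", "embedding": embedding}
```

A string argument yields an ID-mode specifier and a numeric sequence yields an embedding-mode specifier, so the caller never sets the mode by hand.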
Language of the generation. Options are: en (English), de (German), es (Spanish), fr (French), ja (Japanese), pt (Portuguese), zh (Chinese), hi (Hindi), it (Italian), ko (Korean), nl (Dutch), pl (Polish), ru (Russian), sv (Swedish), tr (Turkish).
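A client may want to reject an unsupported language code before sending a request. The following is a minimal sketch of such a check using the codes listed above; the names SUPPORTED_LANGUAGES and check_language are hypothetical, and normalizing case and whitespace is a convenience chosen for this example, not documented behavior.

```python
# Language codes and names as listed in the parameter description.
SUPPORTED_LANGUAGES = {
    "en": "English", "de": "German", "es": "Spanish", "fr": "French",
    "ja": "Japanese", "pt": "Portuguese", "zh": "Chinese", "hi": "Hindi",
    "it": "Italian", "ko": "Korean", "nl": "Dutch", "pl": "Polish",
    "ru": "Russian", "sv": "Swedish", "tr": "Turkish",
}


def check_language(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        options = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; options: {options}")
    return normalized
```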
Whether to add timestamps to the audio. This is only supported on the tts/sse and WebSocket endpoints.
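Because timestamps are limited to the tts/sse and WebSocket endpoints, a client can guard the option before building a request. The sketch below shows one way to do that; the field name add_timestamps, the transport labels ("sse", "websocket", and any others), and the helper name are assumptions for illustration rather than confirmed API names.

```python
# Assumed labels for the two transports the description says support timestamps.
TIMESTAMP_TRANSPORTS = {"sse", "websocket"}


def with_timestamps(payload: dict, transport: str, enabled: bool) -> dict:
    """Return a copy of payload with the (assumed) add_timestamps field set.

    Raises ValueError if timestamps are requested on an unsupported transport.
    """
    if enabled and transport not in TIMESTAMP_TRANSPORTS:
        raise ValueError(
            f"timestamps are not supported on transport {transport!r}"
        )
    updated = dict(payload)  # leave the caller's dict untouched
    updated["add_timestamps"] = enabled
    return updated
```

Returning a copy keeps the original payload reusable across transports, which is a design choice of this sketch.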