Speech-to-Text (Streaming)
Realtime speech transcription without turn detection
This endpoint relies on the finalize command to trigger transcription. See Compare STT Endpoints for details.
{}

"finalize"

"close"

{
  "type": "transcript",
  "is_final": true,
  "request_id": "b67e1c5d-2f4c-4c3d-9f82-96eb4d2f12a8",
  "text": "How are you doing today?",
  "duration": 2.5,
  "language": "en",
  "words": [
    { "word": "How", "start": 0, "end": 0.12 },
    { "word": "are", "start": 0.15, "end": 0.25 },
    { "word": "you", "start": 0.28, "end": 0.35 },
    { "word": "doing", "start": 0.38, "end": 0.55 },
    { "word": "today?", "start": 0.58, "end": 0.78 }
  ]
}

{
  "type": "flush_done",
  "request_id": "b67e1c5d-2f4c-4c3d-9f82-96eb4d2f12a8"
}

{
  "type": "done",
  "request_id": "b67e1c5d-2f4c-4c3d-9f82-96eb4d2f12a8"
}

{
  "type": "error",
  "message": "Invalid model: The model is not valid, make sure it is a valid model ID.",
  "code": 400,
  "request_id": "b67e1c5d-2f4c-4c3d-9f82-96eb4d2f12a8"
}

API key passed in a header.
A short-lived access token passed in a query param to make API requests from a client. This is particularly useful in the browser, where WebSockets do not support headers. See Authenticate client apps to generate an access token.
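As a rough illustration of token-based authentication from a client, the sketch below assembles a connection URL with the token in the query string. The base URL and the query parameter names `access_token` and `model` are assumptions made for this example; only `encoding` and `sample_rate` are named on this page, so check the endpoint reference for the real names.

```python
from urllib.parse import urlencode


def build_stt_url(base_url, access_token, model, encoding, sample_rate):
    """Return a WebSocket URL carrying the access token as a query parameter.

    `access_token` and `model` are hypothetical parameter names used for
    illustration; `encoding` and `sample_rate` are the names this page uses.
    """
    query = urlencode({
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "access_token": access_token,
    })
    return f"{base_url}?{query}"


# Placeholder host and values, not a real endpoint.
url = build_stt_url("wss://example.invalid/stt", "tok", "my-model", "pcm_s16le", 16000)
```

Because the token travels in the URL, it should be short-lived, as described above.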
Send WebSocket binary messages containing raw audio data as specified by the encoding and sample_rate query parameters.
Audio Requirements:
- Send audio in small chunks, e.g. 100 ms
- Audio format must match the encoding and sample_rate parameters
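The chunking requirement above can be sketched as a small helper that slices a raw audio buffer into roughly 100 ms pieces. This assumes mono PCM with a fixed sample width (2 bytes per sample by default); the function name and defaults are illustrative, not part of the API.

```python
def chunk_audio(pcm, sample_rate, bytes_per_sample=2, chunk_ms=100):
    """Yield consecutive slices of `pcm`, each about `chunk_ms` long.

    Assumes mono PCM. The chunk size is computed in whole samples so that
    a chunk never splits a sample in half.
    """
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be at least one sample")
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


# A quarter second of silence at 16 kHz, 16-bit mono: 8000 bytes.
chunks = list(chunk_audio(bytes(8000), sample_rate=16000))
```

At 16 kHz with 16-bit samples, 100 ms corresponds to 3200 bytes, so the final chunk of a buffer is usually shorter than the rest.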
Send finalize as a text message when the user is done speaking to receive the transcript for any buffered audio.
Send close as a text message to flush any remaining audio, close the session, and receive a done acknowledgment.
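To make the ordering of outgoing messages concrete, here is a minimal sketch of a generator that produces them in sequence: audio chunks as `bytes`, then `finalize` after each utterance, then `close` at the end. The function name and the idea of grouping chunks per utterance are assumptions for illustration. With a client such as the Python `websockets` package, `bytes` values are sent as binary frames and `str` values as text frames.

```python
def session_messages(utterances):
    """Yield outgoing messages for one session, in send order.

    `utterances` is an iterable of utterances, each an iterable of audio
    chunks (bytes). Commands are yielded as plain strings.
    """
    for chunks in utterances:
        for chunk in chunks:
            yield chunk          # binary message: raw audio
        yield "finalize"         # text message: user is done speaking
    yield "close"                # text message: flush and end the session


outgoing = list(session_messages([[b"a", b"b"], [b"c"]]))
```

Keeping the sequence in a generator separates the protocol order from the transport, so the same logic can be exercised without a live connection.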
Transcript chunks.
Send the finalize command after the user is done speaking to make the API emit these transcript chunks.
The API may also send transcript chunks before you send the finalize command.
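Since transcript chunks can arrive both before and after finalize, a client typically accumulates them as they come in. The sketch below joins the `text` of chunks whose `is_final` is true; treating chunks with `is_final` set to false as interim results to skip is an assumption, as is the helper name.

```python
import json


def collect_transcript(raw_messages):
    """Join the text of final transcript chunks from raw JSON messages.

    Assumption: chunks with `is_final` false are interim and are skipped.
    Messages of other types (acknowledgments, errors) are ignored here.
    """
    parts = []
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("type") == "transcript" and msg.get("is_final"):
            parts.append(msg["text"])
    return " ".join(parts)


received = [
    '{"type": "transcript", "is_final": true, "text": "How are you"}',
    '{"type": "transcript", "is_final": false, "text": "doing"}',
    '{"type": "transcript", "is_final": true, "text": "doing today?"}',
    '{"type": "flush_done"}',
]
text = collect_transcript(received)
```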
Acknowledgment for the finalize command (flush_done).
Acknowledgment for the close command (done).
Error information for STT WebSocket connections.
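One possible way to surface error messages is to convert them into exceptions as soon as they are parsed. This is a sketch only: the `SttError` class and `raise_on_error` helper are hypothetical names, and the fields read here (`code`, `message`, `request_id`) are the ones shown in the example error message on this page.

```python
import json


class SttError(Exception):
    """Raised when the server sends a message of type "error"."""

    def __init__(self, code, message, request_id):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.request_id = request_id


def raise_on_error(raw):
    """Parse one JSON message; raise SttError if it is an error message."""
    msg = json.loads(raw)
    if msg.get("type") == "error":
        raise SttError(msg.get("code"), msg.get("message"), msg.get("request_id"))
    return msg


ok = raise_on_error('{"type": "done", "request_id": "abc"}')
```

Keeping the `request_id` on the exception makes it easier to correlate a failure with a specific request when reporting it.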