Speech-to-Text (Streaming)

Messages

{
  "type": "transcript",
  "is_final": true,
  "request_id": "b67e1c5d-2f4c-4c3d-9f82-96eb4d2f12a8",
  "text": "How are you doing today?",
  "duration": 2.5,
  "language": "en",
  "words": [
    {
      "word": "How",
      "start": 0,
      "end": 0.12
    },
    {
      "word": "are",
      "start": 0.15,
      "end": 0.25
    },
    {
      "word": "you",
      "start": 0.28,
      "end": 0.35
    },
    {
      "word": "doing",
      "start": 0.38,
      "end": 0.55
    },
    {
      "word": "today?",
      "start": 0.58,
      "end": 0.78
    }
  ]
}
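As a minimal sketch of how a client might consume the message above, the following Python parses a transcript message into typed objects. The class and function names (Word, Transcript, parse_transcript) are illustrative and not part of the API, and treating start, end, and duration as seconds is an assumption based on the example values.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    word: str
    start: float  # assumed to be seconds from the start of the audio
    end: float


@dataclass
class Transcript:
    text: str
    is_final: bool
    request_id: str
    duration: float
    language: str
    words: List[Word]


def parse_transcript(raw: str) -> Transcript:
    """Parse one JSON text message whose type is "transcript"."""
    msg = json.loads(raw)
    if msg.get("type") != "transcript":
        raise ValueError(f"not a transcript message: {msg.get('type')!r}")
    words = [
        Word(w["word"], float(w["start"]), float(w["end"]))
        for w in msg.get("words", [])
    ]
    return Transcript(
        text=msg["text"],
        is_final=bool(msg.get("is_final", False)),
        request_id=msg.get("request_id", ""),
        duration=float(msg.get("duration", 0.0)),
        language=msg.get("language", ""),
        words=words,
    )


# A shortened copy of the example message (only the first two words).
sample = json.dumps({
    "type": "transcript",
    "is_final": True,
    "request_id": "b67e1c5d-2f4c-4c3d-9f82-96eb4d2f12a8",
    "text": "How are you doing today?",
    "duration": 2.5,
    "language": "en",
    "words": [
        {"word": "How", "start": 0, "end": 0.12},
        {"word": "are", "start": 0.15, "end": 0.25},
    ],
})

transcript = parse_transcript(sample)
```

Optional fields fall back to defaults here so that a message without word timings still parses; whether the API ever omits them is not stated in this reference.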

WSS

stt

websocket

X-API-Key

type:httpApiKey

Use an API key when you're calling from a trusted server.

access_token

type:httpApiKey

Use a short-lived access token when calling from a browser or client app.

query

type:object
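The sketch below shows one way a client might assemble the connection URL and headers from the two authentication options and the query object. Several details are assumptions, not confirmed by this reference: the base URL is a placeholder, access_token is assumed to travel as a query parameter, and the encoding value "pcm_s16le" is only an example. The helper names are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Placeholder endpoint: substitute the real WSS URL from the API reference.
BASE_URL = "wss://api.example.invalid/stt/websocket"


def build_stt_url(
    encoding: str,
    sample_rate: int,
    access_token: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build the WebSocket URL with encoding and sample_rate query parameters."""
    params = {"encoding": encoding, "sample_rate": str(sample_rate)}
    if access_token is not None:
        # Assumption: browser/client apps pass the token in the query string.
        params["access_token"] = access_token
    return f"{base_url}?{urlencode(params)}"


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the X-API-Key header for trusted server-side callers."""
    return {"X-API-Key": api_key} if api_key else {}


url = build_stt_url("pcm_s16le", 16000)
headers = build_headers("my-server-side-key")
```

Keeping the API key in a header and out of the URL avoids leaking it into logs, which fits the guidance to use it only from a trusted server.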

Send Audio Data

type:string

Send WebSocket binary messages containing raw audio data as specified by the encoding and sample_rate query parameters.

Audio Requirements:

Send audio in small chunks, e.g. 100 ms
Audio format must match the encoding and sample_rate parameters
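The chunking requirement above can be sketched as follows. This example assumes 16-bit mono PCM (2 bytes per sample); the function name chunk_pcm and its parameters are illustrative, and the byte width must be adjusted to whatever encoding you actually request.

```python
from typing import Iterator


def chunk_pcm(
    audio: bytes,
    sample_rate: int,
    chunk_ms: int = 100,
    bytes_per_sample: int = 2,  # assumption: 16-bit samples
    channels: int = 1,          # assumption: mono audio
) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM audio, each about chunk_ms long."""
    frame_size = bytes_per_sample * channels
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_size = frames_per_chunk * frame_size
    if chunk_size <= 0:
        raise ValueError("chunk_ms and sample_rate must give a non-empty chunk")
    for offset in range(0, len(audio), chunk_size):
        # The final slice may be shorter than chunk_size.
        yield audio[offset:offset + chunk_size]


# One second of 16 kHz, 16-bit mono silence is 32000 bytes.
one_second = bytes(16000 * 2)
chunks = list(chunk_pcm(one_second, sample_rate=16000))
```

Each yielded chunk would be sent as one binary WebSocket message. Sizing chunks in whole frames keeps a sample from being split across two messages.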

Finalize Command

type:string

Send finalize as a text message when the user is done speaking to receive the transcript for any buffered audio.

Example: finalize

Close Command

type:string

Send close as a text message to flush remaining audio, close the session, and receive a done acknowledgment.

Example: close
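To illustrate the order of operations for the audio, finalize, and close messages, here is a transport-agnostic sketch. The send callable is a stand-in for your WebSocket client's send method (an assumption: it sends bytes as a binary message and str as a text message); run_turn is a hypothetical helper, not part of the API.

```python
from typing import Callable, Iterable, List, Union

Message = Union[bytes, str]


def run_turn(
    send: Callable[[Message], None],
    chunks: Iterable[bytes],
    close_after: bool = False,
) -> None:
    """Send audio chunks, then finalize, and optionally close the session."""
    for chunk in chunks:
        send(chunk)          # binary message: raw audio data
    send("finalize")         # text message: request transcript for buffered audio
    if close_after:
        send("close")        # text message: flush, end the session, expect a done ack


# Record what would go over the wire instead of opening a real connection.
sent: List[Message] = []
run_turn(sent.append, [b"\x00\x00", b"\x01\x01"], close_after=True)
```

A real client would also read incoming messages concurrently and would typically wait for the corresponding acknowledgment before tearing down the connection.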

Transcript Response

type:object

Transcript chunks.

Send the finalize command after the user is done speaking so that the API emits transcript chunks for any buffered audio. The API may also send transcript chunks before it receives the finalize command.

Flush Done Response

type:object

Acknowledgment for the finalize command.

Done Response

type:object

Acknowledgment for the close command.

Error Response

type:object

Error information for STT WebSocket connections.
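Since the server can send several kinds of messages, a client usually routes each one by its type field. The sketch below is one possible dispatcher. Only the "transcript" type value appears in this reference; the type strings for the flush-done, done, and error messages are not shown here, so the "done" value used in the example is a placeholder and unknown types go to a fallback handler. All names are illustrative.

```python
import json
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


def dispatch(raw: str, handlers: Dict[str, Handler], fallback: Handler) -> None:
    """Decode one JSON text message and route it by its "type" field."""
    msg = json.loads(raw)
    handler = handlers.get(msg.get("type"), fallback)
    handler(msg)


final_texts: List[str] = []
unhandled: List[Dict[str, Any]] = []


def on_transcript(msg: Dict[str, Any]) -> None:
    # Keep only finalized text; interim chunks may be revised later.
    if msg.get("is_final"):
        final_texts.append(msg["text"])


handlers: Dict[str, Handler] = {"transcript": on_transcript}

incoming = [
    '{"type": "transcript", "is_final": false, "text": "How are"}',
    '{"type": "transcript", "is_final": true, "text": "How are you doing today?"}',
    '{"type": "done"}',  # placeholder type value, not confirmed by this reference
]
for raw_message in incoming:
    dispatch(raw_message, handlers, unhandled.append)
```

Registering handlers in a dictionary makes it easy to add the acknowledgment and error types once their exact type values are confirmed against the API reference.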
