Ink supports two realtime transcription modes:
  1. Auto finalization: the client sends audio only
  2. Manual finalization: the client sends audio and signals when to finalize transcripts
Most speech-to-text APIs combine both behaviors in a single mode, but Cartesia separates them to improve model performance. Auto finalization is recommended for most agents. Some use cases, however, require manual finalization, for example:
  1. Push-to-talk apps
  2. Pipelines where you already know that speech has ended and are waiting for the transcript
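
To make the push-to-talk case concrete, here is a minimal sketch of the client-side logic for manual finalization. It does not use the actual Ink API: `PushToTalkSession`, the `send` callback, and the `FINALIZE_SIGNAL` value are all hypothetical names chosen for illustration, and the real message format should be taken from the API reference. The sketch only shows the idea that the client forwards audio while the button is held and signals finalization on release.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Message = Union[bytes, str]

# Hypothetical control message. The real wire format is not described in
# this section, so treat this value as a placeholder.
FINALIZE_SIGNAL = "finalize"


@dataclass
class PushToTalkSession:
    """Forwards audio while the talk button is held and signals
    finalization when it is released (manual finalization)."""

    send: Callable[[Message], None]  # hypothetical transport, e.g. a socket write
    talking: bool = False

    def press(self) -> None:
        self.talking = True

    def on_audio(self, chunk: bytes) -> None:
        # Audio captured while the button is not held is dropped.
        if self.talking:
            self.send(chunk)

    def release(self) -> None:
        # The client knows speech is over, so it asks for the transcript
        # to be finalized now instead of waiting for the model to decide.
        if self.talking:
            self.talking = False
            self.send(FINALIZE_SIGNAL)


# Usage: collect outgoing messages in a list instead of a real connection.
sent: List[Message] = []
session = PushToTalkSession(send=sent.append)
session.on_audio(b"\x00\x01")  # ignored: button not pressed yet
session.press()
session.on_audio(b"\x02\x03")
session.on_audio(b"\x04\x05")
session.release()
print(sent)
```

With auto finalization, the `release` step would simply be absent: the client would only send audio, and no finalization signal would be needed.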

Guides