This guide covers migrating from OpenAI Speech to Text, OpenAI’s batch audio transcription API. Cartesia’s Batch Speech-to-Text API is compatible with OpenAI’s API, so migrating is a matter of changing a few parameters.

Endpoints

Cartesia Native: /stt - Full feature support
OpenAI Compatible: /audio/transcriptions - Drop-in replacement for OpenAI

Using the OpenAI SDK

Replace your OpenAI base URL with https://api.cartesia.ai to use Cartesia’s OpenAI compatibility layer. The supported parameters and a full example follow.

Supported parameters

  • file - The audio file to transcribe
  • model - Use ink-whisper
  • language - Input audio language (ISO-639-1 format)
  • timestamp_granularities - Include ["word"] to get word-level timestamps

Response Format: Always returns JSON with transcribed text, duration, language, and optionally word timestamps.

For the complete parameter reference, see the Batch STT API documentation.

Point the OpenAI SDK at Cartesia’s base URL and switch the model to ink-whisper:

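from openai import OpenAI

# Point the OpenAI client at Cartesia's compatibility layer.
client = OpenAI(
    api_key="your-cartesia-api-key",  # Cartesia key, not an OpenAI key
    base_url="https://api.cartesia.ai"
)

# Open the audio in binary mode and send it for transcription.
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="ink-whisper",
        language="en",  # ISO-639-1 code of the input audio
        timestamp_granularities=["word"]  # optional: word-level timestamps
    )

print(transcript.text)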
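
The response is described above as JSON with text, duration, language, and optional word timestamps. As a rough sketch of how the word timestamps might be consumed, the snippet below formats them one per line. The field names `words`, `word`, `start`, and `end` are assumptions modeled on OpenAI's verbose transcription format and are not confirmed by this guide, and the `sample` payload is made up for illustration; check the Batch STT API documentation for the actual schema.

```python
# Hypothetical payload: field names and values are assumptions, not the
# documented Cartesia response schema.
sample = {
    "text": "hello world",
    "language": "en",
    "duration": 1.2,
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.6, "end": 1.2},
    ],
}


def word_lines(response):
    """Format each word as 'start-end word'; returns [] if no timestamps."""
    lines = []
    # "words" may be absent when timestamp_granularities was not requested.
    for w in response.get("words") or []:
        lines.append(f"{w['start']:.2f}-{w['end']:.2f} {w['word']}")
    return lines


for line in word_lines(sample):
    print(line)
```

If the transcription was requested without timestamp_granularities, the helper simply returns an empty list rather than raising.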

Direct API Usage

Both endpoints accept identical parameters and return the same JSON response format:

Cartesia Native Endpoint

curl -X POST https://api.cartesia.ai/stt \
  -H "X-API-Key: your-cartesia-api-key" \
  -F "file=@audio.wav" \
  -F "model=ink-whisper" \
  -F "language=en" \
  -F "timestamp_granularities[]=word"

OpenAI-Compatible Endpoint

curl -X POST https://api.cartesia.ai/audio/transcriptions \
  -H "X-API-Key: your-cartesia-api-key" \
  -F "file=@audio.wav" \
  -F "model=ink-whisper" \
  -F "language=en" \
  -F "timestamp_granularities[]=word"
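
For readers calling the API from Python without the OpenAI SDK, the two curl commands can be mirrored with a small helper. This is only a sketch: `build_request` is a hypothetical name, and it merely assembles the URL, header, and form fields shown in the curl examples so that the only difference between the two endpoints (the path) is explicit. It does not send anything itself.

```python
BASE_URL = "https://api.cartesia.ai"

# The two paths listed in the Endpoints section.
ENDPOINTS = {"native": "/stt", "openai": "/audio/transcriptions"}


def build_request(endpoint, api_key, language="en", word_timestamps=True):
    """Return (url, headers, form_fields) matching the curl examples."""
    url = BASE_URL + ENDPOINTS[endpoint]
    headers = {"X-API-Key": api_key}
    fields = {"model": "ink-whisper", "language": language}
    if word_timestamps:
        # Same form field the curl examples send with -F.
        fields["timestamp_granularities[]"] = "word"
    return url, headers, fields


url, headers, fields = build_request("native", "your-cartesia-api-key")
print(url)
```

With the third-party requests library installed, the result could then be sent along the lines of `requests.post(url, headers=headers, data=fields, files={"file": open("audio.wav", "rb")})`.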

Summary

To migrate from OpenAI Speech to Text to Cartesia:
  1. Update the base URL: Change from https://api.openai.com/v1 to https://api.cartesia.ai
  2. Update authentication: Replace your OpenAI API key with your Cartesia API key
  3. Update model names: Use ink-whisper instead of OpenAI’s model names
  4. Keep the same endpoint: Continue using /audio/transcriptions
  5. Avoid unsupported parameters: Remove prompt, temperature, and response_format parameters
  6. Use timestamp_granularities (Optional): Add timestamp_granularities: ["word"] to get word-level timestamps
The core functionality remains the same, with JSON responses containing transcribed text and optional word timestamps.
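
Steps 3 and 5 can be applied mechanically to existing call sites. The sketch below assumes your current code passes keyword arguments to `client.audio.transcriptions.create`; `to_cartesia_kwargs` is a hypothetical helper name, not part of either SDK. It drops the three parameters listed as unsupported and swaps in the ink-whisper model.

```python
# Parameters the summary above says to remove when targeting Cartesia.
UNSUPPORTED = {"prompt", "temperature", "response_format"}


def to_cartesia_kwargs(openai_kwargs):
    """Return a copy of the kwargs that is safe to send to Cartesia."""
    kwargs = {k: v for k, v in openai_kwargs.items() if k not in UNSUPPORTED}
    kwargs["model"] = "ink-whisper"  # replace the OpenAI model name
    return kwargs


old_kwargs = {
    "model": "whisper-1",
    "language": "en",
    "temperature": 0.0,
    "response_format": "verbose_json",
    "timestamp_granularities": ["word"],
}
print(to_cartesia_kwargs(old_kwargs))
```

The helper returns a new dictionary and leaves the original untouched, so the same call site can keep serving OpenAI during a gradual migration.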