Migrating From OpenAI Speech to Text

This guide covers migrating batch audio transcription from OpenAI Speech to Text to Cartesia. Cartesia’s Batch Speech-to-Text API is compatible with OpenAI’s transcription API, so migrating is mostly a matter of changing a few parameters.

For realtime rather than batch transcription, see the guide "Migrating from OpenAI realtime transcription and others".

Endpoints

Cartesia Native: /stt - Full feature support
OpenAI Compatible: /audio/transcriptions - Drop-in replacement for OpenAI

Using the OpenAI SDK

To use Cartesia’s OpenAI compatibility layer, replace your OpenAI base URL with https://api.cartesia.ai.

Supported parameters

file - The audio file to transcribe
model - Use ink-whisper
language - Input audio language (ISO-639-1 format)
timestamp_granularities - Include ["word"] to get word-level timestamps

Response format: Always returns JSON with the transcribed text, duration, language, and, optionally, word timestamps. For the complete parameter reference, see the Batch STT API documentation.

Point the OpenAI SDK at Cartesia’s base URL and switch the model to ink-whisper:

from openai import OpenAI

# Use a Cartesia API key and Cartesia's base URL instead of OpenAI's.
client = OpenAI(
    api_key="your-cartesia-api-key",
    base_url="https://api.cartesia.ai",
)

# Open the audio in binary mode and request word-level timestamps.
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="ink-whisper",
        language="en",
        timestamp_granularities=["word"],
    )

print(transcript.text)

The same call in JavaScript (Node.js):

import OpenAI from 'openai';
import fs from 'fs';

// Use a Cartesia API key and Cartesia's base URL instead of OpenAI's.
const client = new OpenAI({
  apiKey: 'your-cartesia-api-key',
  baseURL: 'https://api.cartesia.ai'
});

// Stream the audio file from disk and request word-level timestamps.
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.wav'),
  model: 'ink-whisper',
  language: 'en',
  timestamp_granularities: ['word']
});

console.log(transcription.text);
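
The SDK examples above print only the transcribed text. When word timestamps are requested, the response is expected to carry them as well. The following is a minimal Python sketch of reading them from a response that has already been converted to a plain dict. The field names words, word, start, and end are assumptions modeled on OpenAI's verbose transcription format and are not confirmed by this guide; check the Batch STT API documentation for the actual schema. The helper name format_word_timestamps and the sample data are hypothetical.

```python
from typing import Any


def format_word_timestamps(response: dict[str, Any]) -> list[str]:
    """Render one line per word from a transcription response dict.

    Assumed shape (not confirmed by this guide):
    {"text": str, "words": [{"word": str, "start": float, "end": float}, ...]}
    """
    lines = []
    # "words" is treated as optional: it is only expected when word
    # timestamps were requested via timestamp_granularities.
    for entry in response.get("words") or []:
        lines.append(f"{entry['start']:.2f}-{entry['end']:.2f}  {entry['word']}")
    return lines


# Hypothetical response, hand-written for illustration only.
sample = {
    "text": "hello world",
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.42},
        {"word": "world", "start": 0.5, "end": 0.9},
    ],
}

for line in format_word_timestamps(sample):
    print(line)
```

With the OpenAI Python SDK, the response object can usually be turned into a dict first (for example with transcript.model_dump(), assuming a pydantic-based SDK version) before passing it to a helper like this.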

Direct API Usage

Both endpoints accept identical parameters and return the same JSON response format:

Cartesia Native Endpoint

curl -X POST https://api.cartesia.ai/stt \
  -H "X-API-Key: your-cartesia-api-key" \
  -F "file=@audio.wav" \
  -F "model=ink-whisper" \
  -F "language=en" \
  -F "timestamp_granularities[]=word"

OpenAI-Compatible Endpoint

curl -X POST https://api.cartesia.ai/audio/transcriptions \
  -H "X-API-Key: your-cartesia-api-key" \
  -F "file=@audio.wav" \
  -F "model=ink-whisper" \
  -F "language=en" \
  -F "timestamp_granularities[]=word"
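
The two curl calls above differ only in the path. As a rough Python sketch of that point, the helper below assembles the URL, header, and form fields for either endpoint without sending anything. The names SttRequest, build_stt_request, and ENDPOINTS are hypothetical and exist only for this illustration; the URL, header name, and form fields are copied from the curl examples.

```python
from dataclasses import dataclass, field

BASE_URL = "https://api.cartesia.ai"
# The two paths listed in this guide.
ENDPOINTS = {"native": "/stt", "openai": "/audio/transcriptions"}


@dataclass
class SttRequest:
    url: str
    headers: dict[str, str]
    form_fields: list[tuple[str, str]] = field(default_factory=list)


def build_stt_request(endpoint: str, api_key: str, language: str = "en",
                      word_timestamps: bool = False) -> SttRequest:
    """Assemble URL, headers, and form fields; the audio file is attached separately."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    fields = [("model", "ink-whisper"), ("language", language)]
    if word_timestamps:
        # Same form key as the curl examples use.
        fields.append(("timestamp_granularities[]", "word"))
    return SttRequest(BASE_URL + ENDPOINTS[endpoint], {"X-API-Key": api_key}, fields)


native = build_stt_request("native", "your-cartesia-api-key", word_timestamps=True)
compat = build_stt_request("openai", "your-cartesia-api-key", word_timestamps=True)
print(native.url)
print(compat.url)
```

The resulting pieces can be handed to any HTTP client that supports multipart form uploads, with the audio file added as the file part.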

Summary

To migrate from OpenAI Speech to Text to Cartesia:

Update the base URL: Change from https://api.openai.com/v1 to https://api.cartesia.ai
Update authentication: Replace your OpenAI API key with your Cartesia API key
Update model names: Use ink-whisper instead of OpenAI’s model names
Keep the same endpoint: Continue using /audio/transcriptions
Avoid unsupported parameters: Remove the prompt, temperature, and response_format parameters from your requests
Use timestamp_granularities (Optional): Add timestamp_granularities: ["word"] to get word-level timestamps
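
The parameter changes in the steps above can be sketched as a small helper that adapts the keyword arguments of an existing OpenAI transcription call. This is an illustration under stated assumptions, not part of any SDK: the function name to_cartesia_kwargs and the example arguments are hypothetical, and the set of removed parameters is taken directly from the list above.

```python
from typing import Any

# Parameters the summary says to remove when targeting Cartesia.
UNSUPPORTED = {"prompt", "temperature", "response_format"}


def to_cartesia_kwargs(openai_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of transcription kwargs adapted for Cartesia."""
    kwargs = {k: v for k, v in openai_kwargs.items() if k not in UNSUPPORTED}
    kwargs["model"] = "ink-whisper"  # replaces any OpenAI model name
    return kwargs


# Example kwargs as they might appear in existing OpenAI code (illustrative).
old = {
    "model": "whisper-1",
    "language": "en",
    "temperature": 0.2,
    "response_format": "verbose_json",
    "timestamp_granularities": ["word"],
}
print(to_cartesia_kwargs(old))
```

The original dict is left untouched, so the same arguments can still be used against OpenAI during a gradual migration.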

The core functionality remains the same, with JSON responses containing transcribed text and optional word timestamps.
