Migrating From OpenAI Whisper to Cartesia Ink

Batch Speech-to-Text: This documentation covers OpenAI SDK compatibility for Cartesia Ink’s batched transcription endpoint.

For real-time transcription, use our Streaming STT endpoint.

Cartesia’s Batch Speech-to-Text API is compatible with OpenAI’s client libraries, enabling seamless migration from OpenAI Whisper.

Endpoints

Cartesia Native: /stt - Full feature support
OpenAI Compatible: /audio/transcriptions - Drop-in replacement for Whisper on the OpenAI SDK

Migration Guide for OpenAI SDK

Replace your OpenAI base URL with https://api.cartesia.ai to use Cartesia's compatibility layer. The Python and Node.js examples below show the full client configuration.

Parameter Support

Supported Parameters:

  • file - The audio file to transcribe
  • model - Use ink-whisper for Cartesia’s latest model
  • language - Input audio language (ISO-639-1 format)
  • timestamp_granularities - Include ["word"] to get word-level timestamps

Response Format: The endpoint always returns JSON containing the transcribed text, duration, language, and, when requested, word timestamps.

For the complete parameter reference, see our Batch STT API documentation.
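
To make the response description above concrete, here is a minimal Python sketch that parses a transcription response into typed objects. The field names `words`, `word`, `start`, and `end` are assumptions modeled on OpenAI's verbose transcription format, and the sample payload is invented purely for illustration; consult the Batch STT API documentation for the actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    word: str
    start: float  # seconds from the start of the audio (assumed unit)
    end: float


@dataclass
class Transcript:
    text: str
    language: Optional[str] = None
    duration: Optional[float] = None
    words: List[Word] = field(default_factory=list)


def parse_transcript(payload: str) -> Transcript:
    """Parse a JSON response body; word timestamps are treated as optional."""
    data = json.loads(payload)
    words = [
        Word(w["word"], float(w["start"]), float(w["end"]))
        for w in data.get("words") or []
    ]
    return Transcript(
        text=data["text"],
        language=data.get("language"),
        duration=data.get("duration"),
        words=words,
    )


# Hypothetical payload, not captured from the real API.
sample = json.dumps({
    "text": "Hello world",
    "language": "en",
    "duration": 1.2,
    "words": [
        {"word": "Hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.6, "end": 1.1},
    ],
})

result = parse_transcript(sample)
for w in result.words:
    print(f"{w.start:.2f}-{w.end:.2f} {w.word}")
```

Treating `words` as optional means the same parser handles requests made with and without timestamp_granularities.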

Python Example

from openai import OpenAI

# Point the OpenAI client at Cartesia's compatibility layer.
client = OpenAI(
    api_key="your-cartesia-api-key",
    base_url="https://api.cartesia.ai",
    default_headers={"Cartesia-Version": "2025-04-16"},
)

with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="ink-whisper",
        language="en",
        # Optional: request word-level timestamps.
        timestamp_granularities=["word"],
    )

print(transcript.text)

Node.js Example

import OpenAI from 'openai';
import fs from 'fs';

// Point the OpenAI client at Cartesia's compatibility layer.
const client = new OpenAI({
  apiKey: 'your-cartesia-api-key',
  baseURL: 'https://api.cartesia.ai',
  defaultHeaders: {
    'Cartesia-Version': '2025-04-16'
  }
});

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.wav'),
  model: 'ink-whisper',
  language: 'en',
  // Optional: request word-level timestamps.
  timestamp_granularities: ['word']
});

console.log(transcription.text);

Direct API Usage

Both endpoints accept identical parameters and return the same JSON response format:

Cartesia Native Endpoint

curl -X POST https://api.cartesia.ai/stt \
  -H "X-API-Key: your-cartesia-api-key" \
  -H "Cartesia-Version: 2025-04-16" \
  -F "file=@audio.wav" \
  -F "model=ink-whisper" \
  -F "language=en" \
  -F "timestamp_granularities[]=word"

OpenAI-Compatible Endpoint

curl -X POST https://api.cartesia.ai/audio/transcriptions \
  -H "X-API-Key: your-cartesia-api-key" \
  -H "Cartesia-Version: 2025-04-16" \
  -F "file=@audio.wav" \
  -F "model=ink-whisper" \
  -F "language=en" \
  -F "timestamp_granularities[]=word"
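
Since the two curl commands above differ only in the request path, the shared parts can be factored out. The following Python sketch assumes the headers and form fields shown in those commands; `build_request`, `ENDPOINTS`, and the argument names are hypothetical helpers introduced for illustration, not part of any Cartesia SDK.

```python
BASE_URL = "https://api.cartesia.ai"
API_VERSION = "2025-04-16"

# Paths taken from the Endpoints section; the dictionary keys are our own labels.
ENDPOINTS = {"native": "/stt", "openai": "/audio/transcriptions"}


def build_request(endpoint, api_key, language="en", word_timestamps=False):
    """Return (url, headers, form_fields) for a batch transcription request."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    url = BASE_URL + ENDPOINTS[endpoint]
    headers = {"X-API-Key": api_key, "Cartesia-Version": API_VERSION}
    # A list of pairs keeps room for repeated keys such as array-style fields.
    fields = [("model", "ink-whisper"), ("language", language)]
    if word_timestamps:
        fields.append(("timestamp_granularities[]", "word"))
    return url, headers, fields


native = build_request("native", "your-cartesia-api-key", word_timestamps=True)
compat = build_request("openai", "your-cartesia-api-key", word_timestamps=True)
print(native[0])
print(compat[0])
```

With the third-party requests library, the result could be sent as `requests.post(url, headers=headers, data=fields, files={"file": open("audio.wav", "rb")})`; this has not been verified against the live API.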

Migration from OpenAI

To migrate from OpenAI’s Whisper API to Cartesia:

  1. Update the base URL: Change from https://api.openai.com/v1 to https://api.cartesia.ai
  2. Update authentication: Replace your OpenAI API key with your Cartesia API key
  3. Add version header: Include Cartesia-Version: 2025-04-16 in requests
  4. Update model names: Use ink-whisper instead of OpenAI’s model names
  5. Keep the same endpoint: Continue using /audio/transcriptions
  6. Avoid unsupported parameters: Remove prompt, temperature, and response_format parameters
  7. Use timestamp_granularities (Optional): Add timestamp_granularities: ["word"] to get word-level timestamps

The core functionality remains the same, with JSON responses containing transcribed text and optional word timestamps.
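
The migration steps above can be sketched as a small Python helper that rewrites the keyword arguments of an existing `client.audio.transcriptions.create` call. This is a sketch under the assumptions stated in this guide (the three unsupported parameters and the ink-whisper model name); `migrate_transcription_kwargs` and `cartesia_client_settings` are hypothetical names, not library functions.

```python
# Parameters this guide lists as unsupported by the compatibility layer.
UNSUPPORTED = {"prompt", "temperature", "response_format"}


def cartesia_client_settings(cartesia_api_key):
    """Steps 1-3: base URL, API key, and version header for the OpenAI client."""
    return {
        "api_key": cartesia_api_key,
        "base_url": "https://api.cartesia.ai",
        "default_headers": {"Cartesia-Version": "2025-04-16"},
    }


def migrate_transcription_kwargs(openai_kwargs, word_timestamps=False):
    """Steps 4, 6, 7: swap the model, drop unsupported keys, add timestamps."""
    # Build a new dict so the caller's original arguments are left untouched.
    kwargs = {k: v for k, v in openai_kwargs.items() if k not in UNSUPPORTED}
    kwargs["model"] = "ink-whisper"
    if word_timestamps:
        kwargs["timestamp_granularities"] = ["word"]
    return kwargs


old_call = {
    "model": "whisper-1",
    "language": "en",
    "prompt": "Domain-specific vocabulary",
    "temperature": 0.2,
    "response_format": "verbose_json",
}
new_call = migrate_transcription_kwargs(old_call, word_timestamps=True)
print(new_call)
```

The settings dictionary is meant to be unpacked into the client constructor, as in `OpenAI(**cartesia_client_settings("your-cartesia-api-key"))`, and the migrated arguments passed alongside the `file` argument to `client.audio.transcriptions.create`.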