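Make an API request

Prerequisites

A Cartesia account.
An API key.
FFmpeg installed (optional, but recommended).

FFmpeg is not required to use the Cartesia API, but the examples below use it because it is handy for saving, playing, and converting audio files. You can install it with your platform's package manager: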

# macOS
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg
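
After installing, it can help to confirm that the ffmpeg and ffplay binaries are actually on your PATH before running the examples. The following is a minimal sketch that uses only the Python standard library; the helper name find_audio_tools is hypothetical and not part of any Cartesia tooling.

```python
import shutil


def find_audio_tools(names=("ffmpeg", "ffplay")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so a missing binary is easy to spot.
    for name, path in find_audio_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If ffplay is reported as missing, the generated file can still be opened with any other audio player.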

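Generate your first utterance

The steps below are shown for cURL, Python, and JavaScript/TypeScript.

To generate your first utterance with cURL, replace YOUR_API_KEY and run the following command in your terminal: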

curl -N -X POST "https://api.cartesia.ai/tts/bytes" \
        -H "Cartesia-Version: 2025-04-16" \
        -H "Authorization: Bearer YOUR_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"transcript": "Welcome to Cartesia Sonic!", "model_id": "sonic-2", "voice": {"mode":"id", "id": "a0e99841-438c-4a64-b679-ae501e7d6091"}, "output_format":{"container":"wav", "encoding":"pcm_f32le", "sample_rate":44100}}' > sonic-2.wav

Be sure to replace YOUR_API_KEY with your actual API key. Otherwise the command will not output anything!

You can play the generated sonic-2.wav file with afplay sonic-2.wav (on macOS) or ffplay sonic-2.wav (on any system with FFmpeg installed). You can also double-click it in your file explorer.
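
If the request fails for any reason, the saved file may not contain audio at all, so a quick sanity check before playback can save some confusion. The sketch below reads the RIFF/WAVE header by hand with the standard library and reports the format fields. The function name read_wav_format is hypothetical, and the parser is deliberately minimal: it stops at the first fmt chunk and does not validate the audio data.

```python
import struct


def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk from the first bytes of a RIFF/WAVE file."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            if body + 16 > len(data):
                raise ValueError("truncated 'fmt ' chunk")
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, body
            )
            return {
                "format_tag": tag,
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # RIFF chunks are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")
```

Passing the first few kilobytes of sonic-2.wav to read_wav_format should, for the request shown above, report a sample rate of 44100 and 32 bits per sample. A format tag of 3 denotes IEEE float in a plain WAV header, although some encoders write the extensible tag 0xFFFE instead. A ValueError suggests the file is not a WAV at all and is worth opening in a text editor.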

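The bytes endpoint supports a variety of output formats, which makes it well suited to batch use cases where you want to save audio ahead of time. In contrast, Cartesia's WebSocket and Server-Sent Events endpoints stream raw PCM audio to avoid the latency overhead of transcoding.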
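
Raw PCM carries no header, so a player cannot tell its sample rate or encoding from the bytes alone. To save streamed audio you would add a container yourself. The sketch below assumes the stream is mono little-endian float32 (pcm_f32le) at a sample rate you already know; both are assumptions about your own request settings, and the function name pcm_f32le_to_wav is hypothetical. It converts to 16-bit integer samples because the standard-library wave module handles integer PCM.

```python
import io
import struct
import wave


def pcm_f32le_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw little-endian float32 PCM in a 16-bit PCM WAV container."""
    usable = len(pcm) - (len(pcm) % 4)  # ignore a trailing partial sample
    count = usable // 4
    floats = struct.unpack(f"<{count}f", pcm[:usable])
    # Clamp to [-1.0, 1.0] and scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, x)) * 32767) for x in floats]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{count}h", *ints))
    return buf.getvalue()
```

Collecting the streamed chunks into one bytes object and passing it to pcm_f32le_to_wav together with the sample rate from your request yields bytes that can be written to a .wav file. The 16-bit conversion loses some precision compared with the float32 source, which is usually acceptable for playback.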

Install the SDK (Python)

pip install cartesia

# Or, if you're using uv
uv add cartesia

Call the API (Python)

import os
import subprocess
from cartesia import Cartesia

if os.environ.get("CARTESIA_API_KEY") is None:
    raise ValueError("CARTESIA_API_KEY is not set")

client = Cartesia(api_key=os.environ.get("CARTESIA_API_KEY"))

data = client.tts.bytes(
    model_id="sonic-2",
    transcript="Hello, world! I'm generating audio on Cartesia.",
    voice_id="a0e99841-438c-4a64-b679-ae501e7d6091",
    # You can find the supported `output_format`s at https://docs.cartesia.ai/api-reference/tts/bytes
    output_format={
        "container": "wav",
        "encoding": "pcm_f32le",
        "sample_rate": 44100,
    },
)

with open("sonic-2.wav", "wb") as f:
    f.write(data)

# Play the file
subprocess.run(["ffplay", "-autoexit", "-nodisp", "sonic-2.wav"])
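
For batch use, the single call above can be wrapped in a loop that saves one file per transcript. The following is a minimal sketch, not part of the SDK: save_batch and the utterance-NNN.wav naming scheme are hypothetical, and the synthesis step is passed in as a function so the loop does not depend on any particular client.

```python
from pathlib import Path
from typing import Callable, Iterable


def save_batch(
    synthesize: Callable[[str], bytes],
    transcripts: Iterable[str],
    out_dir: str,
) -> list[Path]:
    """Synthesize each transcript and write it to a numbered WAV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(transcripts, start=1):
        path = out / f"utterance-{index:03d}.wav"
        path.write_bytes(synthesize(text))
        paths.append(path)
    return paths
```

With the client from the script above, synthesize could be a small function that calls client.tts.bytes with the same model_id, voice_id, and output_format and passes its argument as transcript.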

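Run the script (Python)

Save the code above as hello.py. Do not name it cartesia.py: a script with that name shadows the installed cartesia package and makes the import fail.

env CARTESIA_API_KEY=YOUR_API_KEY python hello.py

# Or, if you're using uv
env CARTESIA_API_KEY=YOUR_API_KEY uv run hello.py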

Install the SDK (JavaScript/TypeScript)

# NPM
npm install @cartesia/cartesia-js

# Yarn
yarn add @cartesia/cartesia-js

# Bun
bun add @cartesia/cartesia-js

# PNPM
pnpm add @cartesia/cartesia-js

Call the API (JavaScript/TypeScript)

// hello.js
import { CartesiaClient } from "@cartesia/cartesia-js";
import fs from "node:fs";
import { spawn } from "node:child_process";
import process from "node:process";

if (!process.env.CARTESIA_API_KEY) {
  throw new Error("CARTESIA_API_KEY is not set");
}

// Set up the client.
const client = new CartesiaClient({
  apiKey: process.env.CARTESIA_API_KEY,
});

// Make the API call.
const response = await client.tts.bytes({
  modelId: "sonic-2",
  voice: {
    mode: "id",
    id: "a0e99841-438c-4a64-b679-ae501e7d6091",
  },
  outputFormat: {
    container: "wav",
    encoding: "pcm_f32le",
    sampleRate: 44100,
  },
  transcript: "Welcome to Cartesia Sonic!",
});

// Write `response` (of type ArrayBuffer) to a file.
fs.writeFileSync("sonic-2.wav", new Uint8Array(response));

// Play the file.
spawn("ffplay", ["-autoexit", "-nodisp", "sonic-2.wav"]);

Run the script (JavaScript/TypeScript)

env CARTESIA_API_KEY=YOUR_API_KEY node hello.js

The Cartesia API client also works in other runtimes such as Bun and Deno.

You can find the voice used above in the playground.
