December 2024
API
- Pricing updates; character usage columns migrated to bigint; presign URLs for Pro Voice Clone; voices/<id>/conditioning endpoint; file to dataset in presign; userID-level endpoint restrictions; Stripe Customer ID on checkout.
- EU deployment and Hindi HC fixes.
Playground
- New model on Playground highlighting improvements in transcript following (demo, not GA).
- Blog and play.cartesia.ai live.
Models / Voices
- Model aliasing updated for sonic and sonic-preview; twilight-morning in API and enterprise; conditioning entries for voice clone and multilingual.
- Embedding search for LoRA voice selection.
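The entry above does not say how embedding search selects voices. As an illustration only, here is a minimal brute-force nearest-neighbour sketch using cosine similarity. The function names, voice IDs, and vectors are hypothetical, and a production system would likely use an approximate index rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def nearest_voices(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k catalog voices most similar to the query embedding."""
    scored = [(voice_id, cosine_similarity(query, emb)) for voice_id, emb in catalog.items()]
    # Highest similarity first; ties broken by voice ID so results are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```

For example, with a hypothetical catalog `{"voice-a": [1.0, 0.0], "voice-b": [0.0, 1.0], "voice-c": [0.7, 0.7]}`, the query `[1.0, 0.1]` ranks `voice-a` first and `voice-c` second.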
Other
- Infrastructure and scaling updates.
- State of Voice blog and map.
November 2024
API
- Cartesia-Version 2024-11-13: upgrade to new API version; unified clone voice endpoint; datasets support; files endpoint pagination; FineTuneRequest status; fine-tunes API in Playground; presign URLs for Pro Voice Clone; Flush Done event for manual WebSocket flushing; <pause> tag for continuations.
- GCP Enterprise.
Playground
- Changes for new API; replay suite; GCP Enterprise.
Models / Voices
- Flush Done event for manual flushing in WebSocket; <pause> tag for continuations within a single transcript; spelling fixes; manual flush and flush ID.
- Empty encoding field allowed for mp3.
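Manual flushing with a flush ID and a Flush Done event suggests that a client can group the audio it receives per flush. The sketch below shows that grouping on the client side. The message shapes (`type`, `flush_id`, `data` as raw bytes) are assumptions made for illustration and are not the documented WebSocket schema; `collect_flushes` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical message shapes (NOT the documented Cartesia schema):
#   {"type": "chunk", "flush_id": 1, "data": b"..."}
#   {"type": "flush_done", "flush_id": 1}


def collect_flushes(messages: Iterable[dict]) -> Dict[int, bytes]:
    """Group audio chunks by flush ID, returning only flushes marked done."""
    pending: Dict[int, List[bytes]] = defaultdict(list)
    completed: Dict[int, bytes] = {}
    for msg in messages:
        if msg["type"] == "chunk":
            pending[msg["flush_id"]].append(msg["data"])
        elif msg["type"] == "flush_done":
            # Join everything received for this flush and stop tracking it.
            completed[msg["flush_id"]] = b"".join(pending.pop(msg["flush_id"], []))
    return completed
```

Chunks for a flush that has not yet received its done event stay pending and are not returned.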
Docs
- API version 2024-11-13: Sonic 2, capability guides (clone, pronunciations, speed/emotion, continuations, localize), formatting for Sonic 2.
- Integrations: LiveKit, Pipecat, Rasa, Thoughtly, Twilio, MCP. Enterprise: SSO, organizations. See API Conventions.
October 2024
API
- Cartesia JS bytes endpoint; gen blocks removed from character counting; health checks and middleware; user-level queueing with queue length cap and timeout; 10× queue size rejection; Slang (continuations) and ConditioningData; voice changer JS SDK.
- Max limit removed from Playground.
Playground
- GCP: API and ingress for GCP US Central. Queueing: user-level queueing in API gateway; queue length cap and queuedRequest timeout.
- Voice Changer: Playground UI polish; ConditioningData as part of ResolvedVoice; Slang rollout; flush on start/end of spell tags.
- LoRA release UI; onboarding data upsert fix; welcome page submit loading state; enterprise contact links.
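User-level queueing with a queue length cap and a request timeout can be sketched as one bounded FIFO per user. This is a generic illustration under stated assumptions, not the API gateway's implementation: the class and method names are hypothetical, full queues reject immediately, and expired requests are dropped lazily whenever the queue is touched. The clock is injectable so the behaviour is easy to test.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple


class UserQueues:
    """Hypothetical per-user FIFO queues with a length cap and a timeout."""

    def __init__(self, max_len: int, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_len = max_len
        self.timeout_s = timeout_s
        self.clock = clock
        self.queues: Dict[str, Deque[Tuple[float, str]]] = {}

    def enqueue(self, user_id: str, request_id: str) -> bool:
        """Queue a request; return False (reject) if the user's queue is full."""
        q = self.queues.setdefault(user_id, deque())
        self._expire(q)
        if len(q) >= self.max_len:
            return False
        q.append((self.clock(), request_id))
        return True

    def dequeue(self, user_id: str) -> Optional[str]:
        """Pop the oldest request that has not timed out, if any."""
        q = self.queues.get(user_id)
        if not q:
            return None
        self._expire(q)
        return q.popleft()[1] if q else None

    def _expire(self, q: Deque[Tuple[float, str]]) -> None:
        # Drop requests that have waited longer than the timeout.
        now = self.clock()
        while q and now - q[0][0] > self.timeout_s:
            q.popleft()
```

Keeping one queue per user means a single heavy user fills only their own queue and is rejected, while other users are unaffected.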
Other
- Canonical linking and sitemap.
- Blog and navigation (Blog, Careers) updates.
September 2024
API
- User-level queueing; queue size and WebSocket queueing rejection; api_status field for voice API usability; LoRA pricing and UX cleanup; flush all audio on DONE token (including CB); user option to obfuscate transcripts in logs.
- LoRA and load balancer improvements.
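The user option to obfuscate transcripts in logs can be pictured as a redaction step applied to each request record before it is written. The sketch below is hypothetical: the record layout and the `redact_for_logging` name are assumptions. It keeps only the transcript length, which is still useful for debugging character usage.

```python
from typing import Any, Dict


def redact_for_logging(record: Dict[str, Any], obfuscate: bool) -> Dict[str, Any]:
    """Return a copy of a request log record, hiding the transcript if asked."""
    redacted = dict(record)  # never mutate the caller's record
    if obfuscate and "transcript" in redacted:
        # Keep only the length of the text, never the text itself.
        redacted["transcript"] = f"<redacted {len(str(record['transcript']))} chars>"
    return redacted
```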
Playground
- Function calling; agent creation, tests, and dev setup; voice agent infrastructure enabled.
- LoRA: HiFi cloning endpoint and Playground page; 8 new voices on Playground; Indian accent.
- Voice Changer Playground UI; JS SDK for voice changer. Language added to TTS request from voices/[id]; flush all audio on DONE token; user option to obfuscate transcripts in logs.
Docs
- Blog and sitemap updates.
August 2024
API
- Reject invalid transcripts (docs and API gateway); no_more_inputs in WebSockets can use voice_embedding instead of voice_id.
- Improved bad model id handling.
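Since a request can identify the voice either by ID or by embedding, a client-side helper might enforce that exactly one of the two is supplied. The `{"mode": ..., ...}` shape below is an assumption about the request's voice object, and `build_voice_spec` is a hypothetical helper. Check the API reference for the exact field names.

```python
from typing import Dict, List, Optional, Union

VoiceSpec = Dict[str, Union[str, List[float]]]


def build_voice_spec(voice_id: Optional[str] = None,
                     voice_embedding: Optional[List[float]] = None) -> VoiceSpec:
    """Build a voice selector from exactly one of an ID or an embedding."""
    # Both missing or both present are ambiguous, so reject them.
    if (voice_id is None) == (voice_embedding is None):
        raise ValueError("pass exactly one of voice_id or voice_embedding")
    if voice_id is not None:
        return {"mode": "id", "id": voice_id}
    return {"mode": "embedding", "embedding": list(voice_embedding)}
```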
Playground
- Localization page in Playground and JS client; dialects and future-compatibility. Switch Playground to voice ID; allow both id and embedding for TTSRequest; archive voices (kept accessible via API).
- Replay button; feedback form; fix multilingual recommended voices when switching back to English; better error messaging.
Models / Voices
- LoRA support (multiple voices per LoRA, new cache key, easy-brook-lora, vc-flowing-dream).
Other
- On-device homepage launch; proper links for “Request a demo” button.
- LoRA: multiple voices per LoRA.
July 2024
API
- Voice Conversion endpoint: new API endpoint. Timestamps on WebSocket endpoint; per-generation voice controls (speed, emotion) in API; polar-tree deployed (sonic-multilingual); continuous batching support; VocalWave (English) and long-generation support; sonic-english → vocal-wave and sonic-multilingual → ancient-voice aliasing; buffer and mp3 params on /bytes; MP3 streaming and WAV encoding fixes; request cancellation; empty transcript allowed when continue=false; Stripe webhook cache clear; subscription cancellation/reactivation; Redis cache for overage; keys endpoints.
- Clerk-based auth in API.
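Model aliasing lets clients request a stable public name while the service points it at a specific underlying model. The table below uses the July 2024 mapping from this entry; the dictionary and `resolve_model_id` helper are hypothetical illustrations of the idea, not the service's code.

```python
from typing import Dict

# Alias table as described in the July 2024 entry (illustrative only).
MODEL_ALIASES: Dict[str, str] = {
    "sonic-english": "vocal-wave",
    "sonic-multilingual": "ancient-voice",
}


def resolve_model_id(requested: str, aliases: Dict[str, str] = MODEL_ALIASES) -> str:
    """Map a public alias to its underlying model; pass other IDs through unchanged."""
    return aliases.get(requested, requested)
```

Repointing an alias changes the model for every client at once, which is why the same alias can resolve differently over time: the June 2024 entry points sonic-english at feasible-haze, and the July entry points it at vocal-wave.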
Playground
- Optional enhance flag for voice cloning in JS client, Python client, and Playground; voice update endpoint and docs; gate voice cloning for free users.
- Prevent playing audio while playback is in progress; download button disabled until generation finishes; API key deletion clearer, with copy button; character usage indicator; subscription and checkout fixes; gating clone form for free users.
Docs
- Voice cloning docs; timestamps and continuations; user guides for voice control and Twilio; emotion control and timestamps; “phonemes” terminology.
- Voice cloning from file.
Other
- Python client: continuations support, custom base_url, fallback for WebSockets; JS client v1.0.1: onError prop on useTTS.
- Voice controls (speed, emotion) in Python client and docs.
June 2024
API
- Continuations: support for streaming input via SSE and Bytes; NoMoreInputs signal. Cartesia-Version enforced via header; Playground and checkout/subscription endpoints send it.
- 48 kHz added to valid sample rates; .wav byte streaming; HTTP streaming endpoint for raw bytes; API standardization (backwards-compatible); new voices endpoints; mulaw and alaw backwards compatibility; Python client v1.0.0 (overhaul, output_format); JS client: pcm_s16le, pcm_alaw, pcm_mulaw and improved typing; caching for voices; context_id in WebSocket response and docs.
- Stripe webhooks for renewals and expiration; OpenAPI spec update.
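Continuations stream one utterance as several messages that share a context_id. The sketch below builds such a sequence, closing the context with an empty transcript and continue set to false (which the July 2024 entry allows). Only the field names context_id, transcript, and continue come from this changelog. Other required request fields (model, voice, output format) are deliberately omitted, and `continuation_messages` is a hypothetical helper.

```python
import uuid
from typing import Dict, Iterable, List, Optional


def continuation_messages(chunks: Iterable[str],
                          context_id: Optional[str] = None) -> List[Dict[str, object]]:
    """Turn streamed transcript pieces into one continuation context."""
    ctx = context_id or str(uuid.uuid4())
    messages: List[Dict[str, object]] = [
        {"context_id": ctx, "transcript": chunk, "continue": True}
        for chunk in chunks
    ]
    # Close the context: an empty transcript with continue=False signals
    # that no more input is coming for this context.
    messages.append({"context_id": ctx, "transcript": "", "continue": False})
    return messages
```

Reusing one context_id is what lets the service treat the pieces as a single utterance rather than separate generations.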
Playground
- Multilingual: language parameter on voices API and in API; Playground language selection; multilingual copy on homepage; default sonic-english → feasible-haze.
- Mobile layout improvements; multilingual UI papercuts; voice cloning and empty transcript styling fixes; filtering moved from voices/[id] to Speak page.
Models / Voices
- sonic-multilingual and sonic-english aliasing; language column on voices.
- Recommended voices.
Docs
- Version 2024-06-10: get-started, API conventions, integrations (LiveKit, Pipecat, Rasa, Thoughtly, Twilio, MCP), clone voices, embeddings/voice mixing. See API Conventions.
Other
- ToS changes; revised pricing tiers; legal notices on sign-in and sign-up; overage toggle in Playground.
- Character usage limit blocks WebSocket when exceeded.
May 2024
API
- Cartesia-Version header; HTTP streaming for raw bytes; new voices endpoints; mulaw/alaw backwards compatibility; API standardization (backwards-compatible); Python client v1.0.0; JS client structure overhaul.
- Clone voice upload fix.
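Enforcing a version header at the gateway amounts to rejecting requests that omit it or name an unknown version. A minimal sketch follows. The set of known versions lists only the two dates that appear in this changelog and is not exhaustive; `check_version_header` is a hypothetical helper. HTTP header names are case-insensitive, so the lookup normalizes them first.

```python
from typing import Mapping

# API versions named in this changelog; illustrative, not exhaustive.
KNOWN_VERSIONS = {"2024-06-10", "2024-11-13"}


def check_version_header(headers: Mapping[str, str]) -> str:
    """Return the requested API version, or raise if it is missing or unknown."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    version = lowered.get("cartesia-version")
    if version is None:
        raise ValueError("missing Cartesia-Version header")
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unsupported Cartesia-Version: {version}")
    return version
```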
Playground
- Redesign and Sonic launch copy; subscriptions page; favoriting voices; emotion and speed sliders; User vs Default voices; tags (Age, Accent) in DB and Playground; sample_text field (API Gateway and Playground); buffer streamed audio before playback; character usage indicator; API key auto-created on user creation; custom sign-in/sign-up and 404 on sign-out fix; disable generation button while audio playing; human-readable model names and skilled-cherry.
- Character limit increase.
Models / Voices
- Human-readable model names; skilled-cherry; polar-tree (sonic-multilingual); continuations and output format; Python client NumPy array support.
- Voice cloning disclaimer.
Docs
- Mintlify docs added.
Other
- Stripe webhooks for subscriptions; subscription cancellation and reactivation; character usage checks on generation routes; free subscription by default; Scale plan limit (8M chars/month); checkout and receipts.
- Custom sign-in/sign-up pages.
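A character usage check on generation routes can be as simple as comparing the month's usage plus the new transcript against the plan limit. The sketch uses the Scale plan figure from this entry (8M chars/month). Counting characters with `len()` (Unicode code points) is an assumption, since the service's counting rules are not described here (the October 2024 entry notes that gen blocks were later excluded from counting), and `can_generate` is a hypothetical helper.

```python
SCALE_PLAN_MONTHLY_CHARS = 8_000_000  # Scale plan limit from the May 2024 entry


def can_generate(used_chars: int, transcript: str,
                 monthly_limit: int = SCALE_PLAN_MONTHLY_CHARS) -> bool:
    """Check whether a generation fits in the remaining monthly allowance."""
    # Assumption: one character = one Unicode code point of the transcript.
    return used_chars + len(transcript) <= monthly_limit
```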
April 2024
API
- model_id added as parameter to generate; minimum transcript length enforced; voice moved to AudioGenerationRequest; experimental router removed; speed controls and voice edit page; video generation endpoint.
- WhisperX removed from dependencies.
March 2024
API
- WebSocket interrupt support; get voice embedding route; Redis cache for API keys; streaming switched from Octet to JSON; new model genial-planet-1346; voice param required on requests; formatting support.
- WhisperX for transcription (later removed).
Playground
- Voice cloning in the UI; connection info in JS client; audio downloadable; transcript length validation (max 400 chars, empty rejected); improved UX and crash handling when API key missing; welcome message and icons.
- API key creation on sign-up via Clerk webhooks.
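The Playground transcript validation above (max 400 chars, empty rejected) can be sketched as a small check. Treating whitespace-only input as empty is an assumption, and the limit is a parameter because it changed later (the May 2024 entry records a character limit increase). `validate_transcript` is a hypothetical helper.

```python
MAX_TRANSCRIPT_CHARS = 400  # Playground limit from the March 2024 entry


def validate_transcript(transcript: str, max_chars: int = MAX_TRANSCRIPT_CHARS) -> str:
    """Return the transcript if valid, else raise ValueError."""
    # Assumption: whitespace-only input counts as empty.
    if not transcript.strip():
        raise ValueError("transcript must not be empty")
    if len(transcript) > max_chars:
        raise ValueError(f"transcript is {len(transcript)} chars; max is {max_chars}")
    return transcript
```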
Other
- Voice cloning and connection info in JS client.