Last verified: 2026-06-11

Overview

Use this integration to run an Anam interactive avatar with a Cartesia Sonic TTS voice. Anam handles the avatar session, WebRTC stream, and rendering, while a Cartesia voice provides the speech output.

Prerequisites

You need an Anam API key, available to the backend as the ANAM_API_KEY environment variable. This flow does not require a Cartesia API key in your app: Anam uses the saved Cartesia voice attached to the Anam voiceId.

Installation

npm install express @anam-ai/js-sdk
Set your Node project to ESM in package.json so the import syntax in the backend and frontend snippets works as written:
{
  "type": "module"
}

Quick start

Create an Anam session token with a Cartesia voice, then initialize the avatar in the browser with the returned session token.

1) Backend: create an Anam session token

import express from "express";

const app = express();
app.use(express.json());

const requiredEnvVars = ["ANAM_API_KEY"];

for (const name of requiredEnvVars) {
  if (!process.env[name]) {
    throw new Error(`${name} is required`);
  }
}

// Swap these for another avatar, Cartesia voice, or LLM from Anam Lab.
const avatarId = "071b0286-4cce-4808-bee2-e642f1062de3"; // Liv
const voiceId = "c48c4dd9-5050-11f1-9076-5e955d484d11"; // Siobhan - Warm Welcomer
const llmId = "a7cf662c-2ace-4de1-a21e-ef0fbf144bb7"; // GPT OSS 120B

app.post("/api/anam-session", async (req, res) => {
  const sessionTokenResponse = await fetch("https://api.anam.ai/v1/auth/session-token", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.ANAM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      personaConfig: {
        name: "Cartesia voice avatar",
        avatarId,
        voiceId,
        llmId,
        systemPrompt: "You are a concise voice assistant. Keep responses brief.",
        voiceGenerationOptions: {
          speed: 1,
          volume: 1,
          emotion: "content",
        },
      },
    }),
  });

  if (!sessionTokenResponse.ok) {
    const errorBody = await sessionTokenResponse.text();
    return res.status(502).json({
      error: "Failed to create Anam session token",
      detail: errorBody,
    });
  }

  const session = await sessionTokenResponse.json();
  return res.json(session); // { sessionToken }
});

app.listen(3000);
The voiceId value is an Anam voice ID for a Cartesia voice, not a raw Cartesia provider voice ID. To find another one with the API, call the list voices endpoint and choose a voice where provider is CARTESIA. As of writing, Anam Lab lists 304 stock Cartesia voices.
curl "https://api.anam.ai/v1/voices?perPage=100&search=voice-name" \
  -H "Authorization: Bearer $ANAM_API_KEY"
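The lookup described above can be sketched in JavaScript as follows. This is a minimal sketch, not Anam's documented client code: the perPage and search parameters come from the curl example, the provider field comes from the description above, and the sample records and helper names (buildVoicesUrl, filterCartesiaVoices) are hypothetical. The exact shape of the API response is not shown in this guide, so the filter takes a plain array of voice objects.

```javascript
// Build the list-voices URL used in the curl example.
// Only the perPage and search parameters from that example are used.
function buildVoicesUrl(search, perPage = 100) {
  const url = new URL("https://api.anam.ai/v1/voices");
  url.searchParams.set("perPage", String(perPage));
  url.searchParams.set("search", search);
  return url.toString();
}

// Keep only voices whose provider is CARTESIA.
// Assumption: each voice object exposes a string `provider` field.
function filterCartesiaVoices(voices) {
  return voices.filter((voice) => voice.provider === "CARTESIA");
}

// Hypothetical sample records for illustration only, not real API output.
const sampleVoices = [
  { id: "voice-a", provider: "CARTESIA" },
  { id: "voice-b", provider: "OTHER" },
];

console.log(buildVoicesUrl("voice-name"));
console.log(filterCartesiaVoices(sampleVoices).map((voice) => voice.id));
```

In a real backend you would fetch the built URL with the same Authorization header as the curl example, then pass the returned list of voices to the filter.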
You can also clone a Cartesia voice or import a Cartesia provider voice in Lab. When importing, set the provider voice ID and a default provider model such as sonic-3.5. Lab validates the voice before saving it as an Anam voice.

2) Frontend: initialize Anam with session token

Add a video element and a start button in your page HTML:
<video id="anam-video" autoplay playsinline></video>
<button id="start-avatar" type="button">Start avatar</button>
The sample below is written in TypeScript for a browser module entry file; drop the type annotations if you use plain JavaScript. Wrap the call in an async function if your frontend is not a module.
import { createClient } from "@anam-ai/js-sdk";

let anamClient: ReturnType<typeof createClient> | undefined;

async function startAvatar() {
  const response = await fetch("/api/anam-session", { method: "POST" });

  if (!response.ok) {
    throw new Error("Failed to create Anam session");
  }

  const { sessionToken } = await response.json();
  anamClient = createClient(sessionToken);

  await anamClient.streamToVideoElement("anam-video");
}

document.querySelector<HTMLButtonElement>("#start-avatar")?.addEventListener("click", () => {
  startAvatar().catch((error) => {
    console.error(error);
  });
});
When the session starts, the video element shows Liv speaking with the Siobhan Cartesia voice. Try asking a short question in the browser.
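The frontend sample destructures sessionToken from the backend response without checking it. A small guard like the one below can surface a malformed response before it reaches createClient. This is a sketch under the assumption that the backend returns the { sessionToken } shape shown in the backend sample; extractSessionToken is a hypothetical helper name, not part of the Anam SDK.

```javascript
// Return the session token from the backend response body, or throw.
// The { sessionToken } shape matches the backend sample in this guide.
function extractSessionToken(payload) {
  const token = payload?.sessionToken;
  if (typeof token !== "string" || token.length === 0) {
    throw new Error("Backend response did not include a sessionToken");
  }
  return token;
}
```

Inside startAvatar, this would replace the destructuring step: const sessionToken = extractSessionToken(await response.json());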

Voice and performance options

Use voiceGenerationOptions on the session token request to set Cartesia voice generation controls for the session, such as speed, volume, and emotion. See Anam’s voice configuration docs for supported values.
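One way to vary these controls per session is to merge overrides onto a set of defaults before building the request body. The sketch below assumes only the three keys used in the backend sample (speed, volume, emotion); buildVoiceGenerationOptions is a hypothetical helper, and the override value in the usage line is illustrative rather than a confirmed supported value.

```javascript
// Defaults copied from the backend sample in this guide.
const defaultVoiceOptions = { speed: 1, volume: 1, emotion: "content" };

// Merge per-session overrides onto the defaults.
// Keys that are not in the defaults, and undefined values, are ignored.
function buildVoiceGenerationOptions(overrides = {}) {
  const options = { ...defaultVoiceOptions };
  for (const key of Object.keys(defaultVoiceOptions)) {
    if (overrides[key] !== undefined) {
      options[key] = overrides[key];
    }
  }
  return options;
}

// Illustrative override; check Anam's voice configuration docs for valid ranges.
console.log(buildVoiceGenerationOptions({ speed: 1.1 }));
```

The result can be passed as personaConfig.voiceGenerationOptions in the session token request. Dropping unknown keys keeps typos out of the request, at the cost of needing an update if Anam adds further options.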
For advanced performance control, Anam can preserve Cartesia-supported inline cues in the transcript and, where Director Notes are enabled, use the same cue names to direct the avatar. For example, Cartesia documents speech control tags, and Anam can use a matching laughter cue to shift the avatar toward a playful visual style.

Resources