Agents process input events and yield output events to control the conversation.

What is an Agent?

An Agent controls the input/output event loop. The process method receives events (user speech, call start, etc.) and yields responses. An Agent can be:
  1. A class with an async process(self, env, event) method
  2. An async generator function with the same signature: (env, event) -> AsyncIterable[OutputEvent]
from line.events import CallStarted, UserTurnEnded, AgentSendText

class HelloAgent:
    async def process(self, env, event):
        if isinstance(event, CallStarted):
            yield AgentSendText(text="Hello!")
        elif isinstance(event, UserTurnEnded):
            yield AgentSendText(text="I heard you!")
How an Agent works:
  • Events arrive (user speaks, call starts, button pressed)
  • SDK calls agent.process(env, event)
  • Agent yields output events (speech, tool calls, handoffs)
  • SDK handles audio, LLM calls, and state management
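
Conceptually, the dispatch is simple. Here is a minimal sketch of that loop; the names dispatch_loop, event_queue, and send_output are hypothetical stand-ins for SDK internals, and the real loop also applies run/cancel filters and manages audio:
import asyncio

# Hypothetical sketch of the SDK's dispatch loop -- not the actual implementation.
async def dispatch_loop(agent, env, event_queue: asyncio.Queue, send_output):
    while True:
        event = await event_queue.get()  # next input event (CallStarted, UserTurnEnded, ...)
        async for output in agent.process(env, event):
            await send_output(output)    # AgentSendText, tool calls, handoffs, ...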

LlmAgent

Use the built-in LlmAgent, which wraps 100+ LLM providers via LiteLLM:
from line.llm_agent import LlmAgent, LlmConfig

agent = LlmAgent(
    model="anthropic/claude-haiku-4-5-20251001",  # Or "gpt-5-nano", "gemini/gemini-2.5-flash-preview-09-2025", etc.
    api_key="your-api-key",
    tools=[...],  # Optional list of tools
    config=LlmConfig(
        system_prompt="You are a helpful assistant...",
        introduction="Hello! How can I help you today?",
    ),
)

Prompting

Use system_prompt to define your agent’s personality and introduction to set the greeting:
import os
from line import CallRequest
from line.llm_agent import LlmAgent, LlmConfig, end_call
from line.voice_agent_app import AgentEnv, VoiceAgentApp

SYSTEM_PROMPT = """You are a friendly customer service agent.

Rules:
- Be polite and empathetic
- Confirm understanding before taking action
- Use end_call to gracefully end conversations
"""

async def get_agent(env: AgentEnv, call_request: CallRequest):
    return LlmAgent(
        model="anthropic/claude-haiku-4-5-20251001",
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        tools=[end_call],
        config=LlmConfig(
            system_prompt=SYSTEM_PROMPT,
            introduction="Hello! How can I help you today?",
        ),
    )

app = VoiceAgentApp(get_agent=get_agent)

if __name__ == "__main__":
    app.run()

Supported Models

Provider   Model Examples
Anthropic  anthropic/claude-haiku-4-5-20251001, anthropic/claude-sonnet-4-5
OpenAI     gpt-5-nano, gpt-5.2
Google     gemini/gemini-2.5-flash-preview-09-2025, gemini/gemini-3.0-preview

And 100+ more via LiteLLM.

LlmConfig Options

Option             Type                 Description
system_prompt      str                  The system prompt defining agent behavior
introduction       Optional[str]        Message sent on call start; None or "" to wait for the user
temperature        Optional[float]      Sampling temperature
max_tokens         Optional[int]        Maximum tokens per response
top_p              Optional[float]      Nucleus sampling threshold
stop               Optional[List[str]]  Stop sequences
seed               Optional[int]        Random seed for reproducibility
presence_penalty   Optional[float]      Presence penalty for token generation
frequency_penalty  Optional[float]      Frequency penalty for token generation
num_retries        int                  Number of retries on failure (default: 2)
fallbacks          Optional[List[str]]  Fallback models if the primary fails
timeout            Optional[float]      Request timeout in seconds
extra              Dict[str, Any]       Provider-specific options passed through to LiteLLM
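
For example, a config might combine several of these options for tighter, more resilient generation (a sketch; the model names and values are illustrative):
import os

from line.llm_agent import LlmAgent, LlmConfig

agent = LlmAgent(
    model="anthropic/claude-haiku-4-5-20251001",
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    config=LlmConfig(
        system_prompt="You are a concise support agent.",
        temperature=0.3,           # lower variance for predictable replies
        max_tokens=300,            # keep spoken turns short
        num_retries=2,             # retry transient provider failures
        fallbacks=["gpt-5-nano"],  # try another model if the primary fails
        timeout=10.0,              # seconds before a request is abandoned
    ),
)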

Controlling the Conversational Loop

Use event filters to control when your agent’s process method runs.

Default Behavior

# Agent processes these events:
run_filter = [CallStarted, UserTurnEnded, CallEnded]

# These events interrupt the agent:
cancel_filter = [UserTurnStarted]
This means the agent greets on call start, responds when the user finishes speaking, and can be interrupted when the user starts speaking again.

Customizing Filters

Return a tuple from get_agent to override defaults:
from line.events import CallStarted, UserTurnEnded, UserTurnStarted, CallEnded

async def get_agent(env, call_request):
    agent = LlmAgent(...)
    
    # Customize behavior
    run_filter = [CallStarted, UserTurnEnded, CallEnded]
    cancel_filter = [UserTurnStarted]
    
    return (agent, run_filter, cancel_filter)

Common Customizations

More responsive (process partial transcriptions):
from line.events import CallStarted, UserTurnEnded, UserTextSent, CallEnded

run_filter = [CallStarted, UserTurnEnded, UserTextSent, CallEnded]
cancel_filter = [UserTurnStarted]
This makes your agent start processing before the user finishes speaking, creating a more responsive experience.

Non-interruptible announcements:
run_filter = [CallStarted, UserTurnEnded, CallEnded]
cancel_filter = []  # No interruptions
Custom logic with functions:
from datetime import datetime

from line.events import CallStarted, CallEnded, UserTurnEnded, UserTurnStarted

def business_hours_only(event):
    hour = datetime.now().hour
    if isinstance(event, (CallStarted, CallEnded)):
        return True
    return isinstance(event, UserTurnEnded) and 9 <= hour < 17

# In get_agent:
return (agent, business_hours_only, [UserTurnStarted])
For advanced patterns like guardrails, routing, and agent wrappers, see Advanced Patterns.

Handling Incoming Calls

When a call arrives, you can inspect caller information and configure how your agent responds before it starts.
  1. A call arrives from a web client or telephony provider
  2. Your pre_call_handler receives a CallRequest with caller details
  3. You return configuration (voice, language) or reject the call
  4. Your get_agent function creates an agent using the enriched request

Parsing the CallRequest

The CallRequest contains information about the incoming call:
Field          Type            Description
call_id        str             Unique identifier for the call
from_          str             Caller identifier (phone number or client ID)
to             str             Called number or agent ID
agent_call_id  str             Agent call ID for logging/correlation
metadata       Optional[dict]  Custom data passed from your client application
agent          AgentConfig     Prompts configured in Playground or via API
The agent field contains an AgentConfig with:
Field          Type           Description
system_prompt  Optional[str]  System prompt configured in Playground or via the Calls API
introduction   Optional[str]  Introduction message configured in Playground or via the Calls API
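
For example, get_agent might branch on these fields to pick a prompt (a sketch; the phone number and prompts are illustrative):
import os

from line.llm_agent import LlmAgent, LlmConfig

async def get_agent(env, call_request):
    # Route on the dialed number (hypothetical billing line)
    if call_request.to == "+15551234567":
        prompt = "You are a billing support agent."
    else:
        prompt = "You are a general customer service agent."
    return LlmAgent(
        model="gpt-5-nano",
        api_key=os.getenv("OPENAI_API_KEY"),
        config=LlmConfig(system_prompt=prompt),
    )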

Returning a PreCallResult

Use pre_call_handler to set voice, language, or reject calls before your agent starts:
from line.voice_agent_app import CallRequest, PreCallResult, VoiceAgentApp

async def pre_call_handler(call_request: CallRequest):
    return PreCallResult(
        metadata={"tier": "premium"},  # Merged into call_request.metadata
        config={
            "tts": {
                "voice": "a0e99841-438c-4a64-b679-ae501e7d6091",
                "model": "sonic-3",
                "language": "en",
            }
        }
    )

app = VoiceAgentApp(get_agent=get_agent, pre_call_handler=pre_call_handler)
Your client application can pass metadata (user ID, language preference, account tier) in the call request. Your pre_call_handler reads this and configures TTS/STT accordingly.
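For example, a handler might read a language preference from metadata and keep TTS and STT in the same language (a sketch; the "language" metadata key and voice ID are illustrative):
async def pre_call_handler(call_request: CallRequest):
    # Language preference passed by your client application, defaulting to English
    lang = (call_request.metadata or {}).get("language", "en")
    return PreCallResult(
        config={
            "tts": {
                "voice": "a0e99841-438c-4a64-b679-ae501e7d6091",
                "model": "sonic-3",
                "language": lang,
            },
            "stt": {"language": lang},  # keep recognition in the caller's language
        }
    )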

Configuration Options

TTS Options:
Option                 Type    Description
voice                  string  Voice identifier (UUID)
model                  string  TTS model (sonic-3, sonic-turbo)
language               string  Language code (en, es, hi, etc.)
pronunciation_dict_id  string  Custom pronunciation dictionary ID
STT Options:
Option    Type    Description
language  string  Language code for speech recognition

Rejecting Calls

Return None to reject a call with a 403 status:
async def pre_call_handler(call_request: CallRequest):
    if is_blocked(call_request.from_):
        return None  # Rejects with 403
    return PreCallResult()
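Note that is_blocked above is your own check, not part of the SDK; a minimal version might consult a blocklist:
BLOCKED_NUMBERS = {"+15550000000"}  # hypothetical blocklist

def is_blocked(caller: str) -> bool:
    return caller in BLOCKED_NUMBERS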

Custom Pronunciations

Use a pronunciation dictionary to control how specific words are spoken:
async def pre_call_handler(call_request: CallRequest):
    return PreCallResult(
        config={
            "tts": {
                "voice": "a0e99841-438c-4a64-b679-ae501e7d6091",
                "model": "sonic-3",
                "pronunciation_dict_id": "your-dict-id",
            }
        }
    )

Accessing Call Metadata in Your Agent Logic

The CallRequest is available in get_agent:
import logging
import os

from line.llm_agent import LlmAgent, LlmConfig

logger = logging.getLogger(__name__)

async def get_agent(env, call_request):
    # Log call information
    logger.info(f"Call {call_request.call_id} from {call_request.from_}")

    # Access metadata passed from your application (or added in pre_call_handler)
    customer_id = call_request.metadata.get("customer_id") if call_request.metadata else None
    customer_name = call_request.metadata.get("customer_name") if call_request.metadata else None

    # Build a personalized system prompt using metadata
    base_prompt = call_request.agent.system_prompt or "You are a helpful customer service agent."

    if customer_id:
        base_prompt += f"\n\nCurrent customer ID: {customer_id}"
    if customer_name:
        base_prompt += f"\nCustomer name: {customer_name}"

    return LlmAgent(
        model="gpt-5-nano",
        api_key=os.getenv("OPENAI_API_KEY"),
        config=LlmConfig(
            system_prompt=base_prompt,
            introduction=call_request.agent.introduction,
        ),
    )
Alternatively, LlmConfig.from_call_request() handles this priority chain automatically:
  1. CallRequest.agent.system_prompt value (if set)
  2. Your fallback value (if provided)
  3. SDK default
async def get_agent(env, call_request):
    return LlmAgent(
        model="anthropic/claude-haiku-4-5-20251001",
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        tools=[end_call],
        config=LlmConfig.from_call_request(
            call_request,
            fallback_system_prompt="You are a sales assistant.",
            fallback_introduction="Hi! How can I help with your purchase?",
            temperature=0.7,  # Additional LlmConfig options
        ),
    )
Using CallRequest lets you iterate on system prompts from the Playground instantly, while code handles the technical configuration and fallback defaults.

Letting the User Speak First

Set introduction to an empty string to wait for the user to speak first:
config=LlmConfig.from_call_request(
    call_request,
    fallback_system_prompt=SYSTEM_PROMPT,
    fallback_introduction="",
)

Custom Agent Function

For advanced use cases, you can build agents from scratch as functions:
from line.events import UserTurnEnded, AgentSendText, CallStarted

async def my_agent(env, event):
    if isinstance(event, CallStarted):
        yield AgentSendText(text="Hello! How can I help?")
    elif isinstance(event, UserTurnEnded):
        user_text = event.content[0].content if event.content else ""
        yield AgentSendText(text=f"You said: {user_text}")

Custom Agent Class

Or as classes with state:
class GreetingAgent:
    def __init__(self, greeting: str):
        self.greeting = greeting
        self.greeted = False

    async def process(self, env, event):
        if isinstance(event, CallStarted) and not self.greeted:
            yield AgentSendText(text=self.greeting)
            self.greeted = True
Most developers can use LlmAgent with tools rather than building custom agents from scratch! Custom agents are powerful when you need full control over the event processing logic without LLM reasoning.