Input Events
Input events are received by your agent from the Cartesia harness. All input events include an optional `history` field containing the complete conversation history. When `history` is None, the event is itself an entry inside a history list; when `history` contains a list, the event arrives with the full conversation context attached.
Call Lifecycle
| Event | Description |
|---|---|
| CallStarted | The call has connected |
| CallEnded | The call has ended |
User Turn Events
| Event | Description |
|---|---|
| UserTurnStarted | The user started speaking (triggers interruption by default) |
| UserTurnEnded | The user finished speaking (triggers new agent turn by default) |
| UserTextSent | User text content (within UserTurnEnded.content) |
| UserDtmfSent | User pressed a DTMF button |
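As a rough sketch of how these lifecycle and turn events might be routed in a handler, using minimal stand-in dataclasses rather than the real SDK event classes:

```python
from dataclasses import dataclass

# Minimal stand-ins mirroring the event names in the tables above.
@dataclass
class CallStarted:
    pass

@dataclass
class UserTurnEnded:
    content: str

@dataclass
class UserDtmfSent:
    digit: str

def handle(event):
    """Route an input event to a response string (illustrative only)."""
    if isinstance(event, CallStarted):
        return "Hello! How can I help?"
    if isinstance(event, UserTurnEnded):
        return f"You said: {event.content}"
    if isinstance(event, UserDtmfSent):
        return f"You pressed {event.digit}"
    return None

print(handle(UserDtmfSent(digit="3")))  # -> You pressed 3
```

The real harness delivers these events asynchronously; the dispatch-on-type pattern is the same.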
Agent Turn Events (in history)
| Event | Description |
|---|---|
| AgentTurnStarted | Agent started its turn |
| AgentTurnEnded | Agent finished its turn |
| AgentTextSent | Agent text that was spoken |
| AgentDtmfSent | DTMF tone sent by agent |
Handoff Event
| Event | Description |
|---|---|
| AgentHandedOff | Control transferred to a handoff tool |
Custom Event
| Event | Description |
|---|---|
| UserCustomSent | Custom metadata sent from the client via the WebSocket `custom` event |

Clients send a `custom` WebSocket event to the call stream. The event carries a metadata dict with whatever key-value pairs the client included:
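For example, a handler might read keys off the metadata dict like this (UserCustomSent is modeled here as a plain dataclass stand-in, and the "account_id" key is an assumed example):

```python
from dataclasses import dataclass, field

@dataclass
class UserCustomSent:
    # Stand-in for the SDK event; carries the client's key-value pairs.
    metadata: dict = field(default_factory=dict)

def on_custom(event: UserCustomSent):
    # Pull out an assumed "account_id" key the client may have included.
    return event.metadata.get("account_id")

event = UserCustomSent(metadata={"account_id": "acct_123", "plan": "pro"})
print(on_custom(event))  # -> acct_123
```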
Output Events
Output events are yielded by your agent to control the conversation.

Speech
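Yielding a speech output from an agent loop could look roughly like this; the `Speech` class below is a hypothetical stand-in for whatever output event the SDK actually defines:

```python
from dataclasses import dataclass

@dataclass
class Speech:
    # Hypothetical stand-in for the SDK's speech output event.
    text: str

def agent_turn(user_text: str):
    """Yield output events in response to a user turn (sketch only)."""
    yield Speech(text=f"Thanks, I heard: {user_text}")

outputs = list(agent_turn("I'd like to check my order"))
print(outputs[0].text)
```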
Call Control
Dynamic Configuration
Update call settings (voice, pronunciation, language) mid-conversation:

| Field | Type | Description |
|---|---|---|
| voice_id | Optional[str] | Updates the agent’s voice |
| pronunciation_dict_id | Optional[str] | Updates the pronunciation dictionary |
| language | Optional[str] | Updates the language used by the speech-to-text (STT) and text-to-speech (TTS) models |
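A mid-call settings update could be sketched like the following; `ConfigUpdate` is a stand-in dataclass carrying the fields from the table, not necessarily the SDK's real class name:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConfigUpdate:
    # Stand-in mirroring the fields in the table above.
    voice_id: Optional[str] = None
    pronunciation_dict_id: Optional[str] = None
    language: Optional[str] = None

def switch_to_spanish():
    # Yield a config update mid-conversation; unset fields stay unchanged.
    yield ConfigUpdate(language="es")

update = next(switch_to_spanish())
print(update.language)  # -> es
```

Leaving a field as None signals "no change", which is why every field is optional.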
Tool Events
These are emitted by LlmAgent to track tool execution.
Logging
Custom Events
Send arbitrary metadata from your agent to the harness. Pair with UserCustomSent for bidirectional metadata exchange.
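A sketch of the agent-to-harness direction; `CustomOutput` is a hypothetical stand-in, since only the inbound UserCustomSent event is named in the tables above:

```python
from dataclasses import dataclass, field

@dataclass
class CustomOutput:
    # Hypothetical stand-in for an agent-to-harness custom metadata event.
    metadata: dict = field(default_factory=dict)

def report_progress(step: str):
    # Push arbitrary key-value metadata up to the harness/client.
    yield CustomOutput(metadata={"step": step, "source": "agent"})

out = next(report_progress("verification"))
print(out.metadata["step"])  # -> verification
```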
Voice & Language Control
Change voice or speech recognition language mid-call. The `language` field sets the ASR (speech recognition) language. Pass any language code supported by Ink STT, or "multilingual" for automatic language detection.
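Switching to automatic language detection might look like this, reusing the same stand-in configuration dataclass sketched earlier (not the SDK's real class name):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConfigUpdate:
    # Stand-in for a dynamic-configuration output event.
    voice_id: Optional[str] = None
    language: Optional[str] = None

# Switch the ASR to automatic language detection mid-call.
auto_detect = ConfigUpdate(language="multilingual")
print(auto_detect.language)  # -> multilingual
```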
Event History
All input events include an optional `history` field containing the conversation history. When `history` is None, the event is inside a history list; when it contains a list, full conversation context is attached. LlmAgent handles this automatically; you only need to understand history if building custom agents.
Accessing History
Event types in history
Events in the history list have `history=None` to avoid redundant nesting. The event types are the same as regular input events:

| Event Type | Description |
|---|---|
| CallStarted | Call began |
| UserTurnStarted | User started speaking |
| UserTextSent | User’s transcribed speech |
| UserDtmfSent | User’s DTMF button press |
| UserTurnEnded | User finished speaking |
| AgentTurnStarted | Agent started responding |
| AgentTextSent | Agent’s spoken text |
| AgentDtmfSent | Agent’s DTMF tone |
| AgentTurnEnded | Agent finished responding |
| CallEnded | Call ended |
How LlmAgent processes history
LlmAgent automatically converts the event history to LLM messages:

- User messages: From UserTextSent events
- Assistant messages: From AgentTextSent events
- Tool calls: From AgentToolCalled and AgentToolReturned events
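The text-event part of that conversion can be sketched like this (stand-in event classes; the tool-call mapping is omitted for brevity):

```python
from dataclasses import dataclass

@dataclass
class UserTextSent:
    content: str

@dataclass
class AgentTextSent:
    content: str

def history_to_messages(history):
    """Map text events to chat-style {role, content} messages."""
    messages = []
    for event in history:
        if isinstance(event, UserTextSent):
            messages.append({"role": "user", "content": event.content})
        elif isinstance(event, AgentTextSent):
            messages.append({"role": "assistant", "content": event.content})
        # Other event types (turn markers, DTMF, etc.) carry no text here.
    return messages

history = [UserTextSent("hi"), AgentTextSent("hello!"), UserTextSent("bye")]
print(history_to_messages(history))
```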
Custom agents: Using history
If building a custom agent (not using LlmAgent), you can use history for context, summarization, or pattern detection:
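For example, a custom agent might scan history for repeated user requests as a simple pattern-detection pass (stand-in event class again):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class UserTextSent:
    content: str

def repeated_requests(history, threshold: int = 2):
    """Return user utterances appearing at least `threshold` times."""
    counts = Counter(
        e.content.strip().lower()
        for e in history
        if isinstance(e, UserTextSent)
    )
    return [text for text, n in counts.items() if n >= threshold]

history = [
    UserTextSent("talk to an agent"),
    UserTextSent("what are your hours?"),
    UserTextSent("Talk to an agent"),
]
print(repeated_requests(history))  # -> ['talk to an agent']
```

A repeated request like this could trigger an escalation or a handoff in a real agent.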