The Line platform includes a suite of tools for evaluating how your Agent is performing, both during the development phase and in production.
You have full control over how metrics for evaluating your agent are defined.
System Metrics
By default, all calls made by a Line Agent have a set of system metrics automatically calculated to help evaluate performance.
| System Metric | Description |
|---|---|
| system_call_success | A boolean indicating whether the call completed without disconnecting unexpectedly (for example, due to reasoning code crashing) |
| system_text_to_speech_ttfb | The time to first byte of audio generated by the TTS model on the first turn of the conversation |
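For a concrete sense of how these values can be used, the sketch below walks through results for a single call; the list-of-dicts shape, field names, and unit are assumptions for illustration, not the platform's documented response format.

```python
# Illustrative only: a sketch of how system metric results for one call
# might be represented once retrieved from the console or API. The field
# names ("metric", "value") and the unit are assumptions, not a documented schema.
call_metrics = [
    {"metric": "system_call_success", "value": True},
    {"metric": "system_text_to_speech_ttfb", "value": 0.42},  # seconds (assumed unit)
]

# Flag calls that disconnected unexpectedly or were slow to start speaking.
for m in call_metrics:
    if m["metric"] == "system_call_success" and not m["value"]:
        print("Call ended unexpectedly -- inspect the reasoning code for crashes")
    if m["metric"] == "system_text_to_speech_ttfb" and m["value"] > 1.0:
        print("Slow first response -- over 1s before audio on the first turn")
```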
LLM as a Judge
An LLM-as-a-Judge metric is created in the playground by setting a name and specifying a prompt. You can try out different prompts against existing call transcripts by copying a call ID into the metric creation field and clicking Evaluate to generate a sample output.
Write your LLM-as-a-Judge metrics to return a single value and a description field.
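As a sketch, a judge prompt might ask the model for exactly that structure; the prompt wording and field names below are illustrative examples, not a required template.

```python
# Illustrative sketch of an LLM-as-a-Judge prompt and the shape of output it
# should produce. The scenario and wording are examples only; the key point is
# that the judge returns a single value plus a description.
JUDGE_PROMPT = """
You are evaluating a call transcript for an appointment-booking agent.
Did the agent confirm the caller's preferred date and time before ending the call?
Return a JSON object with two fields:
  - "value": true or false
  - "description": one sentence explaining your decision
"""

# Example of the kind of output the judge should return for one call:
example_result = {
    "value": True,
    "description": "The agent repeated the requested date and time and the caller confirmed.",
}
```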
A metric name can only include lowercase letters, digits, and the '-', '_', or '.' characters, so that it can be managed from a CLI. Metric names must also be unique within your organization.
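That naming rule corresponds to a simple pattern, so a candidate name can be checked before creating the metric; the helper below is purely illustrative and is not part of the Line platform's tooling.

```python
import re

# Allowed characters per the naming rule above: lowercase letters, digits, '-', '_', '.'.
METRIC_NAME_PATTERN = re.compile(r"^[a-z0-9._-]+$")

def is_valid_metric_name(name: str) -> bool:
    return bool(METRIC_NAME_PATTERN.match(name))

assert is_valid_metric_name("booking-confirmed.v2")
assert not is_valid_metric_name("Booking Confirmed")  # uppercase letters and spaces are rejected
```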
Assigning Metrics
Once a metric is created, it can be assigned to an Agent from the Agent page in the playground. All subsequent calls made to or from that Agent will have metric results calculated and available to view in the console and via the API. Note that when you assign a metric to an existing Agent, it won't automatically be run on previous calls.
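If you want to pull results programmatically rather than viewing them in the console, a rough sketch might look like the following; the base URL, endpoint path, auth header, environment variable name, and response fields are all assumptions for illustration and should be replaced with the actual values from the API reference.

```python
import os
import requests

# Hypothetical sketch of fetching metric results for a call over HTTP.
# The base URL, endpoint path, auth scheme, and response fields are assumptions
# for illustration only -- consult the API reference for the real ones.
API_KEY = os.environ["LINE_API_KEY"]        # assumed environment variable name
BASE_URL = "https://api.example.com/v1"     # placeholder, not the real base URL

def get_metric_results(call_id: str) -> list[dict]:
    resp = requests.get(
        f"{BASE_URL}/calls/{call_id}/metrics",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    return resp.json()

# Example usage: print each metric's value and description for one call.
for result in get_metric_results("call_123"):
    print(result.get("metric"), result.get("value"), result.get("description"))
```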