Introduction

Calibrated LLM judges are AI evaluators that monitor your traces, sessions, or spans and score outputs against criteria you define. They improve over time as you refine and correct their evaluations.

When to use

Use a judge when you want consistent, scalable evaluation of:

Hallucinations, safety/policy violations
Response quality (helpfulness, tone, structure)
Latency, cost, and error patterns tied to specific criteria
