Retrieve judge evaluations programmatically for reporting, analysis, or integration into your own workflows.
## Finding your IDs
Before making API calls, you’ll need these identifiers:
| ID | Where to find it |
|---|---|
| Project ID | Settings → Project, or in any URL after `/projects/` |
| Judge ID | Click a judge in the dashboard; the ID is in the URL (`/judges/{judge_id}`) |
| Span ID | In trace details, or returned by your instrumentation code |
## Python SDK
### Get available criteria for a judge
Use this before submitting criterion-level feedback to discover valid criterion keys.
```python
import zeroeval as ze

ze.init(api_key="YOUR_API_KEY")

criteria = ze.get_judge_criteria(
    project_id="your-project-id",
    judge_id="your-judge-id",
)

print(criteria["evaluation_type"])
for criterion in criteria["criteria"]:
    print(criterion["key"], criterion.get("description"))
```
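For example, one way to use the result is to collect the valid keys up front and check your own criterion-level feedback against them before submitting it. This is a minimal sketch that continues from the block above; the `my_feedback` dict is a hypothetical payload, not part of the SDK:

```python
# Build the set of valid criterion keys from the criteria fetched above.
valid_keys = {criterion["key"] for criterion in criteria["criteria"]}

# Hypothetical criterion-level feedback, keyed by criterion key.
my_feedback = {"CTA_text": True}

unknown = set(my_feedback) - valid_keys
if unknown:
    raise ValueError(f"Unknown criterion keys: {unknown}")
```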
### Get evaluations by judge
Fetch all evaluations for a specific judge with pagination and optional filters.
```python
import zeroeval as ze

ze.init(api_key="YOUR_API_KEY")

response = ze.get_judge_evaluations(
    project_id="your-project-id",
    judge_id="your-judge-id",
    limit=100,
    offset=0,
)

print(f"Total: {response['total']}")
for eval in response["evaluations"]:
    print(f"Span: {eval['span_id']}")
    print(f"Result: {'PASS' if eval['evaluation_result'] else 'FAIL'}")
    print(f"Score: {eval.get('score')}")  # For scored judges
    print(f"Reason: {eval['evaluation_reason']}")
```
Optional filters:
```python
response = ze.get_judge_evaluations(
    project_id="your-project-id",
    judge_id="your-judge-id",
    limit=100,
    offset=0,
    start_date="2025-01-01T00:00:00Z",
    end_date="2025-01-31T23:59:59Z",
    evaluation_result=True,               # Only passing evaluations
    feedback_state="with_user_feedback",  # Only calibrated items
)
```
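As a rough sketch, the filtered results can then be summarized for reporting. This assumes all matching evaluations fit in a single page (see the pagination example at the end of this page otherwise) and uses only fields documented in the evaluation object below:

```python
evaluations = response["evaluations"]

# Overall pass rate across the returned evaluations.
if evaluations:
    passed = sum(1 for e in evaluations if e["evaluation_result"])
    print(f"Pass rate: {passed / len(evaluations):.1%}")

# Average score, skipping binary judges where score is null.
scores = [e["score"] for e in evaluations if e.get("score") is not None]
if scores:
    print(f"Average score: {sum(scores) / len(scores):.2f}")
```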
### Get evaluations by span
Fetch all judge evaluations for a specific span (useful when a span has been evaluated by multiple judges).
```python
import zeroeval as ze

ze.init(api_key="YOUR_API_KEY")

response = ze.get_span_evaluations(
    project_id="your-project-id",
    span_id="your-span-id",
)

for eval in response["evaluations"]:
    print(f"Judge: {eval['judge_name']}")
    print(f"Result: {'PASS' if eval['evaluation_result'] else 'FAIL'}")
    if eval.get("evaluation_type") == "scored":
        print(f"Score: {eval['score']} / {eval['score_max']}")
```
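A small follow-up sketch, using only the fields shown above, that checks whether a span passed every judge that evaluated it:

```python
results = response["evaluations"]

# True only if every judge returned a pass for this span.
print(f"Passed all judges: {all(e['evaluation_result'] for e in results)}")

# List the judges that failed the span, with their reasoning.
for e in results:
    if not e["evaluation_result"]:
        print(f"Failed by {e['judge_name']}: {e['evaluation_reason']}")
```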
## REST API
Use these endpoints directly with your API key in the Authorization header.
### Get available criteria for a judge
```bash
curl -X GET "https://api.zeroeval.com/projects/{project_id}/judges/{judge_id}/criteria" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
```
### Get evaluations by judge
```bash
curl -X GET "https://api.zeroeval.com/projects/{project_id}/judges/{judge_id}/evaluations?limit=100&offset=0" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
```
Query parameters:
| Parameter | Type | Description |
|---|---|---|
| `limit` | int | Results per page (1-500, default 100) |
| `offset` | int | Pagination offset (default 0) |
| `start_date` | string | Start of the date range (ISO 8601) |
| `end_date` | string | End of the date range (ISO 8601) |
| `evaluation_result` | bool | `true` for passing, `false` for failing |
| `feedback_state` | string | `with_user_feedback` or `without_user_feedback` |
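If you are calling the REST API from Python without the SDK, a minimal sketch using the `requests` library looks like this (same endpoint and query parameters as documented above; the project and judge IDs are placeholders):

```python
import os

import requests

# GET /projects/{project_id}/judges/{judge_id}/evaluations with optional filters.
resp = requests.get(
    "https://api.zeroeval.com/projects/your-project-id/judges/your-judge-id/evaluations",
    headers={"Authorization": f"Bearer {os.environ['ZEROEVAL_API_KEY']}"},
    params={
        "limit": 100,
        "offset": 0,
        "start_date": "2025-01-01T00:00:00Z",
        "evaluation_result": "true",
    },
)
resp.raise_for_status()
data = resp.json()
print(f"{data['total']} matching evaluations")
```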
### Get evaluations by span
```bash
curl -X GET "https://api.zeroeval.com/projects/{project_id}/spans/{span_id}/evaluations" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
```
### Judge evaluations response
```json
{
  "evaluations": [...],
  "total": 142,
  "limit": 100,
  "offset": 0
}
```
### Judge criteria response
```json
{
  "judge_id": "judge-uuid",
  "evaluation_type": "scored",
  "score_min": 0,
  "score_max": 5,
  "pass_threshold": 3.5,
  "criteria": [
    {
      "key": "CTA_text",
      "label": "CTA_text",
      "description": "CTA clarity and visibility"
    }
  ]
}
```
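For scored judges like this one, the pass/fail result relates the score to the threshold. A client-side sketch, assuming a score at or above `pass_threshold` counts as a pass (the server computes the authoritative `evaluation_result`):

```python
def is_passing(score: float, pass_threshold: float) -> bool:
    # Assumption: a score at or above the pass threshold counts as a pass.
    return score >= pass_threshold

print(is_passing(score=4.0, pass_threshold=3.5))  # True
print(is_passing(score=3.0, pass_threshold=3.5))  # False
```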
### Span evaluations response
```json
{
  "span_id": "abc-123",
  "evaluations": [...]
}
```
### Evaluation object
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique evaluation ID |
| `span_id` | string | The evaluated span |
| `evaluation_result` | bool | Pass (`true`) or fail (`false`) |
| `evaluation_reason` | string | Judge's reasoning |
| `confidence_score` | float | Model confidence (0-1) |
| `score` | float or null | Numeric score (scored judges only) |
| `score_min` | float or null | Minimum possible score |
| `score_max` | float or null | Maximum possible score |
| `pass_threshold` | float or null | Score required to pass |
| `model_used` | string | LLM model that ran the evaluation |
| `created_at` | string | ISO 8601 timestamp |
For large result sets, paginate through all evaluations:
```python
import zeroeval as ze

ze.init(api_key="YOUR_API_KEY")

all_evaluations = []
offset = 0
limit = 100

while True:
    response = ze.get_judge_evaluations(
        project_id="your-project-id",
        judge_id="your-judge-id",
        limit=limit,
        offset=offset,
    )
    all_evaluations.extend(response["evaluations"])
    if len(response["evaluations"]) < limit:
        break
    offset += limit

print(f"Fetched {len(all_evaluations)} total evaluations")
```
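Once fetched, the evaluations can be exported for reporting or analysis. A sketch using only the standard library and the documented fields of the evaluation object:

```python
import csv

# Write a subset of the documented evaluation fields to a CSV for reporting.
fields = ["id", "span_id", "evaluation_result", "score", "evaluation_reason", "created_at"]

with open("judge_evaluations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(all_evaluations)
```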