
Retrieve judge evaluations programmatically for reporting, analysis, or integration into your own workflows.

Finding your IDs

Before making API calls, you’ll need these identifiers:
ID | Where to find it
Project ID | Settings → Project, or in any URL after /projects/
Judge ID | Click a judge in the dashboard; the ID is in the URL (/judges/{judge_id})
Span ID | In trace details, or returned by your instrumentation code

Python SDK

Get available criteria for a judge

Use this before submitting criterion-level feedback to discover valid criterion keys.
import zeroeval as ze

ze.init(api_key="YOUR_API_KEY")

criteria = ze.get_judge_criteria(
    project_id="your-project-id",
    judge_id="your-judge-id",
)

print(criteria["evaluation_type"])
for criterion in criteria["criteria"]:
    print(criterion["key"], criterion.get("description"))

Get evaluations by judge

Fetch all evaluations for a specific judge with pagination and optional filters.
import zeroeval as ze

ze.init(api_key="YOUR_API_KEY")

response = ze.get_judge_evaluations(
    project_id="your-project-id",
    judge_id="your-judge-id",
    limit=100,
    offset=0,
)

print(f"Total: {response['total']}")
for evaluation in response["evaluations"]:
    print(f"Span: {evaluation['span_id']}")
    print(f"Result: {'PASS' if evaluation['evaluation_result'] else 'FAIL'}")
    print(f"Score: {evaluation.get('score')}")  # Only set for scored judges
    print(f"Reason: {evaluation['evaluation_reason']}")
Optional filters:
response = ze.get_judge_evaluations(
    project_id="your-project-id",
    judge_id="your-judge-id",
    limit=100,
    offset=0,
    start_date="2025-01-01T00:00:00Z",
    end_date="2025-01-31T23:59:59Z",
    evaluation_result=True,  # Only passing evaluations
    feedback_state="with_user_feedback",  # Only calibrated items
)

Get evaluations by span

Fetch all judge evaluations for a specific span (useful when a span has been evaluated by multiple judges).
import zeroeval as ze

ze.init(api_key="YOUR_API_KEY")

response = ze.get_span_evaluations(
    project_id="your-project-id",
    span_id="your-span-id",
)

for evaluation in response["evaluations"]:
    print(f"Judge: {evaluation['judge_name']}")
    print(f"Result: {'PASS' if evaluation['evaluation_result'] else 'FAIL'}")
    if evaluation.get('evaluation_type') == 'scored':
        print(f"Score: {evaluation['score']} / {evaluation['score_max']}")

REST API

Use these endpoints directly with your API key in the Authorization header.

Get available criteria for a judge

curl -X GET "https://api.zeroeval.com/projects/{project_id}/judges/{judge_id}/criteria" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY"

Get evaluations by judge

curl -X GET "https://api.zeroeval.com/projects/{project_id}/judges/{judge_id}/evaluations?limit=100&offset=0" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
Query parameters:
Parameter | Type | Description
limit | int | Results per page (1-500, default 100)
offset | int | Pagination offset (default 0)
start_date | string | Start of date range (ISO 8601)
end_date | string | End of date range (ISO 8601)
evaluation_result | bool | true for passing, false for failing
feedback_state | string | with_user_feedback or without_user_feedback
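You can combine these parameters in a single request. A minimal Python sketch using the requests package (assuming your key is exported as ZEROEVAL_API_KEY; the endpoint and parameters come from the table above):
import os
import requests

API_KEY = os.environ["ZEROEVAL_API_KEY"]

# Passing, calibrated evaluations from January 2025.
resp = requests.get(
    "https://api.zeroeval.com/projects/your-project-id/judges/your-judge-id/evaluations",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={
        "limit": 100,
        "offset": 0,
        "start_date": "2025-01-01T00:00:00Z",
        "end_date": "2025-01-31T23:59:59Z",
        "evaluation_result": "true",  # lowercase string, so it serializes as a query-string bool
        "feedback_state": "with_user_feedback",
    },
)
resp.raise_for_status()
print(resp.json()["total"])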

Get evaluations by span

curl -X GET "https://api.zeroeval.com/projects/{project_id}/spans/{span_id}/evaluations" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY"

Response format

Judge evaluations response

{
  "evaluations": [...],
  "total": 142,
  "limit": 100,
  "offset": 0
}
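The total, limit, and offset fields are enough to work out your position in the result set; for example:
import math

total_pages = math.ceil(response["total"] / response["limit"])  # 2 pages for the response above
current_page = response["offset"] // response["limit"] + 1      # page 1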

Judge criteria response

{
  "judge_id": "judge-uuid",
  "evaluation_type": "scored",
  "score_min": 0,
  "score_max": 5,
  "pass_threshold": 3.5,
  "criteria": [
    {
      "key": "CTA_text",
      "label": "CTA_text",
      "description": "CTA clarity and visibility"
    }
  ]
}
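Because this response enumerates every valid criterion key, you can validate criterion-level feedback locally before submitting it. A minimal sketch, reusing the criteria dict fetched by the SDK example above (the feedback payload shape is purely illustrative):
valid_keys = {criterion["key"] for criterion in criteria["criteria"]}

# Hypothetical criterion-level feedback you intend to submit.
feedback = {"CTA_text": True}

unknown = set(feedback) - valid_keys
if unknown:
    raise ValueError(f"Unknown criterion keys: {unknown}")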

Span evaluations response

{
  "span_id": "abc-123",
  "evaluations": [...]
}

Evaluation object

Field | Type | Description
id | string | Unique evaluation ID
span_id | string | The evaluated span
evaluation_result | bool | Pass (true) or fail (false)
evaluation_reason | string | Judge’s reasoning
confidence_score | float | Model confidence (0-1)
score | float or null | Numeric score (scored judges only)
score_min | float or null | Minimum possible score
score_max | float or null | Maximum possible score
pass_threshold | float or null | Score required to pass
model_used | string | LLM model that ran the evaluation
created_at | string | ISO 8601 timestamp
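For reporting, these fields are enough to aggregate a batch of evaluations. A sketch that computes a pass rate and, for scored judges, a mean score from a get_judge_evaluations response:
evaluations = response["evaluations"]

if evaluations:
    pass_rate = sum(e["evaluation_result"] for e in evaluations) / len(evaluations)
    print(f"Pass rate: {pass_rate:.1%}")

    # score is null (None) for binary judges, so average only where it is set.
    scores = [e["score"] for e in evaluations if e.get("score") is not None]
    if scores:
        print(f"Mean score: {sum(scores) / len(scores):.2f}")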

Pagination example

For large result sets, paginate through all evaluations:
all_evaluations = []
offset = 0
limit = 100

while True:
    response = ze.get_judge_evaluations(
        project_id="your-project-id",
        judge_id="your-judge-id",
        limit=limit,
        offset=offset,
    )
    
    all_evaluations.extend(response["evaluations"])
    
    if len(response["evaluations"]) < limit:
        break
    
    offset += limit

print(f"Fetched {len(all_evaluations)} total evaluations")