Overview

When calibrating judges, you can submit feedback programmatically using the SDK. This is useful for:
  • Bulk feedback submission from automated pipelines
  • Integration with custom review workflows
  • Syncing feedback from external labeling tools
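
For example, feedback synced from an external labeling tool might look like the sketch below. The export format shown here is illustrative (field names like span_id and judge_was_correct are assumptions); only the send_feedback call itself reflects the SDK.

from zeroeval import ZeroEval

client = ZeroEval()

# Hypothetical records exported from an external labeling tool. Each record
# is assumed to carry the span ID it refers to and a human verdict on the judge.
external_labels = [
    {"span_id": "span-id-1", "judge_was_correct": True, "note": "Matches human review"},
    {"span_id": "span-id-2", "judge_was_correct": False, "note": "Judge missed a factual error"},
]

for label in external_labels:
    client.send_feedback(
        prompt_slug="your-judge-task-slug",    # task slug associated with the judge
        completion_id=label["span_id"],        # span ID the label refers to
        thumbs_up=label["judge_was_correct"],  # True = judge was correct
        reason=label["note"],
        judge_id="your-judge-id",              # required for judge feedback
    )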

Important: Using the Correct IDs

Judge evaluations involve two related spans:
ID                   Description
Source Span ID       The original LLM call that was evaluated
Judge Call Span ID   The span created when the judge ran its evaluation
When submitting feedback, always include the judge_id parameter to ensure feedback is correctly associated with the judge evaluation.

Python SDK

The easiest way to get the correct IDs is from the Judge Evaluation modal:
  1. Open a judge evaluation in the dashboard
  2. Expand the “SDK Integration” section
  3. Click “Copy” to copy the pre-filled Python code
  4. Paste and customize the generated code

Manual Submission

from zeroeval import ZeroEval

client = ZeroEval()

# Submit feedback for a judge evaluation
client.send_feedback(
    prompt_slug="your-judge-task-slug",  # The task/prompt associated with the judge
    completion_id="span-id-here",         # The span ID from the evaluation
    thumbs_up=True,                        # True = correct, False = incorrect
    reason="Optional explanation",
    judge_id="automation-id-here",         # Required for judge feedback
)

Parameters

Parameter        Type   Required  Description
prompt_slug      str    Yes       The task slug associated with the judge
completion_id    str    Yes       The span ID being evaluated
thumbs_up        bool   Yes       True if judge was correct, False if wrong
reason           str    No        Explanation of the feedback
judge_id         str    Yes*      The judge automation ID (*required for judge feedback)
expected_score   float  No        For scored judges: the expected score value
score_direction  str    No        For scored judges: "too_high" or "too_low"
expected_score and score_direction are only valid for scored judges (judges with evaluation_type: "scored"). The API will return a 400 error if these fields are provided for binary judges.
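
If the same pipeline submits feedback for both binary and scored judges, one way to avoid that 400 is to attach the score fields conditionally. A minimal sketch, assuming you track each judge's evaluation type yourself (the judge_is_scored flag below is illustrative):

from zeroeval import ZeroEval

client = ZeroEval()

# Illustrative flag; in practice this would come from your judge's configuration
# (evaluation_type: "scored" vs. binary).
judge_is_scored = True

feedback_kwargs = {
    "prompt_slug": "your-judge-task-slug",
    "completion_id": "span-id-here",
    "thumbs_up": False,
    "reason": "Score should have been lower",
    "judge_id": "automation-id-here",
}

# Only attach the score fields for scored judges; sending them for a binary
# judge results in a 400 error.
if judge_is_scored:
    feedback_kwargs["expected_score"] = 3.5
    feedback_kwargs["score_direction"] = "too_high"

client.send_feedback(**feedback_kwargs)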

Score-Based Feedback

For judges using scored rubrics (not binary pass/fail), you can provide additional feedback about the expected score:
from zeroeval import ZeroEval

client = ZeroEval()

# Submit feedback for a scored judge evaluation
client.send_feedback(
    prompt_slug="quality-scorer",
    completion_id="span-id-here",
    thumbs_up=False,                       # The judge was incorrect
    judge_id="automation-id-here",
    expected_score=3.5,                    # What the score should have been
    score_direction="too_high",            # The judge scored too high
    reason="Score should have been lower due to grammar issues",
)

REST API

Binary Judge Feedback

curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "thumbs_up": true,
    "reason": "Judge correctly identified the issue",
    "judge_id": "automation-uuid-here"
  }'

Scored Judge Feedback

For scored judges, include expected_score and score_direction:
curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "thumbs_up": false,
    "reason": "Score should have been lower",
    "judge_id": "automation-uuid-here",
    "expected_score": 3.5,
    "score_direction": "too_high"
  }'
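
The same request can be made from Python without the SDK, using the requests library. A sketch, assuming the API key is available in the ZEROEVAL_API_KEY environment variable (as in the curl examples):

import os

import requests

task_slug = "your-judge-task-slug"
span_id = "span-id-here"

# Equivalent of the curl example above for a scored judge.
response = requests.post(
    f"https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback",
    headers={
        "Authorization": f"Bearer {os.environ['ZEROEVAL_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "thumbs_up": False,
        "reason": "Score should have been lower",
        "judge_id": "automation-uuid-here",
        "expected_score": 3.5,
        "score_direction": "too_high",
    },
)
response.raise_for_status()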

Finding Your IDs

ID         Where to Find It
Task Slug  In the judge settings, or the URL when editing the judge’s prompt
Span ID    In the evaluation modal, or via the get_judge_evaluations() response
Judge ID   In the URL when viewing a judge (/judges/{judge_id})

Bulk Feedback Submission

To submit feedback on multiple evaluations, iterate over the results of get_judge_evaluations():
from zeroeval import ZeroEval

client = ZeroEval()

# Get evaluations to review
evaluations = client.get_judge_evaluations(
    project_id="your-project-id",
    judge_id="your-judge-id",
    limit=100,
)

# Submit feedback for each
# Submit feedback for each (avoid shadowing the built-in eval)
for evaluation in evaluations["evaluations"]:
    # Your logic to determine if the evaluation was correct
    is_correct = your_review_logic(evaluation)

    client.send_feedback(
        prompt_slug="your-judge-task-slug",
        completion_id=evaluation["span_id"],
        thumbs_up=is_correct,
        reason="Automated review",
        judge_id="your-judge-id",
    )
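
In bulk runs you may prefer a failed submission to be logged rather than abort the loop. A sketch of a more defensive variant, reusing client and evaluations from the example above (no specific SDK exception types are assumed):

failures = []

for evaluation in evaluations["evaluations"]:
    is_correct = your_review_logic(evaluation)
    try:
        client.send_feedback(
            prompt_slug="your-judge-task-slug",
            completion_id=evaluation["span_id"],
            thumbs_up=is_correct,
            reason="Automated review",
            judge_id="your-judge-id",
        )
    except Exception as exc:  # catch-all; the SDK's exception types are not assumed
        failures.append((evaluation["span_id"], str(exc)))

if failures:
    print(f"{len(failures)} feedback submissions failed:", failures)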