Overview
When calibrating judges, you can submit feedback programmatically using the SDK or the REST API.
This is useful for:
- Bulk feedback submission from automated pipelines
- Integration with custom review workflows
- Syncing feedback from external labeling tools (see the sketch below)
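As a quick illustration of the last item, syncing labels from an external review tool might look like the sketch below. It uses the send_feedback call documented later on this page; the shape of the exported labels (dicts with span_id, label, and note keys) is a hypothetical example, not the format of any particular tool.

from zeroeval import ZeroEval

client = ZeroEval()

# Hypothetical export from an external labeling tool:
# each item records which span was reviewed and whether the judge was right.
external_labels = [
    {"span_id": "span-id-1", "label": "correct", "note": "Matches reviewer decision"},
    {"span_id": "span-id-2", "label": "incorrect", "note": "Judge missed a policy violation"},
]

for item in external_labels:
    client.send_feedback(
        prompt_slug="your-judge-task-slug",    # task slug associated with the judge
        completion_id=item["span_id"],         # span ID from the evaluation
        thumbs_up=(item["label"] == "correct"),
        reason=item["note"],
        judge_id="your-judge-id",              # required for judge feedback
    )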
Important: Using the Correct IDs
Judge evaluations involve two related spans:
| ID | Description |
|---|---|
| Source Span ID | The original LLM call that was evaluated |
| Judge Call Span ID | The span created when the judge ran its evaluation |
When submitting feedback, always include the judge_id parameter to ensure
feedback is correctly associated with the judge evaluation.
Python SDK
From the UI (Recommended)
The easiest way to get the correct IDs is from the Judge Evaluation modal:
- Open a judge evaluation in the dashboard
- Expand the “SDK Integration” section
- Click “Copy” to copy the pre-filled Python code
- Paste and customize the generated code
Manual Submission
from zeroeval import ZeroEval

client = ZeroEval()

# Submit feedback for a judge evaluation
client.send_feedback(
    prompt_slug="your-judge-task-slug",  # The task/prompt associated with the judge
    completion_id="span-id-here",        # The span ID from the evaluation
    thumbs_up=True,                      # True = correct, False = incorrect
    reason="Optional explanation",
    judge_id="automation-id-here",       # Required for judge feedback
)
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt_slug | str | Yes | The task slug associated with the judge |
| completion_id | str | Yes | The span ID being evaluated |
| thumbs_up | bool | Yes | True if the judge was correct, False if it was wrong |
| reason | str | No | Explanation of the feedback |
| judge_id | str | Yes* | The judge automation ID (*required for judge feedback) |
| expected_score | float | No | For scored judges: the expected score value |
| score_direction | str | No | For scored judges: "too_high" or "too_low" |
expected_score and score_direction are only valid for scored judges
(judges with evaluation_type: "scored"). The API will return a 400 error
if these fields are provided for binary judges.
Score-Based Feedback
For judges using scored rubrics (not binary pass/fail), you can provide additional
feedback about the expected score:
from zeroeval import ZeroEval

client = ZeroEval()

# Submit feedback for a scored judge evaluation
client.send_feedback(
    prompt_slug="quality-scorer",
    completion_id="span-id-here",
    thumbs_up=False,                 # The judge was incorrect
    judge_id="automation-id-here",
    expected_score=3.5,              # What the score should have been
    score_direction="too_high",      # The judge scored too high
    reason="Score should have been lower due to grammar issues",
)
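If the same pipeline submits feedback to both binary and scored judges, one way to avoid the 400 error mentioned above is to add the scored-only fields conditionally. A minimal sketch; the judge_type value is something you track yourself, not a field provided by the SDK:

from zeroeval import ZeroEval

client = ZeroEval()

# Illustrative: you know (or store) whether this judge is "binary" or "scored"
judge_type = "scored"

kwargs = {
    "prompt_slug": "your-judge-task-slug",
    "completion_id": "span-id-here",
    "thumbs_up": False,
    "reason": "Automated calibration feedback",
    "judge_id": "automation-id-here",
}

# Only scored judges accept expected_score / score_direction
if judge_type == "scored":
    kwargs["expected_score"] = 3.5
    kwargs["score_direction"] = "too_high"

client.send_feedback(**kwargs)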
REST API
Binary Judge Feedback
curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "thumbs_up": true,
    "reason": "Judge correctly identified the issue",
    "judge_id": "automation-uuid-here"
  }'
Scored Judge Feedback
For scored judges, include expected_score and score_direction:
curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "thumbs_up": false,
    "reason": "Score should have been lower",
    "judge_id": "automation-uuid-here",
    "expected_score": 3.5,
    "score_direction": "too_high"
  }'
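If you are calling the REST endpoint from Python without the SDK, the same scored-judge request might look like the following sketch using the requests library; the endpoint and payload mirror the curl example above.

import os

import requests

# Same scored-judge feedback request as the curl example above
task_slug = "your-judge-task-slug"
span_id = "span-id-here"

response = requests.post(
    f"https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback",
    headers={
        "Authorization": f"Bearer {os.environ['ZEROEVAL_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "thumbs_up": False,
        "reason": "Score should have been lower",
        "judge_id": "automation-uuid-here",
        "expected_score": 3.5,
        "score_direction": "too_high",
    },
)
response.raise_for_status()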
Finding Your IDs
| ID | Where to Find It |
|---|---|
| Task Slug | In the judge settings, or the URL when editing the judge’s prompt |
| Span ID | In the evaluation modal, or via get_judge_evaluations() response |
| Judge ID | In the URL when viewing a judge (/judges/{judge_id}) |
Bulk Feedback Submission
To submit feedback on multiple evaluations, fetch them with get_judge_evaluations() and iterate over the results:
from zeroeval import ZeroEval

client = ZeroEval()

# Get evaluations to review
evaluations = client.get_judge_evaluations(
    project_id="your-project-id",
    judge_id="your-judge-id",
    limit=100,
)

# Submit feedback for each evaluation
for evaluation in evaluations["evaluations"]:
    # Your logic to determine if the evaluation was correct
    is_correct = your_review_logic(evaluation)

    client.send_feedback(
        prompt_slug="your-judge-task-slug",
        completion_id=evaluation["span_id"],
        thumbs_up=is_correct,
        reason="Automated review",
        judge_id="your-judge-id",
    )
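For large backfills, you may not want a single failed submission to abort the run. A hedged variant of the loop above that skips failures (your_review_logic is the same placeholder as before; the logging approach is just one option):

import logging

from zeroeval import ZeroEval

client = ZeroEval()

evaluations = client.get_judge_evaluations(
    project_id="your-project-id",
    judge_id="your-judge-id",
    limit=100,
)

for evaluation in evaluations["evaluations"]:
    is_correct = your_review_logic(evaluation)  # your own review logic, as above
    try:
        client.send_feedback(
            prompt_slug="your-judge-task-slug",
            completion_id=evaluation["span_id"],
            thumbs_up=is_correct,
            reason="Automated review",
            judge_id="your-judge-id",
        )
    except Exception:
        # One failed span shouldn't stop the rest of the batch
        logging.exception("Feedback failed for span %s", evaluation["span_id"])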