Installation
Core Functions
init()
Initializes the ZeroEval SDK. Must be called before using any other SDK features.
def init(
api_key: str = None,
workspace_name: str = "Personal Organization",
organization_name: str = None,
debug: bool = False,
api_url: str = None,
disabled_integrations: list[str] = None,
enabled_integrations: list[str] = None,
setup_otlp: bool = True,
service_name: str = "zeroeval-app",
tags: dict[str, str] = None,
sampling_rate: float = None
) -> None
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to the ZEROEVAL_API_KEY env var |
| workspace_name | str | "Personal Organization" | Deprecated; use organization_name |
| organization_name | str | None | Organization name |
| debug | bool | False | Enable debug logging with colors |
| api_url | str | "https://api.zeroeval.com" | API endpoint URL |
| disabled_integrations | list[str] | None | Integrations to disable (e.g. ["langchain"]) |
| enabled_integrations | list[str] | None | Only enable these integrations |
| setup_otlp | bool | True | Configure OpenTelemetry OTLP export |
| service_name | str | "zeroeval-app" | OTLP service name |
| tags | dict[str, str] | None | Global tags applied to all spans |
| sampling_rate | float | None | Sampling rate 0.0-1.0 (1.0 = sample all) |
Example:
import zeroeval as ze
ze.init(
api_key="your-api-key",
sampling_rate=0.1,
disabled_integrations=["langchain"],
debug=True
)
Decorators
@span
Decorator and context manager for creating spans around code blocks.
@span(
name: str,
session_id: Optional[str] = None,
session: Optional[Union[str, dict[str, str]]] = None,
attributes: Optional[dict[str, Any]] = None,
input_data: Optional[str] = None,
output_data: Optional[str] = None,
tags: Optional[dict[str, str]] = None
)
Parameters:
name (str): Name of the span
session_id (str, optional): Deprecated - Use session parameter instead
session (Union[str, dict], optional): Session information. Can be:
- A string containing the session ID
- A dict of the form {"id": "...", "name": "..."}
attributes (dict, optional): Additional attributes to attach to the span
input_data (str, optional): Manual input data override
output_data (str, optional): Manual output data override
tags (dict, optional): Tags to attach to the span
Usage as Decorator:
import zeroeval as ze
@ze.span(name="calculate_sum")
def add_numbers(a: int, b: int) -> int:
return a + b # Parameters and return value automatically captured
# With manual I/O
@ze.span(name="process_data", input_data="manual input", output_data="manual output")
def process():
# Process logic here
pass
# With session
@ze.span(name="user_action", session={"id": "123", "name": "John's Session"})
def user_action():
pass
Usage as Context Manager:
import zeroeval as ze
with ze.span(name="data_processing") as current_span:
result = process_data()
current_span.set_io(input_data="input", output_data=str(result))
artifact_span
Ergonomic wrapper for creating artifact-bearing spans. Produces a span with kind="llm" and the completion_artifact_* attributes pre-filled so the prompt completions page surfaces it as a first-class artifact.
artifact_span(
name: str,
*,
artifact_type: str,
role: str = "primary",
label: Optional[str] = None,
kind: str = "llm",
session_id: Optional[str] = None,
session: Optional[Union[str, dict[str, str]]] = None,
attributes: Optional[dict[str, Any]] = None,
input_data: Optional[str] = None,
output_data: Optional[str] = None,
tags: Optional[dict[str, str]] = None,
)
Parameters:
name (str): Name of the span
artifact_type (str): Artifact type identifier (e.g. "final_decision", "customer_card")
role (str): "primary" (used for row preview) or "secondary". Defaults to "primary"
label (str, optional): Human-friendly label shown in the artifact switcher. Defaults to the span name
kind (str): Span kind. Defaults to "llm"
session (Union[str, dict], optional): Session information
attributes (dict, optional): Additional attributes merged with artifact metadata. Artifact keys take precedence
input_data (str, optional): Manual input data override
output_data (str, optional): Manual output data override
tags (dict, optional): Tags to attach to the span
Usage as Context Manager:
import zeroeval as ze
with ze.artifact_span(
name="final-decision",
artifact_type="final_decision",
role="primary",
label="Final Decision",
tags={"judge_target": "support_ops_final_decision"},
) as s:
s.set_io(input_data="ticket text", output_data=decision_json)
Usage as Decorator:
@ze.artifact_span(
name="generate-card",
artifact_type="customer_card",
role="secondary",
label="Customer Card",
)
def generate_card(ticket):
return render_card(ticket)
ze.artifact_span is currently available in the Python SDK only.
@experiment
Decorator that attaches dataset and model information to a function.
@experiment(
dataset: Optional[Dataset] = None,
model: Optional[str] = None
)
Parameters:
dataset (Dataset, optional): Dataset to use for the experiment
model (str, optional): Model identifier
Example:
import zeroeval as ze
dataset = ze.Dataset.pull("my-dataset")
@ze.experiment(dataset=dataset, model="gpt-4")
def my_experiment():
# Experiment logic
pass
Classes
Dataset
A class to represent a named collection of dictionary records.
Constructor
Dataset(
name: str,
data: list[dict[str, Any]],
description: Optional[str] = None
)
Parameters:
name (str): The name of the dataset
data (list[dict]): A list of dictionaries containing the data
description (str, optional): A description of the dataset
Example:
dataset = Dataset(
name="Capitals",
description="Country to capital mapping",
data=[
{"input": "France", "output": "Paris"},
{"input": "Germany", "output": "Berlin"}
]
)
Methods
push()
Push the dataset to the backend, creating a new version if it already exists.
def push(self, create_new_version: bool = False) -> Dataset
Parameters:
self: The Dataset instance
create_new_version (bool, optional): Kept for backward compatibility; it is no longer needed because a new version is created automatically when a dataset name already exists. Defaults to False
Returns: Returns self for method chaining
pull()
Static method to pull a dataset from the backend.
@classmethod
def pull(
cls,
dataset_name: str,
version_number: Optional[int] = None
) -> Dataset
Parameters:
cls: The Dataset class itself (automatically provided when using @classmethod)
dataset_name (str): The name of the dataset to pull from the backend
version_number (int, optional): Specific version number to pull. If not provided, pulls the latest version
Returns: A new Dataset instance populated with data from the backend
add_rows()
Add new rows to the dataset.
def add_rows(self, new_rows: list[dict[str, Any]]) -> None
Parameters:
self: The Dataset instance
new_rows (list[dict]): A list of dictionaries representing the rows to add
add_image()
Add an image to a specific row.
def add_image(
self,
row_index: int,
column_name: str,
image_path: str
) -> None
Parameters:
self: The Dataset instance
row_index (int): Index of the row to update (0-based)
column_name (str): Name of the column to add the image to
image_path (str): Path to the image file to add
add_audio()
Add audio to a specific row.
def add_audio(
self,
row_index: int,
column_name: str,
audio_path: str
) -> None
Parameters:
self: The Dataset instance
row_index (int): Index of the row to update (0-based)
column_name (str): Name of the column to add the audio to
audio_path (str): Path to the audio file to add
add_media_url()
Add a media URL to a specific row.
def add_media_url(
self,
row_index: int,
column_name: str,
media_url: str,
media_type: str = "image"
) -> None
Parameters:
self: The Dataset instance
row_index (int): Index of the row to update (0-based)
column_name (str): Name of the column to add the media URL to
media_url (str): URL pointing to the media file
media_type (str, optional): Type of media - "image", "audio", or "video". Defaults to "image"
Properties
name (str): The name of the dataset
description (str): The description of the dataset
columns (list[str]): List of all unique column names
data (list[dict]): List of the data portion for each row
backend_id (str): The ID in the backend (after pushing)
version_id (str): The version ID in the backend
version_number (int): The version number in the backend
Example
import zeroeval as ze
# Create a dataset
dataset = ze.Dataset(
name="Capitals",
description="Country to capital mapping",
data=[
{"input": "France", "output": "Paris"},
{"input": "Germany", "output": "Berlin"}
]
)
# Push to backend
dataset.push()
# Pull from backend
dataset = ze.Dataset.pull("Capitals", version_number=1)
# Add rows
dataset.add_rows([{"input": "Italy", "output": "Rome"}])
# Add multimodal data
dataset.add_image(0, "flag", "flags/france.png")
dataset.add_audio(0, "anthem", "anthems/france.mp3")
dataset.add_media_url(0, "video_url", "https://example.com/video.mp4", "video")
Experiment
Represents an experiment that runs a task on a dataset with optional evaluators.
Constructor
Experiment(
dataset: Dataset,
task: Callable[[Any], Any],
evaluators: Optional[list[Callable[[Any, Any], Any]]] = None,
name: Optional[str] = None,
description: Optional[str] = None
)
Parameters:
dataset (Dataset): The dataset to run the experiment on
task (Callable): Function that processes each row and returns output
evaluators (list[Callable], optional): List of evaluator functions that take (row, output) and return evaluation result
name (str, optional): Name of the experiment. Defaults to task function name
description (str, optional): Description of the experiment. Defaults to the task function's docstring
Example:
import zeroeval as ze
ze.init()
# Pull dataset
dataset = ze.Dataset.pull("Capitals")
# Define task
def capitalize_task(row):
return row["input"].upper()
# Define evaluator
def exact_match(row, output):
return row["output"].upper() == output
# Create and run experiment
exp = ze.Experiment(
dataset=dataset,
task=capitalize_task,
evaluators=[exact_match],
name="Capital Uppercase Test"
)
results = exp.run()
# Or run task and evaluators separately
results = exp.run_task()
exp.run_evaluators([exact_match], results)
Methods
run()
Run the complete experiment (task + evaluators).
def run(
self,
subset: Optional[list[dict]] = None
) -> list[ExperimentResult]
Parameters:
self: The Experiment instance
subset (list[dict], optional): Subset of dataset rows to run the experiment on. If None, runs on entire dataset
Returns: List of experiment results for each row
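The subset parameter accepts a list of row dicts, so any slice of dataset.data works. A minimal sketch, reusing the "Capitals" dataset and task from the example above (this requires a live backend, so it is illustrative rather than runnable in isolation):

```python
import zeroeval as ze

ze.init()
dataset = ze.Dataset.pull("Capitals")

def capitalize_task(row):
    return row["input"].upper()

exp = ze.Experiment(dataset=dataset, task=capitalize_task)

# Run on only the first two rows instead of the entire dataset
results = exp.run(subset=dataset.data[:2])
```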
run_task()
Run only the task without evaluators.
def run_task(
self,
subset: Optional[list[dict]] = None,
raise_on_error: bool = False
) -> list[ExperimentResult]
Parameters:
self: The Experiment instance
subset (list[dict], optional): Subset of dataset rows to run the task on. If None, runs on entire dataset
raise_on_error (bool, optional): If True, raises exceptions encountered during task execution. If False, captures errors. Defaults to False
Returns: List of experiment results for each row
run_evaluators()
Run evaluators on existing results.
def run_evaluators(
self,
evaluators: Optional[list[Callable[[Any, Any], Any]]] = None,
results: Optional[list[ExperimentResult]] = None
) -> list[ExperimentResult]
Parameters:
self: The Experiment instance
evaluators (list[Callable], optional): List of evaluator functions to run. If None, uses evaluators from the Experiment instance
results (list[ExperimentResult], optional): List of results to evaluate. If None, uses results from the Experiment instance
Returns: The evaluated results
Span
Represents a span in the tracing system. Usually created via the @span decorator.
Methods
set_io()
Set input and output data for the span.
def set_io(
self,
input_data: Optional[str] = None,
output_data: Optional[str] = None
) -> None
Parameters:
self: The Span instance
input_data (str, optional): Input data to attach to the span. Will be converted to string if not already
output_data (str, optional): Output data to attach to the span. Will be converted to string if not already
set_tags()
Set tags on the span.
def set_tags(self, tags: dict[str, str]) -> None
Parameters:
self: The Span instance
tags (dict[str, str]): Dictionary of tags to set on the span
set_attributes()
Set attributes on the span.
def set_attributes(self, attributes: dict[str, Any]) -> None
Parameters:
self: The Span instance
attributes (dict[str, Any]): Dictionary of attributes to set on the span
set_error()
Set error information for the span.
def set_error(
self,
code: str,
message: str,
stack: Optional[str] = None
) -> None
Parameters:
self: The Span instance
code (str): Error code or exception class name
message (str): Error message
stack (str, optional): Stack trace information
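These setters are typically combined inside a context-managed span. A sketch that tags a span, records attributes, and captures error details on failure; run_operation is a hypothetical helper standing in for your own logic:

```python
import traceback
import zeroeval as ze

with ze.span(name="risky_operation") as span:
    span.set_tags({"environment": "staging"})
    span.set_attributes({"retry_count": 0})
    try:
        result = run_operation()  # hypothetical helper
        span.set_io(input_data="job-42", output_data=str(result))
    except Exception as exc:
        # Record the exception class, message, and traceback on the span
        span.set_error(
            code=type(exc).__name__,
            message=str(exc),
            stack=traceback.format_exc(),
        )
        raise
```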
add_screenshot()
Attach a screenshot to the span for visual evaluation by LLM judges. Screenshots are uploaded during ingestion and can be evaluated alongside text data.
def add_screenshot(
self,
base64_data: str,
viewport: str = "desktop",
width: Optional[int] = None,
height: Optional[int] = None,
label: Optional[str] = None
) -> None
Parameters:
self: The Span instance
base64_data (str): Base64 encoded image data. Accepts raw base64 or data URL format (data:image/png;base64,...)
viewport (str, optional): Viewport type - "desktop", "mobile", or "tablet". Defaults to "desktop"
width (int, optional): Image width in pixels
height (int, optional): Image height in pixels
label (str, optional): Human-readable description of the screenshot
Example:
import zeroeval as ze
with ze.span(name="browser_test", tags={"test": "visual"}) as span:
# Capture and attach a desktop screenshot
span.add_screenshot(
base64_data=desktop_screenshot_base64,
viewport="desktop",
width=1920,
height=1080,
label="Homepage - Desktop"
)
# Also capture mobile view
span.add_screenshot(
base64_data=mobile_screenshot_base64,
viewport="mobile",
width=375,
height=812,
label="Homepage - iPhone"
)
span.set_io(
input_data="Navigate to homepage",
output_data="Captured viewport screenshots"
)
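Screenshots usually start out as raw bytes (from a browser driver or a file on disk), while add_screenshot expects base64 text. A minimal stdlib sketch for the encoding step; the screenshot.png path is illustrative:

```python
import base64

def to_base64(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL accepted by add_screenshot."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Usage with a file on disk (inside an active span):
# with open("screenshot.png", "rb") as f:
#     span.add_screenshot(base64_data=to_base64(f.read()), viewport="desktop")
```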
add_image()
Attach a generic image to the span for visual evaluation. Use this for non-screenshot images like charts, diagrams, or UI component states.
def add_image(
self,
base64_data: str,
label: Optional[str] = None,
metadata: Optional[dict[str, Any]] = None
) -> None
Parameters:
self: The Span instance
base64_data (str): Base64 encoded image data. Accepts raw base64 or data URL format
label (str, optional): Human-readable description of the image
metadata (dict, optional): Additional metadata to store with the image
Example:
import zeroeval as ze
with ze.span(name="chart_generation") as span:
# Generate a chart and attach it
chart_base64 = generate_chart(data)
span.add_image(
base64_data=chart_base64,
label="Monthly Revenue Chart",
metadata={"chart_type": "bar", "data_points": 12}
)
span.set_io(
input_data="Generate revenue chart for Q4",
output_data="Chart generated with 12 data points"
)
Attaching images via URL (S3 presigned or CDN)
If your images are already hosted externally, you can pass an HTTPS URL instead of base64 data. ZeroEval will download, validate, and copy the image into its own storage during ingestion.
Supported URL sources:
- S3 presigned URLs (*.amazonaws.com with valid authentication parameters)
- CDN URLs from trusted domains
Attach URLs directly by setting span.attributes["attachments"] entries with a url key:
import boto3
import zeroeval as ze
# Option A: Presigned S3 URL
s3 = boto3.client("s3")
presigned_url = s3.generate_presigned_url(
"get_object",
Params={"Bucket": "my-bucket", "Key": "images/chart.png"},
ExpiresIn=300,
)
with ze.span(name="chart_generation") as span:
span.attributes["attachments"] = [
{
"type": "image",
"url": presigned_url,
"label": "Monthly Revenue Chart",
}
]
span.set_io(
input_data="Generate revenue chart for Q4",
output_data="Chart generated"
)
import zeroeval as ze
# Option B: CDN URL
cdn_url = "https://cdn.example.com/images/product-photo.png"
with ze.span(name="product_image_check") as span:
span.attributes["attachments"] = [
{
"type": "image",
"url": cdn_url,
"label": "Product listing photo",
}
]
span.set_io(
input_data="Check product image quality",
output_data="Image attached for evaluation"
)
Images attached to spans can be evaluated by LLM judges configured for multimodal evaluation. See the Multimodal Evaluation guide for setup instructions.
Context Functions
get_current_span()
Returns the currently active span, if any.
def get_current_span() -> Optional[Span]
Returns: The currently active Span instance, or None if no span is active
get_current_trace()
Returns the current trace ID.
def get_current_trace() -> Optional[str]
Returns: The current trace ID, or None if no trace is active
get_current_session()
Returns the current session ID.
def get_current_session() -> Optional[str]
Returns: The current session ID, or None if no session is active
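A sketch tying the three accessors together inside an active span (requires an initialized SDK, so it is illustrative):

```python
import zeroeval as ze

ze.init()
with ze.span(name="checkout", session="sess-123"):
    current = ze.get_current_span()        # the active "checkout" span
    trace_id = ze.get_current_trace()      # trace ID shared by nested spans
    session_id = ze.get_current_session()  # "sess-123" inside this block
```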
set_tag()
Sets tags on a span, trace, or session.
def set_tag(
target: Union[Span, str],
tags: dict[str, str]
) -> None
Parameters:
target (Union[Span, str]): The target to set tags on. Can be:
- A Span: sets tags on that specific span
- A str: sets tags on the trace (if a valid trace ID) or session (if a valid session ID)
tags (dict[str, str]): Dictionary of tags to set
Example:
import zeroeval as ze
# Set tags on current span
current_span = ze.get_current_span()
if current_span:
ze.set_tag(current_span, {"user_id": "12345", "environment": "production"})
# Set tags on trace
trace_id = ze.get_current_trace()
if trace_id:
ze.set_tag(trace_id, {"version": "1.5"})
Judge Feedback APIs
send_feedback()
Programmatically submit user feedback for a completion or judge evaluation.
def send_feedback(
*,
prompt_slug: str,
completion_id: str,
thumbs_up: bool,
reason: Optional[str] = None,
expected_output: Optional[str] = None,
metadata: Optional[dict] = None,
judge_id: Optional[str] = None,
expected_score: Optional[float] = None,
score_direction: Optional[str] = None,
criteria_feedback: Optional[dict] = None
) -> dict
Notes:
- Existing usage without criteria_feedback is unchanged.
- criteria_feedback is optional and supported for scored judges.
- judge_id is required when sending expected_score, score_direction, or criteria_feedback.
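A sketch of submitting thumbs-down feedback with criterion-level detail for a scored judge. The slug, IDs, score_direction value, and criteria_feedback shape below are placeholders, not confirmed API values:

```python
import zeroeval as ze

result = ze.send_feedback(
    prompt_slug="support-reply",      # placeholder slug
    completion_id="cmpl-123",         # placeholder completion ID
    thumbs_up=False,
    reason="Reply missed the refund policy",
    expected_output="Mention the 30-day refund window",
    judge_id="judge-456",             # required for the score fields below
    expected_score=0.2,
    score_direction="lower",          # illustrative value
    criteria_feedback={"accuracy": {"thumbs_up": False}},  # assumed shape
)
```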
get_judge_criteria()
Fetch normalized criteria metadata for a judge (useful before criterion-level feedback).
def get_judge_criteria(
project_id: str,
judge_id: str
) -> dict
Returns: A dict containing:
- judge_id
- evaluation_type
- score_min, score_max, pass_threshold
- criteria (a list of {key, label, description} entries)
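A sketch of fetching criteria metadata before building criterion-level feedback; the project and judge IDs are placeholders:

```python
import zeroeval as ze

criteria = ze.get_judge_criteria(
    project_id="proj-123",   # placeholder
    judge_id="judge-456",    # placeholder
)

# List the criterion keys a judge scores against
for criterion in criteria["criteria"]:
    print(criterion["key"], "-", criterion["label"])
```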
CLI Commands
The ZeroEval SDK includes a CLI tool for running experiments and setup.
zeroeval run
Run a Python script containing ZeroEval experiments.
zeroeval setup
Interactive setup to configure API credentials.
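A typical flow, assuming the SDK is installed and the script name is illustrative:

```shell
# One-time: store API credentials interactively
zeroeval setup

# Run a Python script that defines ZeroEval experiments
zeroeval run my_experiment.py
```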
Environment Variables
Set before importing ZeroEval to configure default behavior.
| Variable | Type | Default | Description |
|---|---|---|---|
| ZEROEVAL_API_KEY | string | "" | API key for authentication |
| ZEROEVAL_API_URL | string | "https://api.zeroeval.com" | API endpoint URL |
| ZEROEVAL_WORKSPACE_NAME | string | "Personal Workspace" | Workspace name |
| ZEROEVAL_SESSION_ID | string | auto-generated | Session ID for grouping traces |
| ZEROEVAL_SESSION_NAME | string | "" | Human-readable session name |
| ZEROEVAL_SAMPLING_RATE | float | "1.0" | Sampling rate (0.0-1.0) |
| ZEROEVAL_DISABLED_INTEGRATIONS | string | "" | Comma-separated integrations to disable |
| ZEROEVAL_DEBUG | boolean | "false" | Enable debug logging |
export ZEROEVAL_API_KEY="ze_1234567890abcdef"
export ZEROEVAL_SAMPLING_RATE="0.1"
export ZEROEVAL_DEBUG="true"
Runtime Configuration
Configure after initialization via ze.tracer.configure().
| Parameter | Type | Default | Description |
|---|---|---|---|
| flush_interval | float | 1.0 | Flush frequency in seconds |
| max_spans | int | 20 | Buffer size before forced flush |
| collect_code_details | bool | True | Capture code details in spans |
| integrations | dict[str, bool] | {} | Enable/disable specific integrations |
| sampling_rate | float | None | Sampling rate (0.0-1.0) |
ze.tracer.configure(
flush_interval=0.5,
max_spans=100,
sampling_rate=0.05,
integrations={"openai": True, "langchain": False}
)
Available Integrations
| Integration | Name | Auto-Instruments |
|---|---|---|
| OpenAIIntegration | "openai" | OpenAI client calls |
| GeminiIntegration | "gemini" | Google Gemini calls |
| LangChainIntegration | "langchain" | LangChain components |
| LangGraphIntegration | "langgraph" | LangGraph workflows |
| HttpxIntegration | "httpx" | HTTPX requests |
| VocodeIntegration | "vocode" | Vocode voice SDK |
Control integrations via:
- Environment: ZEROEVAL_DISABLED_INTEGRATIONS="langchain,langgraph"
- Init: disabled_integrations=["langchain"] or enabled_integrations=["openai"]
- Runtime: ze.tracer.configure(integrations={"langchain": False})
Configuration Examples
Production
ze.init(
api_key="your_key",
sampling_rate=0.05,
debug=False,
disabled_integrations=["langchain"]
)
ze.tracer.configure(
flush_interval=0.5,
max_spans=100
)
Development
ze.init(
api_key="your_key",
debug=True,
sampling_rate=1.0
)
Memory-Optimized
ze.tracer.configure(
max_spans=5,
collect_code_details=False,
flush_interval=2.0
)