Installation
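The SDK is distributed as a Python package; assuming the package name on PyPI matches the SDK name:

```bash
pip install zeroeval
```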
Core Functions
init()
Initializes the ZeroEval SDK. Must be called before using any other SDK features.
- `api_key` (str, optional): Your ZeroEval API key. If not provided, uses the `ZEROEVAL_API_KEY` environment variable
- `workspace_name` (str, optional): The name of your workspace. Defaults to `"Personal Workspace"`
- `debug` (bool, optional): If True, enables detailed logging for debugging. Can also be enabled by setting the `ZEROEVAL_DEBUG=true` environment variable
- `api_url` (str, optional): The URL of the ZeroEval API. Defaults to `"https://api.zeroeval.com"`
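A minimal call, assuming the package is imported under the alias `ze` (the parameters are those documented above):

```python
import zeroeval as ze

# Omitting api_key falls back to the ZEROEVAL_API_KEY environment variable
ze.init(
    api_key="YOUR_API_KEY",
    workspace_name="Personal Workspace",
    debug=False,
)
```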
Decorators
@span
Decorator and context manager for creating spans around code blocks.
- `name` (str): Name of the span
- `session_id` (str, optional): Deprecated: use the `session` parameter instead
- `session` (Union[str, dict], optional): Session information. Can be:
  - A string containing the session ID
  - A dict with `{"id": "...", "name": "..."}`
- `attributes` (dict, optional): Additional attributes to attach to the span
- `input_data` (str, optional): Manual input data override
- `output_data` (str, optional): Manual output data override
- `tags` (dict, optional): Tags to attach to the span
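A sketch of both usages, assuming the decorator is exposed at the package root:

```python
import zeroeval as ze

# As a decorator: the function body runs inside the span
@ze.span(name="summarize", attributes={"model": "gpt-4o"})
def summarize(text: str) -> str:
    return text[:100]

# As a context manager around an arbitrary code block
with ze.span(name="post-process", session={"id": "abc-123", "name": "demo"}):
    summary = summarize("Some long document...")
```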
@experiment
Decorator that attaches dataset and model information to a function.
- `dataset` (Dataset, optional): Dataset to use for the experiment
- `model` (str, optional): Model identifier
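A minimal sketch, assuming `@experiment` and `Dataset` are exposed at the package root; the task body is a placeholder:

```python
import zeroeval as ze

capitals = ze.Dataset(
    name="capitals",
    data=[{"country": "France", "capital": "Paris"}],
)

@ze.experiment(dataset=capitals, model="gpt-4o-mini")
def answer(row: dict) -> str:
    # Stand-in for a real model call
    return row["capital"]
```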
Classes
Dataset
A class to represent a named collection of dictionary records.
Constructor
- `name` (str): The name of the dataset
- `data` (list[dict]): A list of dictionaries containing the data
- `description` (str, optional): A description of the dataset
Methods
push()
Push the dataset to the backend, creating a new version if it already exists.
- `self`: The Dataset instance
- `create_new_version` (bool, optional): For backward compatibility. This parameter is no longer needed, as new versions are automatically created when a dataset name already exists. Defaults to False
pull()
Class method to pull a dataset from the backend.
- `cls`: The Dataset class itself (automatically provided when using `@classmethod`)
- `dataset_name` (str): The name of the dataset to pull from the backend
- `version_number` (int, optional): Specific version number to pull. If not provided, pulls the latest version
add_rows()
Add new rows to the dataset.
- `self`: The Dataset instance
- `new_rows` (list[dict]): A list of dictionaries representing the rows to add
add_image()
Add an image to a specific row.
- `self`: The Dataset instance
- `row_index` (int): Index of the row to update (0-based)
- `column_name` (str): Name of the column to add the image to
- `image_path` (str): Path to the image file to add
add_audio()
Add audio to a specific row.
- `self`: The Dataset instance
- `row_index` (int): Index of the row to update (0-based)
- `column_name` (str): Name of the column to add the audio to
- `audio_path` (str): Path to the audio file to add
add_media_url()
Add a media URL to a specific row.
- `self`: The Dataset instance
- `row_index` (int): Index of the row to update (0-based)
- `column_name` (str): Name of the column to add the media URL to
- `media_url` (str): URL pointing to the media file
- `media_type` (str, optional): Type of media: `"image"`, `"audio"`, or `"video"`. Defaults to `"image"`
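A sketch of the three media helpers on an existing dataset; the paths and URL are placeholders:

```python
dataset.add_image(row_index=0, column_name="screenshot", image_path="img/home.png")
dataset.add_audio(row_index=0, column_name="narration", audio_path="audio/intro.wav")
dataset.add_media_url(
    row_index=1,
    column_name="demo",
    media_url="https://example.com/demo.mp4",
    media_type="video",
)
```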
Properties
- `name` (str): The name of the dataset
- `description` (str): The description of the dataset
- `columns` (list[str]): List of all unique column names
- `data` (list[dict]): List of the data portion for each row
- `backend_id` (str): The ID in the backend (after pushing)
- `version_id` (str): The version ID in the backend
- `version_number` (int): The version number in the backend
Example
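A minimal sketch covering the documented constructor and methods, assuming `Dataset` is exposed at the package root:

```python
import zeroeval as ze

# Create a small dataset and push it to the backend
capitals = ze.Dataset(
    name="capitals",
    data=[
        {"country": "France", "capital": "Paris"},
        {"country": "Japan", "capital": "Tokyo"},
    ],
    description="Country-capital pairs",
)
capitals.add_rows([{"country": "Kenya", "capital": "Nairobi"}])
capitals.push()  # creates a new version if "capitals" already exists

# Pull the latest version back (or pin one with version_number=...)
same = ze.Dataset.pull(dataset_name="capitals")
print(same.columns)  # ["country", "capital"]
```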
Experiment
Represents an experiment that runs a task on a dataset with optional evaluators.
Constructor
- `dataset` (Dataset): The dataset to run the experiment on
- `task` (Callable): Function that processes each row and returns output
- `evaluators` (list[Callable], optional): List of evaluator functions that take `(row, output)` and return an evaluation result
- `name` (str, optional): Name of the experiment. Defaults to the task function's name
- `description` (str, optional): Description of the experiment. Defaults to the task function's docstring
Methods
run()
Run the complete experiment (task + evaluators).
- `self`: The Experiment instance
- `subset` (list[dict], optional): Subset of dataset rows to run the experiment on. If None, runs on the entire dataset
run_task()
Run only the task without evaluators.
- `self`: The Experiment instance
- `subset` (list[dict], optional): Subset of dataset rows to run the task on. If None, runs on the entire dataset
- `raise_on_error` (bool, optional): If True, raises exceptions encountered during task execution. If False, captures errors. Defaults to False
run_evaluators()
Run evaluators on existing results.
- `self`: The Experiment instance
- `evaluators` (list[Callable], optional): List of evaluator functions to run. If None, uses the evaluators from the Experiment instance
- `results` (list[ExperimentResult], optional): List of results to evaluate. If None, uses the results from the Experiment instance
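A sketch tying the pieces together, assuming `Experiment` is exposed at the package root; the task and evaluator bodies are placeholders:

```python
import zeroeval as ze

def task(row: dict) -> str:
    return row["capital"]  # stand-in for a real model call

def exact_match(row: dict, output: str) -> bool:
    return output == row["capital"]

experiment = ze.Experiment(
    dataset=ze.Dataset.pull(dataset_name="capitals"),
    task=task,
    evaluators=[exact_match],
    name="capitals-baseline",
)
experiment.run()  # runs the task, then the evaluators
```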
Span
Represents a span in the tracing system. Usually created via the @span decorator.
Methods
set_io()
Set input and output data for the span.
- `self`: The Span instance
- `input_data` (str, optional): Input data to attach to the span. Converted to a string if not already one
- `output_data` (str, optional): Output data to attach to the span. Converted to a string if not already one
set_tags()
Set tags on the span.
- `self`: The Span instance
- `tags` (dict[str, str]): Dictionary of tags to set on the span
set_attributes()
Set attributes on the span.
- `self`: The Span instance
- `attributes` (dict[str, Any]): Dictionary of attributes to set on the span
set_error()
Set error information for the span.
- `self`: The Span instance
- `code` (str): Error code or exception class name
- `message` (str): Error message
- `stack` (str, optional): Stack trace information
add_screenshot()
Attach a screenshot to the span for visual evaluation by LLM judges. Screenshots are uploaded during ingestion and can be evaluated alongside text data.
- `self`: The Span instance
- `base64_data` (str): Base64-encoded image data. Accepts raw base64 or data URL format (`data:image/png;base64,...`)
- `viewport` (str, optional): Viewport type: `"desktop"`, `"mobile"`, or `"tablet"`. Defaults to `"desktop"`
- `width` (int, optional): Image width in pixels
- `height` (int, optional): Image height in pixels
- `label` (str, optional): Human-readable description of the screenshot
add_image()
Attach a generic image to the span for visual evaluation. Use this for non-screenshot images like charts, diagrams, or UI component states.
- `self`: The Span instance
- `base64_data` (str): Base64-encoded image data. Accepts raw base64 or data URL format
- `label` (str, optional): Human-readable description of the image
- `metadata` (dict, optional): Additional metadata to store with the image
Images attached to spans can be evaluated by LLM judges configured for multimodal evaluation. See the Multimodal Evaluation guide for setup instructions.
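A sketch of these methods on the current span inside traced code; `get_current_span` is documented under Context Functions below, and the file path is a placeholder:

```python
import base64
import zeroeval as ze

@ze.span(name="render-page")
def render_page(url: str) -> str:
    span = ze.get_current_span()
    span.set_tags({"component": "renderer"})
    span.set_io(input_data=url, output_data="<html>...</html>")

    # Attach a screenshot for multimodal evaluation
    with open("page.png", "rb") as f:
        span.add_screenshot(
            base64_data=base64.b64encode(f.read()).decode(),
            viewport="desktop",
            label="Rendered landing page",
        )
    return "<html>...</html>"
```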
Context Functions
get_current_span()
Returns the currently active span, if any.
get_current_trace()
Returns the current trace ID.
get_current_session()
Returns the current session ID.
set_tag()
Sets tags on a span, trace, or session.
- `target`: The target to set tags on:
  - Span: Sets tags on the specific span
  - str: Sets tags on the trace (if a valid trace ID) or the session (if a valid session ID)
- `tags` (dict[str, str]): Dictionary of tags to set
set_signal()
Sends signals to a span, trace, or session.
- `target`: The entity to attach signals to:
  - Span: Sends signals to the specific span
  - str: Sends signals to the trace (if an active trace ID) or the session
- `signals` (dict): Dictionary of signal names to values
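A sketch combining the context helpers, assuming they are exposed at the package root:

```python
import zeroeval as ze

with ze.span(name="checkout"):
    span = ze.get_current_span()
    trace_id = ze.get_current_trace()

    ze.set_tag(span, {"feature": "checkout"})  # tag the span itself
    ze.set_tag(trace_id, {"plan": "pro"})      # tag the whole trace
    ze.set_signal(span, {"converted": True})   # attach a signal to the span
```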
CLI Commands
The ZeroEval SDK includes a CLI tool for running experiments and setup.

zeroeval run
Run a Python script containing ZeroEval experiments.
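A typical invocation, assuming the command takes the script path as its argument (the path is a placeholder):

```bash
zeroeval run my_experiment.py
```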
zeroeval setup
Interactive setup to configure API credentials.
Environment Variables
The SDK uses the following environment variables:

- `ZEROEVAL_API_KEY`: Your ZeroEval API key
- `ZEROEVAL_API_URL`: API endpoint URL (defaults to `https://api.zeroeval.com`)
- `ZEROEVAL_DEBUG`: Set to `true` to enable debug logging
- `ZEROEVAL_DISABLED_INTEGRATIONS`: Comma-separated list of integrations to disable
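For example, in a shell profile or deployment environment:

```bash
export ZEROEVAL_API_KEY="YOUR_API_KEY"
export ZEROEVAL_API_URL="https://api.zeroeval.com"
export ZEROEVAL_DEBUG=true
export ZEROEVAL_DISABLED_INTEGRATIONS="openai,langchain"  # integration names here are illustrative
```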