Installation
Core Functions
init()
Initializes the ZeroEval SDK. Must be called before using any other SDK features.
- api_key (str, optional): Your ZeroEval API key. If not provided, uses the ZEROEVAL_API_KEY environment variable
- workspace_name (str, optional): The name of your workspace. Defaults to "Personal Workspace"
- debug (bool, optional): If True, enables detailed logging for debugging. Can also be enabled by setting the ZEROEVAL_DEBUG=true environment variable
- api_url (str, optional): The URL of the ZeroEval API. Defaults to "https://api.zeroeval.com"
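The fallback rules above can be sketched in plain Python. This is only an illustration of the documented defaults, not SDK code: resolve_init_config is a hypothetical helper, and the precedence it applies (explicit argument first, then environment variable, then default) is an assumption wherever the reference does not state it.

```python
import os
from typing import Mapping, Optional

DEFAULT_API_URL = "https://api.zeroeval.com"
DEFAULT_WORKSPACE = "Personal Workspace"


def resolve_init_config(
    api_key: Optional[str] = None,
    workspace_name: str = DEFAULT_WORKSPACE,
    debug: bool = False,
    api_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: mirror the documented init() fallbacks."""
    env = os.environ if env is None else env
    return {
        # Explicit argument wins; otherwise fall back to the environment.
        "api_key": api_key or env.get("ZEROEVAL_API_KEY"),
        "workspace_name": workspace_name,
        # Assumption: either the flag or ZEROEVAL_DEBUG=true enables logging.
        "debug": debug or env.get("ZEROEVAL_DEBUG", "").lower() == "true",
        # Assumption: an explicit api_url overrides ZEROEVAL_API_URL.
        "api_url": api_url or env.get("ZEROEVAL_API_URL", DEFAULT_API_URL),
    }
```

Passing env explicitly, as the helper allows, keeps the lookup easy to test without touching the real process environment.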
Decorators
@span
Decorator and context manager for creating spans around code blocks.
- name (str): Name of the span
- session_id (str, optional): Deprecated. Use the session parameter instead
- session (Union[str, dict], optional): Session information. Can be:
  - A string containing the session ID
  - A dict with {"id": "...", "name": "..."}
- attributes (dict, optional): Additional attributes to attach to the span
- input_data (str, optional): Manual input data override
- output_data (str, optional): Manual output data override
- tags (dict, optional): Tags to attach to the span
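Because session accepts either a string or a dict, callers sometimes want one consistent shape. The sketch below shows one way to reduce both forms to an (id, name) pair. normalize_session is a hypothetical helper written for illustration; the SDK may handle the two forms differently internally.

```python
from typing import Optional, Tuple, Union


def normalize_session(
    session: Union[str, dict, None],
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (session_id, session_name)."""
    if session is None:
        return None, None
    if isinstance(session, str):
        # A bare string is treated as the session ID, with no name.
        return session, None
    if isinstance(session, dict):
        # Dict form: {"id": "...", "name": "..."}
        return session.get("id"), session.get("name")
    raise TypeError("session must be a str or a dict")
```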
@experiment
Decorator that attaches dataset and model information to a function.
- dataset (Dataset, optional): Dataset to use for the experiment
- model (str, optional): Model identifier
Classes
Dataset
A class to represent a named collection of dictionary records.
Constructor
- name (str): The name of the dataset
- data (list[dict]): A list of dictionaries containing the data
- description (str, optional): A description of the dataset
Methods
push()
Push the dataset to the backend, creating a new version if it already exists.
- self: The Dataset instance
- create_new_version (bool, optional): For backward compatibility. This parameter is no longer needed, as new versions are automatically created when a dataset name already exists. Defaults to False
pull()
Class method that pulls a dataset from the backend.
- cls: The Dataset class itself (automatically provided when using @classmethod)
- dataset_name (str): The name of the dataset to pull from the backend
- version_number (int, optional): Specific version number to pull. If not provided, pulls the latest version
add_rows()
Add new rows to the dataset.
- self: The Dataset instance
- new_rows (list[dict]): A list of dictionaries representing the rows to add
add_image()
Add an image to a specific row.
- self: The Dataset instance
- row_index (int): Index of the row to update (0-based)
- column_name (str): Name of the column to add the image to
- image_path (str): Path to the image file to add
add_audio()
Add audio to a specific row.
- self: The Dataset instance
- row_index (int): Index of the row to update (0-based)
- column_name (str): Name of the column to add the audio to
- audio_path (str): Path to the audio file to add
add_media_url()
Add a media URL to a specific row.
- self: The Dataset instance
- row_index (int): Index of the row to update (0-based)
- column_name (str): Name of the column to add the media URL to
- media_url (str): URL pointing to the media file
- media_type (str, optional): Type of media: "image", "audio", or "video". Defaults to "image"
Properties
- name (str): The name of the dataset
- description (str): The description of the dataset
- columns (list[str]): List of all unique column names
- data (list[dict]): List of the data portion for each row
- backend_id (str): The ID in the backend (after pushing)
- version_id (str): The version ID in the backend
- version_number (int): The version number in the backend
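To make the columns property concrete, here is a small sketch of how a list of unique column names could be derived from list[dict] rows whose keys differ. unique_columns is a hypothetical helper, and the first-seen ordering it produces is an assumption; the reference does not say in what order the SDK returns columns.

```python
from typing import Dict, List


def unique_columns(data: List[dict]) -> List[str]:
    """Hypothetical helper: collect each key once, in first-seen order."""
    seen: Dict[str, None] = {}
    for row in data:
        for key in row:
            # dicts preserve insertion order, so this keeps first-seen order.
            seen.setdefault(key, None)
    return list(seen)


rows = [
    {"question": "2 + 2", "answer": "4"},
    {"question": "3 + 3", "label": "math"},
]
columns = unique_columns(rows)
```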
Example
Experiment
Represents an experiment that runs a task on a dataset with optional evaluators.
Constructor
- dataset (Dataset): The dataset to run the experiment on
- task (Callable): Function that processes each row and returns output
- evaluators (list[Callable], optional): List of evaluator functions that take (row, output) and return an evaluation result
- name (str, optional): Name of the experiment. Defaults to the task function name
- description (str, optional): Description of the experiment. Defaults to the task function's docstring
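The callable shapes described above can be illustrated without the SDK. In this sketch, uppercase_task and exact_match are hypothetical functions, the column names question and expected are made up for the example, and the loop is only a stand-in for what running a task plus evaluators amounts to conceptually. It is not the Experiment implementation.

```python
def uppercase_task(row: dict) -> str:
    """Hypothetical task: takes one dataset row, returns an output."""
    return row["question"].strip().upper()


def exact_match(row: dict, output: str) -> bool:
    """Hypothetical evaluator: takes (row, output), returns a result."""
    return output == row["expected"]


rows = [
    {"question": " ping ", "expected": "PING"},
    {"question": "pong", "expected": "pong"},
]

# Stand-in loop: run the task on each row, then each evaluator on its output.
results = []
for row in rows:
    output = uppercase_task(row)
    results.append({"output": output, "exact_match": exact_match(row, output)})
```

Functions with these signatures are what the task and evaluators parameters expect, so they could be passed to the constructor unchanged.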
Methods
run()
Run the complete experiment (task + evaluators).
- self: The Experiment instance
- subset (list[dict], optional): Subset of dataset rows to run the experiment on. If None, runs on the entire dataset
run_task()
Run only the task without evaluators.
- self: The Experiment instance
- subset (list[dict], optional): Subset of dataset rows to run the task on. If None, runs on the entire dataset
- raise_on_error (bool, optional): If True, raises exceptions encountered during task execution. If False, captures errors. Defaults to False
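The difference between the two raise_on_error modes is easiest to see in a small stand-alone sketch. run_rows below is a hypothetical helper, and the shape of the captured result (a dict with output and error keys) is an assumption made for illustration; the SDK's own result objects may look different.

```python
from typing import Any, Callable, Dict, List


def run_rows(
    task: Callable[[dict], Any],
    rows: List[dict],
    raise_on_error: bool = False,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: run task per row, capturing or raising errors."""
    results = []
    for row in rows:
        try:
            results.append({"row": row, "output": task(row), "error": None})
        except Exception as exc:
            if raise_on_error:
                # Fail fast: propagate the first task error to the caller.
                raise
            # Capture mode: record the error and continue with the next row.
            results.append({"row": row, "output": None, "error": repr(exc)})
    return results
```

Capture mode suits long runs where one bad row should not discard the rest, while raising is more convenient when debugging a task.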
run_evaluators()
Run evaluators on existing results.
- self: The Experiment instance
- evaluators (list[Callable], optional): List of evaluator functions to run. If None, uses evaluators from the Experiment instance
- results (list[ExperimentResult], optional): List of results to evaluate. If None, uses results from the Experiment instance
Span
Represents a span in the tracing system. Usually created via the @span decorator.
Methods
set_io()
Set input and output data for the span.
- self: The Span instance
- input_data (str, optional): Input data to attach to the span. Will be converted to a string if not already one
- output_data (str, optional): Output data to attach to the span. Will be converted to a string if not already one
set_tags()
Set tags on the span.
- self: The Span instance
- tags (dict[str, str]): Dictionary of tags to set on the span
set_attributes()
Set attributes on the span.
- self: The Span instance
- attributes (dict[str, Any]): Dictionary of attributes to set on the span
set_error()
Set error information for the span.
- self: The Span instance
- code (str): Error code or exception class name
- message (str): Error message
- stack (str, optional): Stack trace information
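The three error fields map naturally onto a caught Python exception. The sketch below builds them with the standard library only. error_fields is a hypothetical helper, and using the exception class name as code is one of the two options the parameter description allows.

```python
import traceback


def error_fields(exc: BaseException) -> dict:
    """Hypothetical helper: derive code, message, and stack from an exception."""
    return {
        # The exception class name serves as the error code.
        "code": type(exc).__name__,
        "message": str(exc),
        # Render the full traceback as a single string.
        "stack": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


try:
    int("not a number")
except ValueError as exc:
    fields = error_fields(exc)
```

Assuming set_error() accepts these names as keyword arguments, the dict could be unpacked into the call on an active span.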
Context Functions
get_current_span()
Returns the currently active span, if any.
get_current_trace()
Returns the current trace ID.
get_current_session()
Returns the current session ID.
set_tag()
Sets tags on a span, trace, or session.
- target: The target to set tags on
  - Span: Sets tags on the specific span
  - str: Sets tags on the trace (if valid trace ID) or session (if valid session ID)
- tags (dict[str, str]): Dictionary of tags to set
set_signal()
Send a signal to a span, trace, or session.
- target: The entity to attach signals to
  - Span: Sends signals to the specific span
  - str: Sends signals to the trace (if active trace ID) or session
- signals (dict): Dictionary of signal names to values
CLI Commands
The ZeroEval SDK includes a CLI tool for running experiments and setup.
zeroeval run
Run a Python script containing ZeroEval experiments.
zeroeval setup
Interactive setup to configure API credentials.
Environment Variables
The SDK uses the following environment variables:
- ZEROEVAL_API_KEY: Your ZeroEval API key
- ZEROEVAL_API_URL: API endpoint URL (defaults to https://api.zeroeval.com)
- ZEROEVAL_DEBUG: Set to true to enable debug logging
- ZEROEVAL_DISABLED_INTEGRATIONS: Comma-separated list of integrations to disable
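As an illustration of the comma-separated format used by ZEROEVAL_DISABLED_INTEGRATIONS, the sketch below splits such a value into a set of names. parse_disabled_integrations is a hypothetical helper, the integration names are placeholders, and the tolerance for stray whitespace and empty items is an assumption. The SDK's own parsing may be stricter.

```python
from typing import Set


def parse_disabled_integrations(value: str) -> Set[str]:
    """Hypothetical helper: split a comma-separated list into clean names."""
    # Assumption: surrounding whitespace and empty items are ignored.
    return {item.strip() for item in value.split(",") if item.strip()}


# Placeholder names, not real integration identifiers.
disabled = parse_disabled_integrations(" integration_a, integration_b ,,")
```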