Installation

pip install zeroeval

Core Functions

init()

Initializes the ZeroEval SDK. Must be called before using any other SDK features.
def init(
    api_key: str = None,
    workspace_name: str = "Personal Organization",
    organization_name: str = None,
    debug: bool = False,
    api_url: str = None,
    disabled_integrations: list[str] = None,
    enabled_integrations: list[str] = None,
    setup_otlp: bool = True,
    service_name: str = "zeroeval-app",
    tags: dict[str, str] = None,
    sampling_rate: float = None
) -> None
Parameters:
  • api_key (str, optional): API key. Falls back to the ZEROEVAL_API_KEY env var. Defaults to None
  • workspace_name (str): Deprecated; use organization_name instead. Defaults to "Personal Organization"
  • organization_name (str, optional): Organization name
  • debug (bool): Enable debug logging with colors. Defaults to False
  • api_url (str, optional): API endpoint URL. Defaults to "https://api.zeroeval.com"
  • disabled_integrations (list[str], optional): Integrations to disable (e.g. ["langchain"])
  • enabled_integrations (list[str], optional): If set, only these integrations are enabled
  • setup_otlp (bool): Configure OpenTelemetry OTLP export. Defaults to True
  • service_name (str): OTLP service name. Defaults to "zeroeval-app"
  • tags (dict[str, str], optional): Global tags applied to all spans
  • sampling_rate (float, optional): Sampling rate from 0.0 to 1.0 (1.0 = sample all)
Example:
import zeroeval as ze

ze.init(
    api_key="your-api-key",
    sampling_rate=0.1,
    disabled_integrations=["langchain"],
    debug=True
)
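The sketch below illustrates the semantics of a 0.0-1.0 sampling rate; it is an illustrative sketch, not the SDK's internal sampler:

```python
import random

def should_sample(rate: float) -> bool:
    # rate=1.0 keeps every trace; rate=0.0 keeps none; rate=0.1 keeps ~10%
    return random.random() < rate
```

With sampling_rate=0.1 as in the example above, roughly one in ten traces is exported.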

Decorators

@span

Decorator and context manager for creating spans around code blocks.
@span(
    name: str,
    session_id: Optional[str] = None,
    session: Optional[Union[str, dict[str, str]]] = None,
    attributes: Optional[dict[str, Any]] = None,
    input_data: Optional[str] = None,
    output_data: Optional[str] = None,
    tags: Optional[dict[str, str]] = None
)
Parameters:
  • name (str): Name of the span
  • session_id (str, optional): Deprecated; use the session parameter instead
  • session (Union[str, dict], optional): Session information. Can be:
    • A string containing the session ID
    • A dict with {"id": "...", "name": "..."}
  • attributes (dict, optional): Additional attributes to attach to the span
  • input_data (str, optional): Manual input data override
  • output_data (str, optional): Manual output data override
  • tags (dict, optional): Tags to attach to the span
Usage as Decorator:
import zeroeval as ze

@ze.span(name="calculate_sum")
def add_numbers(a: int, b: int) -> int:
    return a + b  # Parameters and return value automatically captured

# With manual I/O
@ze.span(name="process_data", input_data="manual input", output_data="manual output")
def process():
    # Process logic here
    pass

# With session
@ze.span(name="user_action", session={"id": "123", "name": "John's Session"})
def user_action():
    pass
Usage as Context Manager:
import zeroeval as ze

with ze.span(name="data_processing") as current_span:
    result = process_data()
    current_span.set_io(input_data="input", output_data=str(result))

artifact_span

Ergonomic wrapper for creating artifact-bearing spans. Produces a span with kind="llm" and the completion_artifact_* attributes pre-filled so the prompt completions page surfaces it as a first-class artifact.
artifact_span(
    name: str,
    *,
    artifact_type: str,
    role: str = "primary",
    label: Optional[str] = None,
    kind: str = "llm",
    session_id: Optional[str] = None,
    session: Optional[Union[str, dict[str, str]]] = None,
    attributes: Optional[dict[str, Any]] = None,
    input_data: Optional[str] = None,
    output_data: Optional[str] = None,
    tags: Optional[dict[str, str]] = None,
)
Parameters:
  • name (str): Name of the span
  • artifact_type (str): Artifact type identifier (e.g. "final_decision", "customer_card")
  • role (str): "primary" (used for row preview) or "secondary". Defaults to "primary"
  • label (str, optional): Human-friendly label shown in the artifact switcher. Defaults to the span name
  • kind (str): Span kind. Defaults to "llm"
  • session (Union[str, dict], optional): Session information
  • attributes (dict, optional): Additional attributes merged with artifact metadata. Artifact keys take precedence
  • input_data (str, optional): Manual input data override
  • output_data (str, optional): Manual output data override
  • tags (dict, optional): Tags to attach to the span
Usage as Context Manager:
import zeroeval as ze

with ze.artifact_span(
    name="final-decision",
    artifact_type="final_decision",
    role="primary",
    label="Final Decision",
    tags={"judge_target": "support_ops_final_decision"},
) as s:
    s.set_io(input_data="ticket text", output_data=decision_json)
Usage as Decorator:
@ze.artifact_span(
    name="generate-card",
    artifact_type="customer_card",
    role="secondary",
    label="Customer Card",
)
def generate_card(ticket):
    return render_card(ticket)
ze.artifact_span is currently available in the Python SDK only.

@experiment

Decorator that attaches dataset and model information to a function.
@experiment(
    dataset: Optional[Dataset] = None,
    model: Optional[str] = None
)
Parameters:
  • dataset (Dataset, optional): Dataset to use for the experiment
  • model (str, optional): Model identifier
Example:
import zeroeval as ze

dataset = ze.Dataset.pull("my-dataset")

@ze.experiment(dataset=dataset, model="gpt-4")
def my_experiment():
    # Experiment logic
    pass

Classes

Dataset

A class to represent a named collection of dictionary records.

Constructor

Dataset(
    name: str,
    data: list[dict[str, Any]],
    description: Optional[str] = None
)
Parameters:
  • name (str): The name of the dataset
  • data (list[dict]): A list of dictionaries containing the data
  • description (str, optional): A description of the dataset
Example:
dataset = Dataset(
    name="Capitals",
    description="Country to capital mapping",
    data=[
        {"input": "France", "output": "Paris"},
        {"input": "Germany", "output": "Berlin"}
    ]
)

Methods

push()
Push the dataset to the backend, creating a new version if it already exists.
def push(self, create_new_version: bool = False) -> Dataset
Parameters:
  • self: The Dataset instance
  • create_new_version (bool, optional): Retained for backward compatibility; new versions are created automatically when the dataset name already exists. Defaults to False
Returns: self, for method chaining
pull()
Class method to pull a dataset from the backend.
@classmethod
def pull(
    cls,
    dataset_name: str,
    version_number: Optional[int] = None
) -> Dataset
Parameters:
  • cls: The Dataset class itself (automatically provided when using @classmethod)
  • dataset_name (str): The name of the dataset to pull from the backend
  • version_number (int, optional): Specific version number to pull. If not provided, pulls the latest version
Returns: A new Dataset instance populated with data from the backend
add_rows()
Add new rows to the dataset.
def add_rows(self, new_rows: list[dict[str, Any]]) -> None
Parameters:
  • self: The Dataset instance
  • new_rows (list[dict]): A list of dictionaries representing the rows to add
add_image()
Add an image to a specific row.
def add_image(
    self,
    row_index: int,
    column_name: str,
    image_path: str
) -> None
Parameters:
  • self: The Dataset instance
  • row_index (int): Index of the row to update (0-based)
  • column_name (str): Name of the column to add the image to
  • image_path (str): Path to the image file to add
add_audio()
Add audio to a specific row.
def add_audio(
    self,
    row_index: int,
    column_name: str,
    audio_path: str
) -> None
Parameters:
  • self: The Dataset instance
  • row_index (int): Index of the row to update (0-based)
  • column_name (str): Name of the column to add the audio to
  • audio_path (str): Path to the audio file to add
add_media_url()
Add a media URL to a specific row.
def add_media_url(
    self,
    row_index: int,
    column_name: str,
    media_url: str,
    media_type: str = "image"
) -> None
Parameters:
  • self: The Dataset instance
  • row_index (int): Index of the row to update (0-based)
  • column_name (str): Name of the column to add the media URL to
  • media_url (str): URL pointing to the media file
  • media_type (str, optional): Type of media ("image", "audio", or "video"). Defaults to "image"

Properties

  • name (str): The name of the dataset
  • description (str): The description of the dataset
  • columns (list[str]): List of all unique column names
  • data (list[dict]): List of the data portion for each row
  • backend_id (str): The ID in the backend (after pushing)
  • version_id (str): The version ID in the backend
  • version_number (int): The version number in the backend

Example

import zeroeval as ze

# Create a dataset
dataset = ze.Dataset(
    name="Capitals",
    description="Country to capital mapping",
    data=[
        {"input": "France", "output": "Paris"},
        {"input": "Germany", "output": "Berlin"}
    ]
)

# Push to backend
dataset.push()

# Pull from backend
dataset = ze.Dataset.pull("Capitals", version_number=1)

# Add rows
dataset.add_rows([{"input": "Italy", "output": "Rome"}])

# Add multimodal data
dataset.add_image(0, "flag", "flags/france.png")
dataset.add_audio(0, "anthem", "anthems/france.mp3")
dataset.add_media_url(0, "video_url", "https://example.com/video.mp4", "video")

Experiment

Represents an experiment that runs a task on a dataset with optional evaluators.

Constructor

Experiment(
    dataset: Dataset,
    task: Callable[[Any], Any],
    evaluators: Optional[list[Callable[[Any, Any], Any]]] = None,
    name: Optional[str] = None,
    description: Optional[str] = None
)
Parameters:
  • dataset (Dataset): The dataset to run the experiment on
  • task (Callable): Function that processes each row and returns output
  • evaluators (list[Callable], optional): List of evaluator functions that take (row, output) and return an evaluation result
  • name (str, optional): Name of the experiment. Defaults to the task function's name
  • description (str, optional): Description of the experiment. Defaults to the task function's docstring
Example:
import zeroeval as ze

ze.init()

# Pull dataset
dataset = ze.Dataset.pull("Capitals")

# Define task
def capitalize_task(row):
    return row["input"].upper()

# Define evaluator
def exact_match(row, output):
    return row["output"].upper() == output

# Create and run experiment
exp = ze.Experiment(
    dataset=dataset,
    task=capitalize_task,
    evaluators=[exact_match],
    name="Capital Uppercase Test"
)

results = exp.run()

# Or run task and evaluators separately
results = exp.run_task()
exp.run_evaluators([exact_match], results)

Methods

run()
Run the complete experiment (task + evaluators).
def run(
    self,
    subset: Optional[list[dict]] = None
) -> list[ExperimentResult]
Parameters:
  • self: The Experiment instance
  • subset (list[dict], optional): Subset of dataset rows to run the experiment on. If None, runs on entire dataset
Returns: List of experiment results for each row
run_task()
Run only the task without evaluators.
def run_task(
    self,
    subset: Optional[list[dict]] = None,
    raise_on_error: bool = False
) -> list[ExperimentResult]
Parameters:
  • self: The Experiment instance
  • subset (list[dict], optional): Subset of dataset rows to run the task on. If None, runs on entire dataset
  • raise_on_error (bool, optional): If True, raises exceptions encountered during task execution. If False, captures errors. Defaults to False
Returns: List of experiment results for each row
run_evaluators()
Run evaluators on existing results.
def run_evaluators(
    self,
    evaluators: Optional[list[Callable[[Any, Any], Any]]] = None,
    results: Optional[list[ExperimentResult]] = None
) -> list[ExperimentResult]
Parameters:
  • self: The Experiment instance
  • evaluators (list[Callable], optional): List of evaluator functions to run. If None, uses evaluators from the Experiment instance
  • results (list[ExperimentResult], optional): List of results to evaluate. If None, uses results from the Experiment instance
Returns: The evaluated results
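Evaluators are plain functions of (row, output) and may return booleans or numeric scores. A sketch of a float-returning evaluator (length_match is an illustrative name, not an SDK helper):

```python
def length_match(row: dict, output: str) -> float:
    # Evaluators receive the dataset row and the task's output;
    # returning a float records a numeric score instead of pass/fail
    return float(len(output) == len(row["output"]))
```

Pass it alongside others, e.g. exp.run_evaluators([exact_match, length_match], results).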

Span

Represents a span in the tracing system. Usually created via the @span decorator.

Methods

set_io()
Set input and output data for the span.
def set_io(
    self,
    input_data: Optional[str] = None,
    output_data: Optional[str] = None
) -> None
Parameters:
  • self: The Span instance
  • input_data (str, optional): Input data to attach to the span. Will be converted to string if not already
  • output_data (str, optional): Output data to attach to the span. Will be converted to string if not already
set_tags()
Set tags on the span.
def set_tags(self, tags: dict[str, str]) -> None
Parameters:
  • self: The Span instance
  • tags (dict[str, str]): Dictionary of tags to set on the span
set_attributes()
Set attributes on the span.
def set_attributes(self, attributes: dict[str, Any]) -> None
Parameters:
  • self: The Span instance
  • attributes (dict[str, Any]): Dictionary of attributes to set on the span
set_error()
Set error information for the span.
def set_error(
    self,
    code: str,
    message: str,
    stack: Optional[str] = None
) -> None
Parameters:
  • self: The Span instance
  • code (str): Error code or exception class name
  • message (str): Error message
  • stack (str, optional): Stack trace information
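A common pattern is to populate set_error from a caught exception. The helper below is a hypothetical convenience, not part of the SDK:

```python
import traceback

def error_payload(exc: Exception) -> dict:
    # Map a caught exception onto set_error's (code, message, stack) arguments
    return {
        "code": type(exc).__name__,
        "message": str(exc),
        "stack": traceback.format_exc(),
    }
```

Inside an except block, call span.set_error(**error_payload(exc)).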
add_screenshot()
Attach a screenshot to the span for visual evaluation by LLM judges. Screenshots are uploaded during ingestion and can be evaluated alongside text data.
def add_screenshot(
    self,
    base64_data: str,
    viewport: str = "desktop",
    width: Optional[int] = None,
    height: Optional[int] = None,
    label: Optional[str] = None
) -> None
Parameters:
  • self: The Span instance
  • base64_data (str): Base64 encoded image data. Accepts raw base64 or data URL format (data:image/png;base64,...)
  • viewport (str, optional): Viewport type - "desktop", "mobile", or "tablet". Defaults to "desktop"
  • width (int, optional): Image width in pixels
  • height (int, optional): Image height in pixels
  • label (str, optional): Human-readable description of the screenshot
Example:
import zeroeval as ze

with ze.span(name="browser_test", tags={"test": "visual"}) as span:
    # Capture and attach a desktop screenshot
    span.add_screenshot(
        base64_data=desktop_screenshot_base64,
        viewport="desktop",
        width=1920,
        height=1080,
        label="Homepage - Desktop"
    )

    # Also capture mobile view
    span.add_screenshot(
        base64_data=mobile_screenshot_base64,
        viewport="mobile",
        width=375,
        height=812,
        label="Homepage - iPhone"
    )

    span.set_io(
        input_data="Navigate to homepage",
        output_data="Captured viewport screenshots"
    )
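add_screenshot (and add_image below) expect base64-encoded image data, in either raw or data-URL form. A minimal encoding helper, assuming you already have the raw image bytes (the function name is illustrative):

```python
import base64

def to_base64(image_bytes: bytes, as_data_url: bool = False) -> str:
    # Both the raw base64 string and the data-URL form are accepted
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/png;base64,{b64}" if as_data_url else b64
```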
add_image()
Attach a generic image to the span for visual evaluation. Use this for non-screenshot images like charts, diagrams, or UI component states.
def add_image(
    self,
    base64_data: str,
    label: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None
) -> None
Parameters:
  • self: The Span instance
  • base64_data (str): Base64 encoded image data. Accepts raw base64 or data URL format
  • label (str, optional): Human-readable description of the image
  • metadata (dict, optional): Additional metadata to store with the image
Example:
import zeroeval as ze

with ze.span(name="chart_generation") as span:
    # Generate a chart and attach it
    chart_base64 = generate_chart(data)

    span.add_image(
        base64_data=chart_base64,
        label="Monthly Revenue Chart",
        metadata={"chart_type": "bar", "data_points": 12}
    )

    span.set_io(
        input_data="Generate revenue chart for Q4",
        output_data="Chart generated with 12 data points"
    )
Attaching images via URL (S3 presigned or CDN)
If your images are already hosted externally, you can pass an HTTPS URL instead of base64 data. ZeroEval will download, validate, and copy the image into its own storage during ingestion. Supported URL sources:
  • S3 presigned URLs (*.amazonaws.com with valid authentication parameters)
  • CDN URLs from trusted domains
Attach URLs directly via attributes.attachments using the url key:
import boto3
import zeroeval as ze

# Option A: Presigned S3 URL
s3 = boto3.client("s3")
presigned_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "images/chart.png"},
    ExpiresIn=300,
)

with ze.span(name="chart_generation") as span:
    span.attributes["attachments"] = [
        {
            "type": "image",
            "url": presigned_url,
            "label": "Monthly Revenue Chart",
        }
    ]

    span.set_io(
        input_data="Generate revenue chart for Q4",
        output_data="Chart generated"
    )
import zeroeval as ze

# Option B: CDN URL
cdn_url = "https://cdn.example.com/images/product-photo.png"

with ze.span(name="product_image_check") as span:
    span.attributes["attachments"] = [
        {
            "type": "image",
            "url": cdn_url,
            "label": "Product listing photo",
        }
    ]

    span.set_io(
        input_data="Check product image quality",
        output_data="Image attached for evaluation"
    )
Images attached to spans can be evaluated by LLM judges configured for multimodal evaluation. See the Multimodal Evaluation guide for setup instructions.

Context Functions

get_current_span()

Returns the currently active span, if any.
def get_current_span() -> Optional[Span]
Returns: The currently active Span instance, or None if no span is active

get_current_trace()

Returns the current trace ID.
def get_current_trace() -> Optional[str]
Returns: The current trace ID, or None if no trace is active

get_current_session()

Returns the current session ID.
def get_current_session() -> Optional[str]
Returns: The current session ID, or None if no session is active

set_tag()

Sets tags on a span, trace, or session.
def set_tag(
    target: Union[Span, str],
    tags: dict[str, str]
) -> None
Parameters:
  • target: The target to set tags on
    • Span: Sets tags on the specific span
    • str: Sets tags on the trace (if valid trace ID) or session (if valid session ID)
  • tags (dict[str, str]): Dictionary of tags to set
Example:
import zeroeval as ze

# Set tags on current span
current_span = ze.get_current_span()
if current_span:
    ze.set_tag(current_span, {"user_id": "12345", "environment": "production"})

# Set tags on trace
trace_id = ze.get_current_trace()
if trace_id:
    ze.set_tag(trace_id, {"version": "1.5"})

Judge Feedback APIs

send_feedback()

Programmatically submit user feedback for a completion or judge evaluation.
def send_feedback(
    *,
    prompt_slug: str,
    completion_id: str,
    thumbs_up: bool,
    reason: Optional[str] = None,
    expected_output: Optional[str] = None,
    metadata: Optional[dict] = None,
    judge_id: Optional[str] = None,
    expected_score: Optional[float] = None,
    score_direction: Optional[str] = None,
    criteria_feedback: Optional[dict] = None
) -> dict
Notes:
  • Existing usage without criteria_feedback is unchanged.
  • criteria_feedback is optional and supported for scored judges.
  • judge_id is required when sending expected_score, score_direction, or criteria_feedback.
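The judge_id requirement can be checked client-side before calling send_feedback. A hedged sketch mirroring the documented rule (not an SDK function):

```python
def validate_scored_feedback(judge_id=None, expected_score=None,
                             score_direction=None, criteria_feedback=None):
    # Mirrors the rule above: judge_id must accompany any scored-judge field
    scored = (expected_score, score_direction, criteria_feedback)
    if any(v is not None for v in scored) and judge_id is None:
        raise ValueError(
            "judge_id is required when sending expected_score, "
            "score_direction, or criteria_feedback"
        )
```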

get_judge_criteria()

Fetch normalized criteria metadata for a judge (useful before criterion-level feedback).
def get_judge_criteria(
    project_id: str,
    judge_id: str
) -> dict
Returns:
  • judge_id
  • evaluation_type
  • score_min, score_max, pass_threshold
  • criteria (list of {key, label, description})
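A sketch of consuming the returned dict; only the field names come from the list above, and the literal values are hypothetical:

```python
# Hypothetical response shaped like the fields documented above
criteria_meta = {
    "judge_id": "judge_abc",
    "evaluation_type": "scored",
    "score_min": 0.0,
    "score_max": 1.0,
    "pass_threshold": 0.7,
    "criteria": [
        {"key": "accuracy", "label": "Accuracy",
         "description": "Answer matches the source material"},
    ],
}

# Collect criterion keys before building criterion-level feedback
criterion_keys = [c["key"] for c in criteria_meta["criteria"]]
```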

CLI Commands

The ZeroEval SDK includes a CLI tool for running experiments and setup.

zeroeval run

Run a Python script containing ZeroEval experiments.
zeroeval run script.py

zeroeval setup

Interactive setup to configure API credentials.
zeroeval setup

Environment Variables

Set before importing ZeroEval to configure default behavior.
  • ZEROEVAL_API_KEY (string): API key for authentication. Defaults to ""
  • ZEROEVAL_API_URL (string): API endpoint URL. Defaults to "https://api.zeroeval.com"
  • ZEROEVAL_WORKSPACE_NAME (string): Workspace name. Defaults to "Personal Workspace"
  • ZEROEVAL_SESSION_ID (string): Session ID for grouping traces. Auto-generated by default
  • ZEROEVAL_SESSION_NAME (string): Human-readable session name. Defaults to ""
  • ZEROEVAL_SAMPLING_RATE (float): Sampling rate (0.0-1.0). Defaults to "1.0"
  • ZEROEVAL_DISABLED_INTEGRATIONS (string): Comma-separated list of integrations to disable. Defaults to ""
  • ZEROEVAL_DEBUG (boolean): Enable debug logging. Defaults to "false"
export ZEROEVAL_API_KEY="ze_1234567890abcdef"
export ZEROEVAL_SAMPLING_RATE="0.1"
export ZEROEVAL_DEBUG="true"

Runtime Configuration

Configure after initialization via ze.tracer.configure().
  • flush_interval (float): Flush frequency in seconds. Defaults to 1.0
  • max_spans (int): Buffer size before a forced flush. Defaults to 20
  • collect_code_details (bool): Capture code details in spans. Defaults to True
  • integrations (dict[str, bool]): Enable/disable specific integrations. Defaults to {}
  • sampling_rate (float, optional): Sampling rate (0.0-1.0)
ze.tracer.configure(
    flush_interval=0.5,
    max_spans=100,
    sampling_rate=0.05,
    integrations={"openai": True, "langchain": False}
)

Available Integrations

  • OpenAIIntegration ("openai"): OpenAI client calls
  • GeminiIntegration ("gemini"): Google Gemini calls
  • LangChainIntegration ("langchain"): LangChain components
  • LangGraphIntegration ("langgraph"): LangGraph workflows
  • HttpxIntegration ("httpx"): HTTPX requests
  • VocodeIntegration ("vocode"): Vocode voice SDK
Control integrations via:
  • Environment: ZEROEVAL_DISABLED_INTEGRATIONS="langchain,langgraph"
  • Init: disabled_integrations=["langchain"] or enabled_integrations=["openai"]
  • Runtime: ze.tracer.configure(integrations={"langchain": False})

Configuration Examples

Production

ze.init(
    api_key="your_key",
    sampling_rate=0.05,
    debug=False,
    disabled_integrations=["langchain"]
)

ze.tracer.configure(
    flush_interval=0.5,
    max_spans=100
)

Development

ze.init(
    api_key="your_key",
    debug=True,
    sampling_rate=1.0
)

Memory-Optimized

ze.tracer.configure(
    max_spans=5,
    collect_code_details=False,
    flush_interval=2.0
)