Why datasets?

Datasets are named, versioned collections of rows. Each row is just a Python dict. Use them to store test cases for your model and share them across experiments.

Quick Start

import zeroeval as ze

ze.init()  # picks up the API key saved by `zeroeval setup`

# Create from data
capitals = ze.Dataset(
    "Capitals",  # name as first argument
    data=[
        {"input": "Colombia", "output": "Bogotá"},
        {"input": "Peru", "output": "Lima"},
    ],
    description="Country → capital mapping"
)

capitals.push()          # 🚀 creates version 1 in your workspace
capitals = ze.Dataset.pull("Capitals")  # later, fetch it back

# Access rows with dot notation or dictionary syntax
print(capitals[0])         # DotDict: supports both access methods
print(capitals[0].input)   # "Colombia" (dot notation)
print(capitals[0]["input"]) # "Colombia" (dict syntax)

Creating Datasets

From Data

# Simple creation
dataset = ze.Dataset("my_dataset", data=[
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "What is 3+3?", "answer": "6"}
])

# With description
dataset = ze.Dataset(
    "math_questions",
    data=data_list,
    description="Basic arithmetic questions"
)

From CSV Files

Load datasets directly from CSV files:

# Load from CSV - name will be the filename
dataset = ze.Dataset("/path/to/my_data.csv")

# Load with custom description
dataset = ze.Dataset(
    "/path/to/survey_data.csv",
    description="Customer satisfaction survey results"
)

print(f"Loaded {len(dataset)} rows from CSV")
print(f"Columns: {dataset.columns}")

Row Access & Manipulation

Accessing Rows

# Single row access (returns DotDict)
first_row = dataset[0]
last_row = dataset[-1]

# Dot notation access
question = first_row.question
answer = first_row.answer

# Dictionary access
question = first_row["question"]
answer = first_row["answer"]

# Iteration
for row in dataset:
    print(f"Q: {row.question}, A: {row.answer}")

Slicing & Subsetting

len(dataset)           # number of rows
dataset.columns        # ['question', 'answer']

# Standard list slicing
first_5 = dataset[:5]        # New Dataset with first 5 rows
last_10 = dataset[-10:]      # New Dataset with last 10 rows
middle = dataset[10:20]      # Rows 10-19
every_other = dataset[::2]   # Every other row

# Sliced datasets preserve metadata
print(first_5.name)    # "math_questions_slice"
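
Because a slice is a regular Dataset, it works anywhere a full dataset does. A common pattern is smoke-testing a task on a small subset before a full run (my_task is a placeholder here; see Running Experiments below):

smoke = dataset[:10]      # small subset for a fast, cheap check
run = smoke.run(my_task)  # same API as the full dataset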

Adding & Modifying Rows

# Add single or multiple rows
dataset.add_rows([
    {"question": "What is 5+5?", "answer": "10"},
    {"question": "What is 7+3?", "answer": "10"}
])

# Update existing row
dataset.update_row(0, {"question": "What is 1+1?", "answer": "2"})

# Or use indexing
dataset[0] = {"question": "What is 1+1?", "answer": "2"}

# Delete rows
dataset.delete_row(2)  # Delete row at index 2
del dataset[1]         # Alternative syntax
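
These methods compose with iteration for simple bulk edits. A sketch, assuming update_row replaces the whole row (so pass every field you want to keep):

# Strip stray whitespace from every question
for i in range(len(dataset)):
    row = dataset[i]
    cleaned = row["question"].strip()
    if cleaned != row["question"]:
        dataset.update_row(i, {"question": cleaned, "answer": row["answer"]})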

Multimodal Data

Add images, audio, video, and URLs to any cell:

medical = ze.Dataset("medical_cases", [
    {"patient_id": "P001", "symptoms": "chest pain"}
])

# Add different media types
medical.add_image(0, "xray", "scans/patient001_chest.jpg")
medical.add_audio(0, "heartbeat", "audio/patient001_heart.wav")
medical.add_video(0, "exam_footage", "videos/patient001_exam.mp4")
medical.add_media_url(0, "external_report",
                      "https://example.com/report.pdf",
                      media_type="image")

medical.push()

# Access media in your tasks
@ze.task(outputs=["diagnosis"])
def diagnose(row):
    # row.xray will contain the base64-encoded image
    # row.heartbeat will contain the base64-encoded audio
    return {"diagnosis": analyze_media(row.xray, row.heartbeat)}

Supported formats:
  • Images: .jpg, .jpeg, .png, .gif, .webp
  • Audio: .mp3, .wav, .ogg, .m4a
  • Video: .mp4, .webm, .mov
  • URLs: Any external media link
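
The add_* helpers also combine naturally with a loop when you have many files. A sketch assuming a local scans/ directory of .jpg files; the directory and column names are illustrative:

from pathlib import Path

import zeroeval as ze

scans = sorted(Path("scans").glob("*.jpg"))  # hypothetical directory
medical = ze.Dataset("medical_cases", data=[{"patient_id": p.stem} for p in scans])

for i, path in enumerate(scans):
    medical.add_image(i, "xray", str(path))  # one image per row

medical.push()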

Dataset Properties

# Basic info
print(dataset.name)         # "medical_cases"
print(dataset.description)  # "Medical diagnostic cases"
print(len(dataset))         # 150

# Column information
print(dataset.columns)      # ['patient_id', 'symptoms', 'xray', ...]

# Version info (after pushing)
print(dataset.version_number) # 1

# String representations
print(dataset)              # Dataset('medical_cases', 150 records)

Versioning & Persistence

Push & Pull

# Push creates new versions automatically
dataset.push()                    # Version 1
dataset.add_rows([new_data])
dataset.push()                    # Version 2 (automatic)

# Pull latest version
latest = ze.Dataset.pull("medical_cases")

# Pull specific version
v1 = ze.Dataset.pull("medical_cases", version_number=1)
v2 = ze.Dataset.pull("medical_cases", version_number=2)

print(f"V1 has {len(v1)} rows")
print(f"V2 has {len(v2)} rows")

Version Properties

# After pulling a dataset
dataset = ze.Dataset.pull("my_dataset")

print(dataset.version_number)   # Version number (1, 2, 3, etc.)
print(dataset.name)             # Dataset name

Running Experiments

Datasets can run tasks directly (see Experiments for details):

@ze.task(outputs=["prediction"])
def classify(row):
    return {"prediction": model.predict(row.text)}

# Run task on dataset
run = dataset.run(classify)
run.eval([accuracy_evaluator])

# Multiple runs for stability testing
all_runs = run.repeat(5)

Method Chaining

# Many operations support chaining
result = (ze.Dataset("test", data=initial_data)
          .add_rows(more_data)
          .push()  # Returns self
          .run(my_task)
          .eval([my_evaluator]))

Tips

• Start small: Test with dataset[:10] before running on full datasets
• Use CSV loading: Fastest way to get started with existing data
• Dot notation: Makes row access more readable than row["field"]
• Version everything: Each push creates an immutable version for reproducibility
• Multimodal: Add media after creating the basic dataset structure
• Error handling: Wrap file operations and data validation in try/except blocks (see the sketch below)
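
For the error-handling tip, a minimal sketch wrapping CSV loading and push; the SDK's specific exception types aren't documented here, so this catches standard ones:

import zeroeval as ze

try:
    dataset = ze.Dataset("survey_data.csv")  # hypothetical path
    dataset.push()
except FileNotFoundError:
    print("CSV not found - check the path")
except Exception as exc:  # narrow this to the SDK's own errors where possible
    print(f"Failed to create or push dataset: {exc}")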