Why datasets?
Datasets are named, versioned collections of rows. Each row is just a Python dict. Use them to store test cases for your model and share them across experiments.
Quick Start
Creating Datasets
From Data
From CSV Files
Datasets can be loaded directly from CSV files.
Row Access & Manipulation
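Because each row is just a Python dict, rows read from a CSV file can be indexed, sliced, extended, and edited with ordinary Python. The sketch below illustrates that idea using only the standard library and a plain list of dicts. It does not use this library's own dataset class or loader, whose exact calls are not shown here; `load_rows` and the column names `question` and `expected` are hypothetical.

```python
import csv
import io

# Hypothetical CSV content; the column names are placeholders.
CSV_TEXT = """question,expected
What is 2 + 2?,4
Capital of France?,Paris
Largest planet?,Jupiter
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(CSV_TEXT)

# Accessing rows: index into the list, then read fields by key.
first = rows[0]

# Slicing and subsetting: a slice gives a smaller list of the same dicts.
subset = rows[:2]

# Adding and modifying rows: append a new dict, or assign to a key.
rows.append({"question": "Opposite of hot?", "expected": "cold"})
rows[1]["expected"] = "Paris, France"
```

Note that `csv.DictReader` returns every value as a string, so numeric columns need an explicit conversion if the task expects numbers.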
Accessing Rows
Slicing & Subsetting
Adding & Modifying Rows
Multimodal Data
Add images, audio, video, and URLs to any cell:
- Images: .jpg, .jpeg, .png, .gif, .webp
- Audio: .mp3, .wav, .ogg, .m4a
- Video: .mp4, .webm, .mov
- URLs: Any external media link
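The extension list above can be turned into a small lookup when you want to check a cell value before adding it. This is a minimal sketch using only the standard library. `media_kind` and `MEDIA_EXTENSIONS` are hypothetical names, and how the library itself detects media types (in particular, how it treats a URL that ends in a media extension) is an assumption here, not documented behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions copied from the list above, grouped by media kind.
MEDIA_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".ogg", ".m4a"},
    "video": {".mp4", ".webm", ".mov"},
}


def media_kind(value: str) -> str:
    """Guess the media kind of a file path or URL from its extension."""
    parsed = urlparse(value)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, look only at the path so query strings are ignored.
    path = parsed.path if is_url else value
    suffix = PurePosixPath(path).suffix.lower()
    for kind, extensions in MEDIA_EXTENSIONS.items():
        if suffix in extensions:
            return kind
    # Assumption: any other http(s) link counts as a generic URL.
    return "url" if is_url else "unknown"
```

For example, `media_kind("photo.JPG")` yields `"image"` because the comparison is case-insensitive, while a local file with an unlisted extension yields `"unknown"`.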
Dataset Properties
Versioning & Persistence
Push & Pull
Version Properties
Running Experiments
Datasets can run tasks directly (see Experiments for details).
Method Chaining
Tips
• Start small: Test with dataset[:10] before running on full datasets
• Use CSV loading: Fastest way to get started with existing data
• Dot notation: Makes row access more readable than row["field"]
• Version everything: Each push creates immutable versions for reproducibility
• Multimodal: Add media after creating the basic dataset structure
• Error handling: Wrap file operations and data validation in try/except blocks
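The "start small" and "error handling" tips can be combined in one loading helper. The sketch below uses only the standard library and plain dicts rather than this library's dataset class. `load_validated_rows`, `REQUIRED_FIELDS`, and the column names are hypothetical and should be adapted to your own data.

```python
import csv

# Hypothetical required columns; replace with the fields your task needs.
REQUIRED_FIELDS = ("question", "expected")


def load_validated_rows(path, limit=None):
    """Read a CSV into dicts and validate each row.

    Returns (valid_rows, errors). Pass limit=10 to check a small
    sample first, mirroring dataset[:10].
    """
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            rows = list(csv.DictReader(handle))
    except OSError as exc:
        # Covers missing files, permission problems, and similar I/O errors.
        return [], [f"could not read {path}: {exc}"]

    valid, errors = [], []
    for index, row in enumerate(rows[:limit]):
        # A field counts as missing if the column is absent or the cell is empty.
        missing = [name for name in REQUIRED_FIELDS if not row.get(name)]
        if missing:
            errors.append(f"row {index}: missing {', '.join(missing)}")
        else:
            valid.append(row)
    return valid, errors
```

Collecting errors in a list instead of raising on the first bad row makes it possible to report every problem in a file in a single pass before anything is pushed.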