- Debug failed runs by inspecting the exact inputs, outputs, and errors at every step
- Evaluate output quality at scale with calibrated judges that score your traces automatically
- Optimize prompts and models by comparing versions against real production data
- Monitor cost, latency, and error rates across sessions, traces, and spans
How it works
Instrument your code
Add a few lines to your application. The SDK automatically captures LLM
calls, or you can create custom spans for any operation.
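To make the idea concrete, here is an illustrative sketch of how a decorator-based SDK can capture a function's inputs, outputs, errors, and timing as a span. The `span` decorator and the in-memory `SPANS` store are hypothetical stand-ins, not the actual ZeroEval API; see the SDK pages for the real calls.

```python
import functools
import time

SPANS = []  # hypothetical in-memory store; a real SDK would export to a backend


def span(name):
    """Record a function call's inputs, output, error, and duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {"name": name,
                      "inputs": {"args": args, "kwargs": kwargs},
                      "start": time.time()}
            try:
                record["output"] = fn(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                record["error"] = repr(exc)
                raise
            finally:
                record["duration_s"] = time.time() - record["start"]
                SPANS.append(record)
        return wrapper
    return decorator


@span("summarize")
def summarize(text):
    return text[:10]


summarize("hello world, this is a long document")
print(SPANS[0]["name"])
```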
Traces flow into ZeroEval
Every agent run becomes a trace — a tree of spans showing what happened, in
what order, with full inputs and outputs.
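Conceptually, a trace is a tree of spans walked in execution order. A minimal sketch of that structure (field names here are illustrative, not ZeroEval's actual schema):

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# One agent run becomes a root span with ordered child spans.
trace = Span("agent_run", children=[
    Span("retrieve_docs", inputs={"query": "refund policy"}),
    Span("llm_call", inputs={"model": "gpt-4o"}, outputs={"text": "..."}),
])


def walk(span, depth=0):
    """Yield (depth, name) pairs in execution order."""
    yield depth, span.name
    for child in span.children:
        yield from walk(child, depth + 1)


print(list(walk(trace)))
```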
Organize with sessions and tags
Group related traces into sessions and tag them with metadata for filtering.
Attach human feedback or let judges evaluate outputs automatically.
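To show what session- and tag-based filtering buys you, here is a hypothetical sketch of selecting traces by session and tag values; the record shape and `filter_traces` helper are illustrative, not the SDK's query API:

```python
traces = [
    {"id": "t1", "session": "s-checkout", "tags": {"env": "prod"}},
    {"id": "t2", "session": "s-checkout", "tags": {"env": "staging"}},
    {"id": "t3", "session": "s-search", "tags": {"env": "prod"}},
]


def filter_traces(traces, session=None, **tags):
    """Select traces matching a session and all given tag values."""
    return [t for t in traces
            if (session is None or t["session"] == session)
            and all(t["tags"].get(k) == v for k, v in tags.items())]


print([t["id"] for t in filter_traces(traces, session="s-checkout", env="prod")])
```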
Get started
Create an API key from Settings → API Keys, then pick your integration path:
Python SDK
Decorators and context managers for Python apps. Auto-instruments OpenAI,
LangChain, Gemini, and more.
TypeScript SDK
Wrapper functions for Node.js and Bun. Auto-instruments OpenAI and Vercel AI
SDK.
REST API
Send spans, traces, and sessions directly over HTTP from any language.
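As a sketch, a span payload could be assembled and POSTed like this. The endpoint URL, header, and field names below are assumptions for illustration, not the documented schema; consult the API reference for the real request format.

```python
import json
import uuid
from datetime import datetime, timezone


def build_span(name, trace_id, inputs, outputs):
    """Assemble a span record; field names are illustrative."""
    return {
        "id": str(uuid.uuid4()),
        "trace_id": trace_id,
        "name": name,
        "inputs": inputs,
        "outputs": outputs,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


span = build_span("llm_call", "trace-123", {"prompt": "hi"}, {"text": "hello"})
body = json.dumps(span)

# To send (endpoint and auth header are hypothetical placeholders):
# requests.post("https://api.zeroeval.com/spans", data=body,
#               headers={"Authorization": "Bearer <API_KEY>"})
print(span["name"])
```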
OpenTelemetry
Route OTLP traces from any OpenTelemetry-instrumented app to ZeroEval.
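Routing typically means pointing the standard OTLP exporter environment variables at ZeroEval. The endpoint URL and header value below are placeholders, not documented values; substitute the endpoint and API key from your ZeroEval settings.

```shell
# Standard OpenTelemetry exporter variables (endpoint and key are placeholders)
export OTEL_EXPORTER_OTLP_ENDPOINT="https://<your-zeroeval-otlp-endpoint>"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <API_KEY>"
export OTEL_SERVICE_NAME="my-agent"
```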