Your agent makes dozens of decisions every run — retrieving context, calling models, executing tools, generating responses. Without observability, failures are invisible, regressions go unnoticed, and optimization is guesswork. ZeroEval Tracing captures the full execution graph of your AI system so you can:
- Debug failed runs by inspecting the exact inputs, outputs, and errors at every step
- Evaluate output quality at scale with calibrated judges that score your traces automatically
- Optimize prompts and models by comparing prompt versions against real production data using prompt optimization
- Monitor cost, latency, and error rates across sessions, traces, and spans
How it works
Instrument your code
Add a few lines to your application. The SDK automatically captures LLM
calls, or you can create custom spans for any operation.
Traces flow into ZeroEval
Every agent run becomes a trace — a tree of spans showing what happened, in
what order, with full inputs and outputs.
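The "tree of spans" shape described above can be sketched as a simple data structure. Field names here are illustrative, not ZeroEval's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One step in a trace: a named operation with inputs, outputs, children."""
    name: str
    input_data: dict
    output_data: dict
    children: list = field(default_factory=list)

# A trace is just the root span of the tree for one agent run.
root = Span("agent_run", {"question": "What is our SLA?"},
            {"answer": "99.9% uptime"})
root.children.append(Span("retrieval", {"query": "SLA"}, {"docs": ["sla.md"]}))
root.children.append(Span("llm_call", {"prompt": "..."}, {"text": "99.9% uptime"}))

def depth(span):
    """Depth of the span tree (1 for a leaf)."""
    return 1 + max((depth(c) for c in span.children), default=0)
```

Nested tool calls and sub-agents simply become deeper branches in the same tree.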
Organize with sessions and tags
Group related traces into sessions and tag them with metadata for filtering.
Attach human feedback or let judges evaluate outputs automatically.
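Conceptually, sessions and tags are just grouping and filtering keys on traces. The session IDs and tag names below are examples, not ZeroEval's API.

```python
# Each trace carries a session_id (grouping) and tags (filterable metadata).
traces = [
    {"trace_id": "t1", "session_id": "user-42", "tags": {"env": "prod"}},
    {"trace_id": "t2", "session_id": "user-42", "tags": {"env": "staging"}},
    {"trace_id": "t3", "session_id": "user-7",  "tags": {"env": "prod"}},
]

def by_session(traces, session_id):
    """All traces from one session, e.g. one user's conversation."""
    return [t for t in traces if t["session_id"] == session_id]

def with_tag(traces, key, value):
    """All traces matching a tag, e.g. production-only filtering."""
    return [t for t in traces if t["tags"].get(key) == value]
```

In the dashboard these same keys drive filtering, so tagging consistently at instrumentation time pays off later.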
Get started
Create an API key from Settings → API Keys, then pick your integration path:
Python SDK
Decorators and context managers for Python apps. Auto-instruments OpenAI,
LangChain, Claude Agent SDK, Gemini, and more.
TypeScript SDK
Wrapper functions for Node.js and Bun. Auto-instruments OpenAI, Vercel AI
SDK, and Claude Agent SDK.
REST API
Send spans, traces, and sessions directly over HTTP from any language.
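As a rough sketch of HTTP ingestion, the snippet below builds a span payload and shows where the POST would go. The endpoint URL and field names are assumptions for illustration, not the documented schema; check the REST API reference for the real contract.

```python
import json
import time
import uuid

# Hypothetical endpoint — substitute the real one from the REST API docs.
ZEROEVAL_ENDPOINT = "https://api.zeroeval.com/spans"

span = {
    "span_id": str(uuid.uuid4()),
    "trace_id": str(uuid.uuid4()),
    "name": "llm.chat_completion",
    "started_at": time.time(),
    "input_data": {"prompt": "Summarize this ticket"},
    "output_data": {"text": "Customer reports login failure..."},
}

body = json.dumps(span)
# To actually send it (requires an API key):
# import requests
# requests.post(ZEROEVAL_ENDPOINT, data=body,
#               headers={"Authorization": "Bearer <API_KEY>",
#                        "Content-Type": "application/json"})
```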
OpenTelemetry
Route OTLP traces from any OpenTelemetry-instrumented app to ZeroEval.
Need source-side PII redaction for ingested span payloads? ZeroEval supports opt-in redaction of input_data and output_data in both SDKs. See PII redaction.
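The idea behind source-side redaction is that sensitive fields are scrubbed before a span ever leaves your process. A minimal sketch, with example key names (the SDK's actual configuration may differ):

```python
import copy

# Example sensitive keys — configure to match your own payloads.
SENSITIVE_KEYS = {"email", "ssn", "phone"}

def redact(payload):
    """Return a copy of a span payload with sensitive keys masked
    in input_data and output_data; the original is left untouched."""
    redacted = copy.deepcopy(payload)
    for section in ("input_data", "output_data"):
        data = redacted.get(section, {})
        for key in list(data):
            if key in SENSITIVE_KEYS:
                data[key] = "[REDACTED]"
    return redacted

span = {"input_data": {"email": "a@b.com", "question": "reset password"},
        "output_data": {"text": "Done"}}
```

Because redaction happens client-side, the raw values never reach ZeroEval's servers.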