How optimization works
Every optimization follows the same lifecycle:

1. **Collect feedback.** Attach thumbs-up/down ratings, reasons, and expected outputs to real completions. This is the raw signal optimization learns from.
2. **Start an optimization run.** Trigger an optimization from the dashboard. ZeroEval selects a strategy based on speed and depth, then generates a candidate prompt from your feedback.
3. **Compare against the baseline.** The candidate is scored against your current prompt using the same feedback signal, so you can see whether it actually improves behavior.
4. **Validate with simulations.** Run the candidate against test cases and multiple models to confirm improvements generalize beyond the examples used during optimization.
Before you optimize
Optimization quality depends directly on the quality and quantity of feedback attached to your completions. Before starting a run, make sure:

- Your prompt is tracked with `ze.prompt()` so completions are linked to specific prompt versions.
- Completions are flowing through ZeroEval with enough volume to represent real usage patterns.
- Feedback is attached to those completions — both positive and negative examples help.
For details on how to submit feedback through the dashboard, SDK, or API, see Human Feedback.
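Conceptually, each piece of feedback is a small record tied to a completion. The sketch below illustrates the shape of that signal; the field names are illustrative, not the actual SDK or API schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feedback:
    """Illustrative feedback record; not the real ZeroEval schema."""
    completion_id: str                     # the completion being rated
    rating: int                            # +1 thumbs-up, -1 thumbs-down
    reason: Optional[str] = None           # why the rating was given
    expected_output: Optional[str] = None  # correction: the desired answer

# A thumbs-down with a correction gives the optimizer the most to work with.
fb = Feedback(
    completion_id="cmp_123",
    rating=-1,
    reason="Answer was too verbose",
    expected_output="A two-sentence summary of the refund policy.",
)
```

Records with a `reason` and `expected_output` carry far more optimization signal than a bare rating, which is why corrections are emphasized in the best practices below.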
Start an optimization run
Navigate to your prompt’s Suggestions tab in the ZeroEval dashboard and click Optimize Prompt. ZeroEval will:

- Gather feedback examples linked to your prompt.
- Select an optimization strategy based on the complexity of your prompt and the available signal.
- Generate one or more candidate prompts.
Review the candidate prompt
Optimization produces a candidate — a proposed new version of your prompt. It does not overwrite your current prompt automatically. You can review the candidate side-by-side with your baseline (the current active version) to understand exactly what changed and why. The candidate is derived from patterns in your feedback: corrections steer the wording, positive examples reinforce what already works.

Compare against your baseline
ZeroEval measures whether the candidate actually outperforms the baseline using the feedback-derived signal. The comparison shows:

- How the candidate scores relative to the current prompt on the same set of examples.
- Whether improvements on some examples come at the cost of regressions on others.
- An overall recommendation based on the comparison results.
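One way to picture this comparison (an illustrative sketch, not ZeroEval's internal scoring): score both prompts on the same examples, then count wins, losses, and ties so that gains on some examples cannot hide regressions on others.

```python
def compare(baseline_scores, candidate_scores):
    """Per-example comparison of two prompts on the same feedback set."""
    wins = sum(c > b for b, c in zip(baseline_scores, candidate_scores))
    losses = sum(c < b for b, c in zip(baseline_scores, candidate_scores))
    ties = len(baseline_scores) - wins - losses
    delta = sum(candidate_scores) - sum(baseline_scores)
    # Recommend only if the candidate improves overall AND does not
    # regress on too many individual examples.
    recommend = delta > 0 and losses <= wins // 2
    return {"wins": wins, "losses": losses, "ties": ties,
            "delta": delta, "recommend": recommend}

result = compare([0.6, 0.4, 0.8, 0.5], [0.9, 0.7, 0.8, 0.6])
```

Here the candidate wins on three examples, ties on one, and regresses on none, so it would be recommended.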
Validate with simulations
After optimization, you can run the candidate against test cases using multiple models to confirm the improvement holds up beyond the training examples. Simulations help answer:

- Does the candidate work well across different models, not just the one it was optimized for?
- Does it handle edge cases that were not part of the original feedback set?
- Are there any regressions on specific scenarios?
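The validation step amounts to a model-by-test-case matrix. A minimal sketch, assuming a scoring function you supply (here a stub judge stands in for calling each model and evaluating its output):

```python
def run_simulations(prompt, models, test_cases, judge):
    """Score a candidate prompt across every (model, test case) pair."""
    results = {}
    for model in models:
        scores = [judge(prompt, model, case) for case in test_cases]
        results[model] = sum(scores) / len(scores)  # mean score per model
    return results

def stub_judge(prompt, model, case):
    # Stand-in for a real evaluation; always checks a trivial property.
    return 1.0 if "concise" in prompt else 0.5

scores = run_simulations(
    "Be a concise support agent.",
    models=["model-a", "model-b"],
    test_cases=["refund request", "angry customer"],
    judge=stub_judge,
)
```

A per-model breakdown like this is what surfaces candidates that only work on the model they were optimized for.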
Optimization strategies
ZeroEval offers three optimization strategies, each suited to a different speed and depth tradeoff:

| Strategy | Speed | Best for |
|---|---|---|
| Quick Refine | Seconds | Fast iteration on straightforward prompt fixes. Rewrites the prompt in a single pass using your feedback examples. |
| Bootstrap | Minutes | Prompts where good examples clearly demonstrate the desired behavior. Selects high-quality demonstrations and chains them with the prompt. |
| GEPA | 10-60 minutes | Complex prompts or multi-intent optimization. Runs a deeper search that evolves candidates across multiple generations, guided by reflection on performance. |
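The selection can be imagined roughly as follows. This is purely illustrative of the speed/depth tradeoff in the table; the thresholds are invented and this is not ZeroEval's actual selection logic:

```python
def pick_strategy(num_intents, num_feedback_examples):
    """Illustrative strategy chooser based on complexity and available signal."""
    if num_intents > 1 or num_feedback_examples > 200:
        return "GEPA"          # deep evolutionary search for complex prompts
    if num_feedback_examples >= 20:
        return "Bootstrap"     # enough good examples to use as demonstrations
    return "Quick Refine"      # fast single-pass rewrite

strategy = pick_strategy(num_intents=1, num_feedback_examples=5)
```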
Multi-intent optimization and guardrails
Some prompts serve multiple goals. For example, a customer support prompt might need to be both accurate and concise — improving one should not come at the expense of the other.

Intent weighting
When your prompt has multiple linked judges (intents), you can assign weights to each intent to tell the optimizer which goals matter most. Intents with higher weight have more influence on which candidate is selected.

Guardrails
Guardrails are quality floors that prevent adopting a candidate that regresses on important intents. You can set minimum thresholds per intent, and any candidate that falls below those thresholds is automatically rejected — even if it improves overall performance. This ensures that optimization never ships a prompt that fixes one problem by creating another.

Use optimized prompts in production
When you use `ze.prompt()` with a content fallback, ZeroEval automatically resolves to the latest published version from your dashboard. Once you publish an optimized candidate, your app starts using it immediately with no code changes:
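A minimal sketch of that resolution behavior. The `ze.prompt()` call shape in the comment is assumed from the description here (a `content` keyword fallback); the `resolve_prompt` helper is a hypothetical stand-in for what the SDK does, not the real implementation:

```python
# In application code (call shape assumed):
#   import zeroeval as ze
#   prompt = ze.prompt("support-agent", content="You are a helpful support agent.")

def resolve_prompt(name, content, published):
    """Hypothetical model of ze.prompt() resolution: prefer the latest
    published version from the dashboard, fall back to hardcoded content."""
    return published.get(name, content)

# Before any optimized version is published, the fallback is used.
fallback = resolve_prompt("support-agent", "You are a helpful support agent.", {})

# After publishing an optimized candidate, it takes effect with no code change.
live = resolve_prompt(
    "support-agent",
    "You are a helpful support agent.",
    {"support-agent": "You are a concise, accurate support agent."},
)
```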
The `content` string serves as the fallback for initial setup or if the ZeroEval service is unreachable. Once an optimized version is published, `ze.prompt()` returns that version instead.
To bypass optimization and force the hardcoded content (useful for debugging or A/B testing), use explicit mode:
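These docs do not show the exact flag for explicit mode, so the sketch below assumes a hypothetical `explicit` switch; check the SDK reference for the real parameter name:

```python
def resolve_prompt(name, content, published, explicit=False):
    """Hypothetical model of explicit mode: when enabled, always use the
    hardcoded content and ignore any published optimized version."""
    if explicit:
        return content
    return published.get(name, content)

published = {"support-agent": "You are a concise, accurate support agent."}
normal = resolve_prompt("support-agent", "You are helpful.", published)
forced = resolve_prompt("support-agent", "You are helpful.", published, explicit=True)
```

`forced` ignores the published optimized version, which is what makes explicit mode useful for debugging and A/B tests against the original prompt.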
Best practices
- Wait for enough signal. Optimizing with only a handful of feedback examples produces unreliable results. Aim for a representative sample of both positive and negative completions before running.
- Include corrections, not just thumbs. Reasons and expected outputs give the optimizer concrete material to work with. A thumbs-down alone tells the system something is wrong but not what the right answer looks like.
- Validate before rollout. Use simulations to confirm the candidate works across models and edge cases before publishing it to production.
- Iterate. Optimization is not a one-time step. As your product evolves and usage patterns shift, new feedback will surface new improvement opportunities. Run optimization periodically as fresh signal accumulates.
- Use guardrails for multi-intent prompts. If your prompt serves multiple goals, set guardrails to prevent regressions on critical intents.
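The intent-weighting and guardrail behavior described above can be sketched as follows (illustrative logic only, not ZeroEval's implementation): a candidate's overall score is a weighted average of its per-intent scores, but any intent falling below its guardrail threshold rejects the candidate outright.

```python
def evaluate_candidate(scores, weights, guardrails):
    """scores, weights, and guardrails are dicts keyed by intent name."""
    # Guardrails are hard floors: any score below its threshold rejects
    # the candidate, regardless of overall performance.
    for intent, floor in guardrails.items():
        if scores[intent] < floor:
            return {"accepted": False, "failed_intent": intent}
    total_weight = sum(weights.values())
    overall = sum(scores[i] * w for i, w in weights.items()) / total_weight
    return {"accepted": True, "overall": overall}

# This candidate improves accuracy but regresses conciseness below its floor,
# so it is rejected even though its weighted score would be high.
verdict = evaluate_candidate(
    scores={"accuracy": 0.95, "conciseness": 0.40},
    weights={"accuracy": 2.0, "conciseness": 1.0},
    guardrails={"conciseness": 0.60},
)
```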