ZeroEval lets you evaluate real production traces of specific agent tasks across different models, then rank those models over time. This helps you pick the best model for each part of your agent.