# Running Evaluations
## Run an Evaluation
```sh
agentv eval evals/my-eval.yaml
```

Results are written to `.agentv/results/eval_<timestamp>.jsonl`.
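To sanity-check a finished run, you can count the records in the newest results file. A minimal sketch that mocks the results directory; the file contents below are placeholders, not the real record schema:

```sh
# Mock the results directory layout described above (contents are fake).
mkdir -p .agentv/results
printf '{"id":"case-1"}\n{"id":"case-2"}\n' > .agentv/results/eval_20240101.jsonl

# Pick the most recently written results file and count its records.
latest=$(ls -t .agentv/results/eval_*.jsonl | head -n 1)
wc -l < "$latest"   # prints the number of records
```

Each line in the JSONL file is one record, so `wc -l` gives a quick record count without parsing JSON.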
## Common Options
### Override Target
Run against a different target than the one specified in the eval file:
```sh
agentv eval --target azure_base evals/**/*.yaml
```

### Run a Specific Eval Case
Run a single eval case by ID:
```sh
agentv eval --eval-id case-123 evals/my-eval.yaml
```

### Dry Run
Test the harness flow with mock responses (does not call real providers):
```sh
agentv eval --dry-run evals/my-eval.yaml
```

### Output to a Specific File
```sh
agentv eval evals/my-eval.yaml --out results/baseline.jsonl
```

### Workspace Cleanup
When using `workspace_template`, a temporary workspace is created for each eval case. By default, workspaces are cleaned up on success and preserved on failure for debugging.
```sh
# Always keep workspaces (for debugging)
agentv eval evals/my-eval.yaml --keep-workspaces

# Always clean up workspaces (even on failure)
agentv eval evals/my-eval.yaml --cleanup-workspaces
```

Workspaces are stored at `~/.agentv/workspaces/<eval-run-id>/<case-id>/`.
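Preserved workspaces can be inspected directly with ordinary shell tools. A minimal sketch, substituting a temporary directory for the real `~/.agentv/workspaces/` root and using made-up run and case IDs:

```sh
# Stand-in for ~/.agentv/workspaces/<eval-run-id>/<case-id>/ (IDs are mocked).
root=$(mktemp -d)
mkdir -p "$root/run-1/case-a" "$root/run-1/case-b"

# List the per-case workspaces left behind for one run.
ls "$root/run-1"
```

On a real machine you would point `ls` at `~/.agentv/workspaces/` instead; each subdirectory corresponds to one eval case's preserved workspace.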
## Validate Before Running
Check eval files for schema errors without executing them:
```sh
agentv validate evals/my-eval.yaml
```

## All Options
Run `agentv eval --help` for the full list of options, including workers, timeouts, output formats, and trace dumping.
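Since results are plain JSONL, standard text tools can summarize a run once you know the record schema. In this sketch the `status` field and its `pass`/`fail` values are hypothetical placeholders, not the documented schema, and the file is mocked:

```sh
# Mock a results file (field names here are assumptions, not the real schema).
out=$(mktemp)
printf '{"id":"a","status":"pass"}\n{"id":"b","status":"fail"}\n' > "$out"

# Tally records by the (hypothetical) status field.
echo "passed: $(grep -c '"status":"pass"' "$out")"
echo "failed: $(grep -c '"status":"fail"' "$out")"
```

For anything beyond a rough tally, a JSON-aware tool is safer than `grep`, since it won't be fooled by matching text inside other fields.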