# Running Evaluations
## Run an Evaluation
```sh
agentv eval evals/my-eval.yaml
```

Results are written to `.agentv/results/eval_<timestamp>.jsonl`.
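To sanity-check a finished run, you can count the records in the newest results file. A minimal sketch that mocks the results directory; the file contents below are placeholders, not the real record schema:

```sh
# Mock the results directory layout described above (contents are fake).
mkdir -p .agentv/results
printf '{"id":"case-1"}\n{"id":"case-2"}\n' > .agentv/results/eval_20240101.jsonl

# Pick the most recently written results file and count its records.
latest=$(ls -t .agentv/results/eval_*.jsonl | head -n 1)
wc -l < "$latest"   # prints the number of records
```

Each line in the JSONL file is one record, so `wc -l` gives a quick record count without parsing JSON.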
## Common Options
### Override Target
Run against a different target than the one specified in the eval file:
```sh
agentv eval --target azure_base evals/**/*.yaml
```

### Run a Specific Eval Case
Run a single eval case by ID:
```sh
agentv eval --eval-id case-123 evals/my-eval.yaml
```

### Dry Run
Test the harness flow with mock responses (does not call real providers):
```sh
agentv eval --dry-run evals/my-eval.yaml
```

### Output to a Specific File
```sh
agentv eval evals/my-eval.yaml --out results/baseline.jsonl
```

### Workspace Cleanup
When using `workspace_template`, a temporary workspace is created for each eval case. By default, workspaces are cleaned up on success and preserved on failure for debugging.
```sh
# Always keep workspaces (for debugging)
agentv eval evals/my-eval.yaml --keep-workspaces

# Always clean up workspaces (even on failure)
agentv eval evals/my-eval.yaml --cleanup-workspaces
```

Workspaces are stored at `~/.agentv/workspaces/<eval-run-id>/<case-id>/`.
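Preserved workspaces can be inspected directly with ordinary shell tools. A minimal sketch, substituting a temporary directory for the real `~/.agentv/workspaces/` root and using made-up run and case IDs:

```sh
# Stand-in for ~/.agentv/workspaces/<eval-run-id>/<case-id>/ (IDs are mocked).
root=$(mktemp -d)
mkdir -p "$root/run-1/case-a" "$root/run-1/case-b"

# List the per-case workspaces left behind for one run.
ls "$root/run-1"
```

On a real machine you would point `ls` at `~/.agentv/workspaces/` instead; each subdirectory corresponds to one eval case's preserved workspace.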
## Validate Before Running
Check eval files for schema errors without executing them:
```sh
agentv validate evals/my-eval.yaml
```

## All Options
Run `agentv eval --help` for the full list of options, including workers, timeouts, output formats, and trace dumping.
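Since results are plain JSONL, standard text tools can summarize a run once you know the record schema. In this sketch the `status` field and its `pass`/`fail` values are hypothetical placeholders, not the documented schema, and the file is mocked:

```sh
# Mock a results file (field names here are assumptions, not the real schema).
out=$(mktemp)
printf '{"id":"a","status":"pass"}\n{"id":"b","status":"fail"}\n' > "$out"

# Tally records by the (hypothetical) status field.
echo "passed: $(grep -c '"status":"pass"' "$out")"
echo "failed: $(grep -c '"status":"fail"' "$out")"
```

For anything beyond a rough tally, a JSON-aware tool is safer than `grep`, since it won't be fooled by matching text inside other fields.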