Coding Agents

Coding agent targets evaluate AI coding assistants and CLI-based agents. These targets require a `judge_target`, which names the LLM target used to run LLM-based evaluators.

Claude Code

targets:
  - name: claude_code
    provider: claude-code
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base

| Field | Required | Description |
| --- | --- | --- |
| `workspace_template` | No | Path to workspace template directory |
| `cwd` | No | Working directory (mutually exclusive with `workspace_template`) |
| `judge_target` | Yes | LLM target for evaluation |
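
The field rules above (a required `judge_target`, and `cwd` and `workspace_template` never set together) can be checked before a run. A minimal Python sketch, assuming a target entry has already been loaded into a plain dict; `validate_target` is a hypothetical helper for illustration, not part of the tool:

```python
def validate_target(target: dict) -> None:
    """Check one target entry against the field rules in the table above.

    Hypothetical helper for illustration; the tool's own validation may differ.
    """
    name = target.get("name")
    # judge_target is listed as required.
    if "judge_target" not in target:
        raise ValueError(f"target {name!r}: judge_target is required")
    # cwd and workspace_template are mutually exclusive.
    if "cwd" in target and "workspace_template" in target:
        raise ValueError(
            f"target {name!r}: set cwd or workspace_template, not both"
        )


claude_code = {
    "name": "claude_code",
    "provider": "claude-code",
    "workspace_template": "./workspace-templates/my-project",
    "judge_target": "azure_base",
}
validate_target(claude_code)  # passes: only workspace_template is set
```

Adding a `cwd` key to the same dict would make the check raise `ValueError`.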

Codex CLI

targets:
  - name: codex_target
    provider: codex
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base

| Field | Required | Description |
| --- | --- | --- |
| `workspace_template` | No | Path to workspace template directory |
| `cwd` | No | Working directory (mutually exclusive with `workspace_template`) |
| `judge_target` | Yes | LLM target for evaluation |

Pi Coding Agent

targets:
  - name: pi_target
    provider: pi-coding-agent
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base

| Field | Required | Description |
| --- | --- | --- |
| `workspace_template` | No | Path to workspace template directory |
| `cwd` | No | Working directory (mutually exclusive with `workspace_template`) |
| `judge_target` | Yes | LLM target for evaluation |

VS Code / Copilot

targets:
  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base

| Field | Required | Description |
| --- | --- | --- |
| `workspace_template` | Yes | Path to workspace template directory |
| `judge_target` | Yes | LLM target for evaluation |
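
The `${{ WORKSPACE_PATH }}` value in the example above is a placeholder rather than a literal path. A rough Python sketch of how such a placeholder could be expanded, assuming it names an environment variable; that assumption, the `expand_placeholders` helper, and the sample path are illustrative and not taken from the tool:

```python
import os
import re

# Matches ${{ NAME }} with optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def expand_placeholders(value: str, env=None) -> str:
    """Replace each ${{ NAME }} with env[NAME]; raise KeyError if NAME is unset."""
    env = os.environ if env is None else env

    def lookup(match):
        return env[match.group(1)]

    return _PLACEHOLDER.sub(lookup, value)


# Hypothetical value for WORKSPACE_PATH, used only for this example.
resolved = expand_placeholders(
    "${{ WORKSPACE_PATH }}", {"WORKSPACE_PATH": "/tmp/my-workspace"}
)
```

Raising on an unset variable, rather than substituting an empty string, surfaces a missing workspace path before the run starts.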

VS Code Insiders

targets:
  - name: vscode_insiders
    provider: vscode-insiders
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base

Takes the same fields as the VS Code / Copilot target.

Custom CLI Agent

Evaluate any command-line agent:

targets:
  - name: local_agent
    provider: cli
    command_template: 'python agent.py --prompt {PROMPT}'
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base

| Field | Required | Description |
| --- | --- | --- |
| `command_template` | Yes | Command to run. `{PROMPT}` is replaced with the input. |
| `workspace_template` | No | Path to workspace template directory |
| `cwd` | No | Working directory (mutually exclusive with `workspace_template`) |
| `judge_target` | Yes | LLM target for evaluation |
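
To make the `{PROMPT}` substitution concrete, here is a minimal Python sketch of filling a command template. `render_command` is a hypothetical helper, and the shell quoting it applies is an assumption: how the tool itself substitutes and escapes the input is not described here and may differ.

```python
import shlex


def render_command(command_template: str, prompt: str) -> str:
    """Fill {PROMPT} in a command template, shell-quoting the prompt.

    Hypothetical helper: the tool's actual substitution and quoting
    behavior may differ.
    """
    # shlex.quote keeps spaces and shell metacharacters in the prompt
    # from being split or interpreted by the shell.
    return command_template.replace("{PROMPT}", shlex.quote(prompt))


command = render_command(
    "python agent.py --prompt {PROMPT}", "Fix the failing test in utils.py"
)
# command == "python agent.py --prompt 'Fix the failing test in utils.py'"
```

With quoting applied, the whole prompt reaches `agent.py` as a single `--prompt` argument even when it contains spaces.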

Mock Provider

For testing the evaluation harness without calling real providers:

targets:
  - name: mock_target
    provider: mock