Fix argument interpolation and add behavioral evals #4

HartBrook · 2026-01-29T02:24:02Z

Summary

Fix: Command prompts referred to arguments in vague prose ("if auto-fix is enabled") instead of using $ARGUMENTS.<name> at decision points, so the executing agent could miss the interpolated values and fall back to safe defaults (e.g., auto-fix=false). All prose references are now replaced with explicit $ARGUMENTS.* tokens in the instruction body.
Evals: Added promptfoo behavioral evals (make eval) that send interpolated prompts to Claude and assert that the model correctly interprets argument values (auto-fix true/false, passes count, model resolution, scope resolution, tidy flags).
Static test: New test_argument_interpolation in test.sh enforces that every frontmatter argument appears as $ARGUMENTS.<name> in the body, which prevents future regressions.
Docs: Updated CONTRIBUTING.md with test instructions, API key setup, and prompt authoring guidelines. Added Development section to README.
Version: Bumped to 0.2.1.
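
For reviewers who want a feel for what the static test checks, here is a minimal Python sketch of the same idea. It is not the code in test.sh. The frontmatter layout (a `---` delimited block with `- name: <arg>` entries), the helper names, and the sample command texts are all assumptions made for illustration.

```python
import re

def split_command(text):
    """Split a command file into (frontmatter, body).

    Assumes the file starts with a '---' delimited frontmatter block;
    this layout is an assumption, not taken from the repository.
    """
    parts = text.split("---", 2)
    if len(parts) < 3:
        return "", text
    return parts[1], parts[2]

def declared_arguments(frontmatter):
    """Collect argument names from hypothetical '- name: <arg>' entries."""
    return re.findall(r"^\s*-\s*name:\s*([\w-]+)\s*$", frontmatter, flags=re.MULTILINE)

def missing_references(text):
    """Return declared arguments that never appear as $ARGUMENTS.<name> in the body."""
    frontmatter, body = split_command(text)
    missing = []
    for name in declared_arguments(frontmatter):
        # The lookahead stops "$ARGUMENTS.passes" from counting as a use of "pass".
        token = re.escape("$ARGUMENTS." + name) + r"(?![\w-])"
        if not re.search(token, body):
            missing.append(name)
    return missing

# Hypothetical command files: one with explicit tokens, one with a prose reference.
GOOD = """---
arguments:
  - name: auto-fix
  - name: passes
---
Auto-fix is set to $ARGUMENTS.auto-fix. Run $ARGUMENTS.passes review passes.
"""

BAD = """---
arguments:
  - name: auto-fix
  - name: passes
---
If auto-fix is enabled, apply fixes. Run $ARGUMENTS.passes review passes.
"""

print(missing_references(GOOD))
print(missing_references(BAD))
```

A check like this would flag the prose-only reference to auto-fix in the second sample, which is the regression the static test is meant to catch.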

Test plan

make test — 51 structural tests pass (including new interpolation check)
make eval — 8 behavioral evals pass with ANTHROPIC_API_KEY set
Regression: removing a $ARGUMENTS.* reference from a command body causes make test to fail

Command prompts referenced arguments as vague prose ("if auto-fix is enabled") instead of using $ARGUMENTS.<name> at decision points. The executing agent could miss interpolated values and fall back to safe defaults like auto-fix=false. Replace all prose references with explicit $ARGUMENTS.* tokens in the instruction body. Add a static test enforcing that every frontmatter argument appears as $ARGUMENTS.<name> in the body, and add promptfoo behavioral evals that verify models correctly interpret argument values.
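
To illustrate why the explicit tokens matter, the sketch below models interpolation as plain token substitution. The `interpolate` function, the token pattern, and the two sample prompts are hypothetical and only approximate whatever the command runner actually does.

```python
import re

# Matches $ARGUMENTS.<name>, where <name> may contain hyphen-separated segments
# (e.g. "auto-fix"). A trailing sentence period is not captured.
TOKEN = re.compile(r"\$ARGUMENTS\.([A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)*)")

def interpolate(template, args):
    """Replace each $ARGUMENTS.<name> token with its value; leave unknown tokens as-is."""
    def substitute(match):
        name = match.group(1)
        return str(args[name]) if name in args else match.group(0)
    return TOKEN.sub(substitute, template)

args = {"auto-fix": "true"}

# Prose reference: nothing to substitute, so the value never reaches the prompt.
prose = "If auto-fix is enabled, apply the fixes."

# Explicit token: the value lands at the decision point.
explicit = "Auto-fix is set to $ARGUMENTS.auto-fix. Apply the fixes only if that value is true."

print(interpolate(prose, args))
print(interpolate(explicit, args))
```

In this model the prose version comes out unchanged, so the agent has to infer whether auto-fix was enabled, while the explicit version states the value directly.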

HartBrook merged commit cf0a314 into main Jan 29, 2026
1 check passed
