agent-eval
affaan-m/everything-claude-code
Agent Eval Benchmarking orchestrates repeatable CLI comparisons of coding agents such as Claude Code, Aider, and Codex. It collects pass-rate, cost, time, and consistency metrics on your own repository's tasks so you can pick the right assistant with data.
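As a rough illustration of the aggregation such a harness performs (this is a sketch, not the tool's actual code; the record layout and agent names are hypothetical), each agent runs every task several times, and the per-run results are rolled up into the four headline metrics:

```python
# Illustrative sketch only: aggregate hypothetical per-run results into
# pass-rate, cost, time, and consistency summaries per agent.
from statistics import mean

# Hypothetical data: (task_id, passed, cost_usd, seconds) per run.
runs = {
    "claude-code": [
        ("fix-bug", True, 0.12, 41.0), ("fix-bug", True, 0.10, 38.0),
        ("add-test", True, 0.08, 30.0), ("add-test", False, 0.09, 55.0),
    ],
    "aider": [
        ("fix-bug", True, 0.05, 60.0), ("fix-bug", False, 0.06, 72.0),
        ("add-test", True, 0.04, 48.0), ("add-test", True, 0.04, 45.0),
    ],
}

def summarize(records):
    """Roll one agent's runs up into the four comparison metrics."""
    pass_rate = mean(1.0 if ok else 0.0 for _, ok, _, _ in records)
    avg_cost = mean(c for _, _, c, _ in records)
    avg_time = mean(t for _, _, _, t in records)
    # Consistency: fraction of tasks whose repeated runs all agree.
    by_task = {}
    for task, ok, _, _ in records:
        by_task.setdefault(task, []).append(ok)
    consistency = mean(
        1.0 if len(set(outcomes)) == 1 else 0.0
        for outcomes in by_task.values()
    )
    return {"pass_rate": pass_rate, "cost": avg_cost,
            "time": avg_time, "consistency": consistency}

for agent, recs in runs.items():
    print(agent, summarize(recs))
```

A low consistency score flags an agent that passes a task only sometimes, which matters as much as the raw pass rate when choosing an assistant.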