Set up a new autoresearch experiment with all required configuration.
/ar:setup # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list # Show existing experiments
/ar:setup --list-evaluators # Show available evaluators
Pass the user-supplied parameters directly to the setup script:
python {skill_path}/scripts/setup_experiment.py \
--domain {domain} --name {name} \
--target {target} --eval "{eval_cmd}" \
--metric {metric} --direction {direction} \
[--evaluator {evaluator}] [--scope {scope}]
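For example, the engineering invocation shown above expands to the following direct call, assuming the positional arguments map onto the script flags in the order listed (a sketch; {skill_path} is the same placeholder used throughout and stands for the installed skill directory):

```bash
# Direct setup for the engineering api-speed example above.
# {skill_path} is the skill's install-directory placeholder.
python {skill_path}/scripts/setup_experiment.py \
  --domain engineering --name api-speed \
  --target src/api.py --eval "pytest bench.py" \
  --metric p50_ms --direction lower
```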
Collect each parameter one at a time: domain, name, target, eval command, metric, and direction (plus evaluator and scope if needed). Then run setup_experiment.py with the collected values.
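As a rough illustration of what the assembled call might look like after an interactive session, here is a hypothetical test-reliability experiment; the name, target, and eval command are made-up placeholders, while pass_rate and higher match the test_pass_rate evaluator listed in the table below:

```bash
# Hypothetical command assembled from interactively collected answers.
# src/parser.py and "pytest tests/test_parser.py" are illustrative placeholders.
python {skill_path}/scripts/setup_experiment.py \
  --domain engineering --name test-reliability \
  --target src/parser.py --eval "pytest tests/test_parser.py" \
  --metric pass_rate --direction higher
```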
# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list
# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators
| Name | Metric | Use Case |
|---|---|---|
| benchmark_speed | p50_ms (lower) | Function/API execution time |
| benchmark_size | size_bytes (lower) | File, bundle, Docker image size |
| test_pass_rate | pass_rate (higher) | Test suite pass percentage |
| build_speed | build_seconds (lower) | Build/compile/Docker build time |
| memory_usage | peak_mb (lower) | Peak memory during execution |
| llm_judge_content | ctr_score (higher) | Headlines, titles, descriptions |
| llm_judge_prompt | quality_score (higher) | System prompts, agent instructions |
| llm_judge_copy | engagement_score (higher) | Social posts, ad copy, emails |
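An evaluator can also be pinned explicitly with --evaluator. The sketch below is hypothetical (the target and eval command are illustrative placeholders), with the metric and direction taken from the benchmark_size row above:

```bash
# Hypothetical bundle-size experiment using the benchmark_size evaluator.
# dist/bundle.js and "npm run build" are illustrative placeholders.
python {skill_path}/scripts/setup_experiment.py \
  --domain engineering --name bundle-size \
  --target dist/bundle.js --eval "npm run build" \
  --metric size_bytes --direction lower \
  --evaluator benchmark_size
```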
Report to the user:
"Run /ar:run {domain}/{name} to start iterating, or /ar:loop {domain}/{name} for autonomous mode."