Setup Autoresearch Experiment

v20260612

setup

This skill initializes a new autoresearch experiment. It walks the user through every required parameter: domain, target file, evaluation command, metric, and optimization direction. Run it first, before starting any automated optimization or benchmarking cycle.



Overview

/ar:setup — Create New Experiment

Set up a new autoresearch experiment with all required configuration.

Usage

/ar:setup                                    # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                             # Show existing experiments
/ar:setup --list-evaluators                  # Show available evaluators

What It Does

If arguments are provided

Pass them directly to the setup script:

python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
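The mapping from the six positional `/ar:setup` arguments to the script's flags can be sketched roughly as follows. This is an illustration only: `build_setup_command` is a hypothetical helper, `skills/setup` is a placeholder for `{skill_path}`, and the check that the direction is literally `lower` or `higher` is an assumption based on the usage example.

```python
import shlex

# Hypothetical helper (not part of the skill): maps the positional
# /ar:setup arguments onto the flags of setup_experiment.py.
def build_setup_command(skill_path, domain, name, target, eval_cmd,
                        metric, direction, evaluator=None, scope=None):
    # Assumption: the script accepts exactly "lower" or "higher".
    if direction not in ("lower", "higher"):
        raise ValueError(f"direction must be 'lower' or 'higher', got {direction!r}")
    argv = [
        "python", f"{skill_path}/scripts/setup_experiment.py",
        "--domain", domain, "--name", name,
        "--target", target, "--eval", eval_cmd,
        "--metric", metric, "--direction", direction,
    ]
    # Optional flags are appended only when a value was supplied.
    if evaluator:
        argv += ["--evaluator", evaluator]
    if scope:
        argv += ["--scope", scope]
    return argv

# Placeholder skill path; the other values come from the usage example.
argv = build_setup_command("skills/setup", "engineering", "api-speed",
                           "src/api.py", "pytest bench.py", "p50_ms", "lower")
print(shlex.join(argv))  # shell-quoted form, safe to copy and paste
```

Building the command as a list and passing it to `subprocess.run(argv)` avoids quoting problems with eval commands that contain spaces, such as `pytest bench.py`.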

If no arguments are provided (interactive mode)

Collect each parameter one at a time:

Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
Target file — Ask: "Which file to optimize?" Verify it exists.
Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
Direction — Ask: "Is lower or higher better?"
Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"

Then run setup_experiment.py with the collected parameters.
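Before handing the collected answers to the script, it may help to validate them in one pass. The sketch below is a minimal illustration, assuming the answers are held in a plain dictionary; `validate_params`, the key names, and the exact accepted strings for direction and scope are assumptions, not the skill's actual implementation.

```python
from pathlib import Path

# Allowed values are taken from the prompts above; that the script accepts
# exactly these strings is an assumption.
DOMAINS = {"engineering", "marketing", "content", "prompts", "custom"}
DIRECTIONS = {"lower", "higher"}
SCOPES = {"project", "user"}
REQUIRED = ("domain", "name", "target", "eval", "metric", "direction")

def validate_params(params, root="."):
    """Return a list of problems; an empty list means the answers look usable."""
    problems = [f"missing {key}" for key in REQUIRED if not params.get(key)]
    if params.get("domain") and params["domain"] not in DOMAINS:
        problems.append(f"unknown domain: {params['domain']}")
    if params.get("direction") and params["direction"] not in DIRECTIONS:
        problems.append(f"direction must be lower or higher: {params['direction']}")
    if params.get("scope") and params["scope"] not in SCOPES:
        problems.append(f"scope must be project or user: {params['scope']}")
    # "Verify it exists": the target must be an existing file under root.
    target = params.get("target")
    if target and not (Path(root) / target).is_file():
        problems.append(f"target file not found: {target}")
    return problems

# Hypothetical answers; the target path is deliberately one that does not exist.
answers = {
    "domain": "engineering", "name": "api-speed",
    "target": "no/such/file.py", "eval": "pytest bench.py",
    "metric": "p50_ms", "direction": "lower", "scope": "project",
}
problems = validate_params(answers)
print(problems)
```

Collecting every problem instead of stopping at the first lets the interactive flow re-ask only the questions whose answers failed.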

Listing

# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list

# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators

Built-in Evaluators

Name	Metric	Use Case
`benchmark_speed`	`p50_ms` (lower)	Function/API execution time
`benchmark_size`	`size_bytes` (lower)	File, bundle, Docker image size
`test_pass_rate`	`pass_rate` (higher)	Test suite pass percentage
`build_speed`	`build_seconds` (lower)	Build/compile/Docker build time
`memory_usage`	`peak_mb` (lower)	Peak memory during execution
`llm_judge_content`	`ctr_score` (higher)	Headlines, titles, descriptions
`llm_judge_prompt`	`quality_score` (higher)	System prompts, agent instructions
`llm_judge_copy`	`engagement_score` (higher)	Social posts, ad copy, emails
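The table above can be read as a lookup from evaluator name to its metric and direction. A minimal sketch, assuming the table is complete: `BUILTIN_EVALUATORS`, `defaults_for`, and `is_improvement` are hypothetical names used for illustration, not identifiers from the skill.

```python
# Evaluator name -> (metric, direction), copied from the table above.
BUILTIN_EVALUATORS = {
    "benchmark_speed": ("p50_ms", "lower"),
    "benchmark_size": ("size_bytes", "lower"),
    "test_pass_rate": ("pass_rate", "higher"),
    "build_speed": ("build_seconds", "lower"),
    "memory_usage": ("peak_mb", "lower"),
    "llm_judge_content": ("ctr_score", "higher"),
    "llm_judge_prompt": ("quality_score", "higher"),
    "llm_judge_copy": ("engagement_score", "higher"),
}

def defaults_for(evaluator):
    """Return (metric, direction) for a built-in evaluator, or None if unknown."""
    return BUILTIN_EVALUATORS.get(evaluator)

def is_improvement(new, baseline, direction):
    """True if `new` beats `baseline` in the experiment's direction."""
    if direction not in ("lower", "higher"):
        raise ValueError(f"unknown direction: {direction!r}")
    return new < baseline if direction == "lower" else new > baseline

metric, direction = defaults_for("benchmark_speed")
print(metric, direction, is_improvement(90.0, 100.0, direction))
```

Choosing a built-in evaluator could therefore pre-fill the Metric and Direction answers, leaving the user to confirm them.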

After Setup

Report to the user:

The experiment path and branch name
Whether the eval command ran successfully and, if it did, the baseline metric value
A suggested next step: "Run /ar:run {domain}/{name} to start iterating, or /ar:loop {domain}/{name} for autonomous mode."
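The report items above could be assembled along these lines. This is a sketch under stated assumptions: `format_setup_report` is a hypothetical helper, and the example path and branch name are placeholders, since the source does not specify how either is derived.

```python
def format_setup_report(domain, name, path, branch, metric, baseline=None):
    """Build the post-setup message; baseline=None means the eval command failed."""
    lines = [f"Experiment path: {path}", f"Branch: {branch}"]
    if baseline is None:
        lines.append("Eval command: failed, no baseline recorded")
    else:
        lines.append(f"Eval command: OK, baseline {metric} = {baseline}")
    lines.append(
        f"Run /ar:run {domain}/{name} to start iterating, "
        f"or /ar:loop {domain}/{name} for autonomous mode."
    )
    return "\n".join(lines)

# Placeholder path and branch name, for illustration only.
report = format_setup_report(
    "engineering", "api-speed",
    ".autoresearch/engineering/api-speed", "example-branch",
    "p50_ms", baseline=42.0,
)
print(report)
```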

Info

Category Development

Name setup

Version v20260612

Size 2.89KB

Source alirezarezvani/claude-skills

Updated At 2026-06-13