
Skill Quality Assurance and Testing

v20260612
skill-tester
A comprehensive meta-skill for validating, testing, and scoring the quality of technical skills within an ecosystem. It performs deep structural checks (SKILL.md compliance), executes Python scripts for syntax, runtime, and output validation, and generates multi-dimensional quality scores using letter grades and tier recommendations. Ideal for new skill authorship, auditing existing components, or integrating rigorous QA gates into CI/CD pipelines.
482 downloads
Overview

Skill Tester

Tier: POWERFUL · Category: Engineering Quality Assurance · Dependencies: None (Python stdlib only)

Meta-skill that validates, tests, and scores skills in this repository. It provides four tools, each run from the repo root using its full path:

  1. scripts/skill_validator.py — structure + documentation compliance
  2. scripts/script_tester.py — Python script syntax/imports/runtime/output testing
  3. scripts/quality_scorer.py — multi-dimensional scoring with letter grade
  4. scripts/security_scorer.py — security posture scoring (also available via quality_scorer.py --include-security)

Scope note: this skill's tier line-count minimums apply to legacy skills. For authoring new skills, engineering/write-a-skill (SKILL.md under ~100 lines, Matt Pocock doctrine) is the binding standard; do not pad a new skill to satisfy a tier minimum here.

Quick Start (exact, runnable from repo root)

# 1. Validate structure (exit non-zero on failure — usable as a gate)
python3 engineering/skills/skill-tester/scripts/skill_validator.py engineering/skills/self-eval --json

# 2. Test the skill's Python scripts (30s default timeout per script)
python3 engineering/skills/skill-tester/scripts/script_tester.py engineering/skills/self-eval --json

# 3. Score quality (fail CI below threshold with --minimum-score)
python3 engineering/skills/skill-tester/scripts/quality_scorer.py engineering/skills/self-eval --json --detailed --minimum-score 75

Consume the JSON output. The validator emits overall_score, compliance_level, and per-check results under checks{}. The scorer emits overall_score, letter_grade, tier_recommendation, dimensions, and an improvement_roadmap. Work the roadmap top-down, then re-run until the target score is met.
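A minimal sketch of consuming the scorer's JSON from Python. It assumes the output is a single JSON object on stdout containing the keys listed above; the shape of each improvement_roadmap entry is not specified here, so entries are assumed to form a list and are printed as-is. The function name summarize_score and the sample values are hypothetical.

```python
import json


def summarize_score(raw_json: str, target: float) -> tuple[bool, list[str]]:
    """Return (meets_target, roadmap entries) from quality_scorer.py --json output.

    Assumes a top-level JSON object with an overall_score key and an optional
    improvement_roadmap list; entries are stringified because their exact
    structure is an assumption.
    """
    report = json.loads(raw_json)
    score = float(report["overall_score"])
    roadmap = [str(item) for item in report.get("improvement_roadmap", [])]
    return score >= target, roadmap


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = json.dumps({
        "overall_score": 71.5,
        "letter_grade": "C",
        "improvement_roadmap": ["Add usage examples to SKILL.md", "Add expected_outputs/"],
    })
    ok, steps = summarize_score(sample, target=75)
    print("meets target:", ok)
    for step in steps:  # work the roadmap top-down
        print("-", step)
```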

For repo-wide auditing, prefer scripts/audit_skills.py at the repo root (it wraps the write-a-skill checklist runner across all skills).

What Each Tool Checks

skill_validator.py

  • SKILL.md frontmatter parsing, required sections, minimum line counts per tier (--tier BASIC|STANDARD|POWERFUL)
  • Required structure: SKILL.md, README.md, scripts/, references/, assets/, expected_outputs/
  • Python scripts: argparse present, stdlib-only imports

script_tester.py

  • AST-based syntax validation; import analysis (flags external dependencies)
  • Controlled execution with timeout protection (--timeout, default 30s)
  • --help functionality verification; sample-data runs compared against expected_outputs/
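A rough sketch of the syntax, timeout, and --help steps using ast and subprocess. It is not the script_tester.py implementation; the function name probe_script and the pass criterion for --help (exit code 0 plus non-empty stdout) are assumptions.

```python
import ast
import subprocess
import sys
from pathlib import Path


def probe_script(path: Path, timeout: float = 30.0) -> dict:
    """Syntax-check a script with ast, then run it with --help under a timeout."""
    result = {"syntax_ok": False, "help_ok": False, "timed_out": False, "error": None}
    try:
        ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        result["syntax_ok"] = True
    except SyntaxError as exc:
        result["error"] = f"syntax error: {exc}"
        return result  # no point executing a script that does not parse
    try:
        proc = subprocess.run(
            [sys.executable, str(path), "--help"],
            capture_output=True, text=True, timeout=timeout,
        )
        # Assumed pass criterion: clean exit and some help text on stdout.
        result["help_ok"] = proc.returncode == 0 and bool(proc.stdout.strip())
        if not result["help_ok"]:
            result["error"] = proc.stderr.strip() or f"exit code {proc.returncode}"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        result["timed_out"] = True
        result["error"] = f"--help did not finish within {timeout}s"
    return result
```

Parsing before executing keeps a syntactically broken script from ever being run, and the timeout bounds scripts that block waiting for input or loop forever.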

quality_scorer.py

Four dimensions, weighted 25% each: Documentation (depth, examples, references), Code Quality (complexity, error handling, output consistency), Completeness (required dirs, sample data, expected outputs), and Usability (help text, example clarity). The output is a 0-100 score, an A-F letter grade, and a tier recommendation.
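With equal 25% weights, the overall score is simply the mean of the four dimension scores. The sketch below shows that arithmetic. The letter-grade cut-offs (90/80/70/60) are an assumed conventional scale, not taken from the scoring rubric in references/, and the function name overall_score is hypothetical.

```python
WEIGHTS = {  # 25% each, per the dimension list above
    "documentation": 0.25,
    "code_quality": 0.25,
    "completeness": 0.25,
    "usability": 0.25,
}

# Assumed cut-offs for illustration; check references/ for the real rubric.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def overall_score(dimensions: dict[str, float]) -> tuple[float, str]:
    """Combine 0-100 dimension scores into a weighted total and a letter grade."""
    total = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    grade = next((letter for cutoff, letter in GRADE_CUTOFFS if total >= cutoff), "F")
    return round(total, 1), grade


print(overall_score({"documentation": 90, "code_quality": 80,
                     "completeness": 70, "usability": 60}))
# (75.0, 'C')
```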

Tier Classification

Tier     | SKILL.md    | Scripts           | CLI surface
BASIC    | ≥ 100 lines | 1 (100-300 LOC)   | basic argparse
STANDARD | ≥ 200 lines | 1-2 (300-500 LOC) | subcommands, JSON + text output
POWERFUL | ≥ 300 lines | 2-3 (500-800 LOC) | multiple modes, CI integration

(Advisory for legacy skills; new skills follow write-a-skill — see scope note above.)
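A minimal sketch of the SKILL.md column of the tier table alone, assuming "lines" means every line in the file, blank lines included. The real validator also weighs script count, LOC, and CLI surface, which this ignores; highest_tier_by_lines and skill_md_line_count are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# SKILL.md minimums from the tier table, highest tier first.
TIER_MIN_LINES = [("POWERFUL", 300), ("STANDARD", 200), ("BASIC", 100)]


def highest_tier_by_lines(line_count: int) -> Optional[str]:
    """Return the highest tier whose SKILL.md minimum is met, or None."""
    for tier, minimum in TIER_MIN_LINES:
        if line_count >= minimum:
            return tier
    return None


def skill_md_line_count(skill_dir: Path) -> int:
    # Counts every line, blank ones included (an assumption about the measure).
    with open(skill_dir / "SKILL.md", encoding="utf-8") as fh:
        return sum(1 for _ in fh)
```

For a new skill written to write-a-skill (SKILL.md under ~100 lines), a result of None is expected and should not be "fixed" by padding.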

CI Integration

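# GitHub Actions: gate changed skills
# Assumes $changed_skills (space-separated skill directories) is set by an earlier step or an env: block.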
- name: "validate-changed-skills"
  run: |
    for skill in $changed_skills; do
      python3 engineering/skills/skill-tester/scripts/skill_validator.py "$skill" --json
      python3 engineering/skills/skill-tester/scripts/script_tester.py "$skill"
      python3 engineering/skills/skill-tester/scripts/quality_scorer.py "$skill" --minimum-score 75
    done

Pre-commit hook: run the validator on the staged skill directory and block the commit on non-zero exit.
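One way to implement that hook, sketched in Python under stated assumptions: skills are assumed to live at <domain>/skills/<name>/ (the layout of engineering/skills/self-eval in the Quick Start), staged files are listed with git diff --cached --name-only, and the names skill_dirs_from_paths and main are hypothetical.

```python
import subprocess
import sys

VALIDATOR = "engineering/skills/skill-tester/scripts/skill_validator.py"


def skill_dirs_from_paths(paths: list[str]) -> list[str]:
    """Map staged file paths to their <domain>/skills/<name> directories."""
    dirs = set()
    for path in paths:
        parts = path.split("/")
        # Assumed layout: parts[1] == "skills" and parts[2] is the skill name.
        if len(parts) >= 4 and parts[1] == "skills":
            dirs.add("/".join(parts[:3]))
    return sorted(dirs)


def main() -> int:
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    status = 0
    for skill in skill_dirs_from_paths(staged):
        # Any non-zero validator exit marks the commit as blocked.
        if subprocess.run([sys.executable, VALIDATOR, skill]).returncode != 0:
            status = 1
    return status

# In the hook file itself, finish with: sys.exit(main())
```

Saved as .git/hooks/pre-commit with a python3 shebang and made executable, the hook runs from the repository root, so the relative validator path resolves, and git aborts the commit whenever main() returns non-zero.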

Verification Loop

A skill "passes" when, in one run from repo root:

  1. skill_validator.py <skill> --json exits 0,
  2. script_tester.py <skill> reports all scripts passing, and
  3. quality_scorer.py <skill> --minimum-score <target> exits 0.

If any step fails, apply the top improvement_roadmap item and re-run all three — never report a partial pass.
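The pass condition can be scripted so that a partial pass is never reported. A minimal sketch, assuming the three commands exactly as listed in the loop and relying only on exit codes; note that it assumes script_tester.py exits non-zero when any script fails, which the text above describes only as "reports all scripts passing". verification_commands, verify_skill, and the runner parameter are hypothetical; the runner exists so the logic can be exercised without the real tools.

```python
import subprocess
import sys
from typing import Callable, Sequence

TOOLS = "engineering/skills/skill-tester/scripts"


def verification_commands(skill: str, target: int) -> list[list[str]]:
    """The three verification steps from the loop above."""
    return [
        [sys.executable, f"{TOOLS}/skill_validator.py", skill, "--json"],
        [sys.executable, f"{TOOLS}/script_tester.py", skill],
        [sys.executable, f"{TOOLS}/quality_scorer.py", skill, "--minimum-score", str(target)],
    ]


def run_command(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd)).returncode


def verify_skill(commands: list[list[str]],
                 runner: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Run every step; pass only if all of them exit 0 (no partial pass)."""
    exit_codes = [runner(cmd) for cmd in commands]  # run all steps, even after a failure
    return all(code == 0 for code in exit_codes)
```

Running all three steps even after one fails gives the full picture in a single pass, which matches the rule of re-running all three after each fix.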

Troubleshooting

  • Timeout errors → raise --timeout or optimize the script under test
  • Import failures → external deps detected; stdlib-only is the repo policy
  • Tier misclassification → check line counts/LOC against the tier table; remember the write-a-skill exception for new skills

References: references/ holds the structure specification, tier requirements matrix, and scoring rubric the tools implement.

Info
Category Development
Name skill-tester
Version v20260612
Size 64.87KB
Updated At 2026-06-13