
Agent Compliance Measurement

v20260410
skill-comply
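This tool automatically measures whether AI coding agents actually follow predefined skills, rules, or workflow definitions. It generates the expected behavior specifications automatically and runs scenario tests at different levels of prompt strictness. By analyzing the order of the agent's tool calls, it produces detailed compliance reports that help assess the quality and reliability of the code-generation process.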
236 downloads
Overview

skill-comply: Automated Compliance Measurement

Measures whether coding agents actually follow skills, rules, or agent definitions by:

  1. Auto-generating expected behavioral sequences (specs) from any .md file
  2. Auto-generating scenarios with decreasing prompt strictness (supportive → neutral → competing)
  3. Running claude -p and capturing tool call traces via stream-json
  4. Classifying tool calls against spec steps using an LLM (not regex)
  5. Checking temporal ordering deterministically
  6. Generating self-contained reports with spec, prompts, and timelines
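Step 5 is the one part of the pipeline that needs no model. A minimal sketch, assuming step 4 has already labeled each tool call with a spec step ID (or None when it matches no step), checks that the steps that did occur appear in the spec's order. The step IDs and the functions `first_occurrences` and `check_order` are hypothetical illustrations, not skill-comply's own code.

```python
from typing import Optional


def first_occurrences(labels: list[Optional[str]]) -> dict[str, int]:
    """Map each spec step ID to the index of the first tool call labeled with it."""
    positions: dict[str, int] = {}
    for index, label in enumerate(labels):
        if label is not None and label not in positions:
            positions[label] = index
    return positions


def check_order(spec_steps: list[str], labels: list[Optional[str]]) -> dict[str, object]:
    """Deterministically check that observed spec steps occur in spec order.

    spec_steps: expected step IDs in order (hypothetical IDs).
    labels: one classification per tool call; None means "matches no step".
    """
    positions = first_occurrences(labels)
    missing = [step for step in spec_steps if step not in positions]
    observed = [step for step in spec_steps if step in positions]
    # Positions are strictly increasing iff every adjacent observed pair is,
    # so checking adjacent pairs is enough to detect any ordering violation.
    violations = [
        (earlier, later)
        for earlier, later in zip(observed, observed[1:])
        if positions[earlier] > positions[later]
    ]
    return {"missing": missing, "violations": violations, "in_order": not violations}


# Hypothetical TDD-style spec: the agent implemented before writing a test.
result = check_order(
    ["write_test", "run_test_fail", "implement", "run_test_pass"],
    [None, "implement", "write_test", "run_test_pass"],
)
print(result)
```

Only the first occurrence of each step is compared; a spec in which the same step must legitimately recur would need a richer check than this sketch.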

Supported Targets

  • Skills (skills/*/SKILL.md): Workflow skills such as search-first and TDD guides
  • Rules (rules/common/*.md): Mandatory rules such as testing.md, security.md, and git-workflow.md
  • Agent definitions (agents/*.md): Whether an agent is invoked when expected (verifying the agent's internal workflow is not yet supported)
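The three target kinds are distinguished by where the Markdown file lives. As a hedged illustration, a hypothetical `target_kind` helper can classify a path with `pathlib.PurePosixPath.match`, which matches a relative pattern against the right-hand end of the path. This is an assumption about how such detection could work, not skill-comply's actual logic.

```python
from pathlib import PurePosixPath

# Patterns copied from the "Supported Targets" list. The mapping itself is an
# illustrative assumption, not the tool's real detection code.
TARGET_PATTERNS = [
    ("skill", "skills/*/SKILL.md"),
    ("rule", "rules/common/*.md"),
    ("agent", "agents/*.md"),
]


def target_kind(path: str) -> str:
    """Return 'skill', 'rule', 'agent', or 'unknown' for a target file path."""
    p = PurePosixPath(path)
    for kind, pattern in TARGET_PATTERNS:
        # A relative pattern is matched from the right, so any prefix
        # (such as ~/.claude/) is ignored.
        if p.match(pattern):
            return kind
    return "unknown"


for example in (
    "~/.claude/skills/search-first/SKILL.md",
    "~/.claude/rules/common/testing.md",
    "notes/README.md",
):
    print(example, "->", target_kind(example))
```

Because matching starts from the right, a path such as ~/.claude/skills/search-first/SKILL.md needs no home-directory expansion to be classified.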

When to Activate

  • User runs /skill-comply <path>
  • User asks "is this rule actually being followed?"
  • After adding new rules/skills, to verify agent compliance
  • Periodically as part of quality maintenance

Usage

# Full run
uv run python -m scripts.run ~/.claude/rules/common/testing.md

# Dry run (no cost, spec + scenarios only)
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md

# Custom models
uv run python -m scripts.run --gen-model haiku --model sonnet <path>
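A full run captures the agent's tool calls from claude -p as stream-json, that is, one JSON event per line (for example, output produced with claude -p and --output-format stream-json). The sketch below pulls tool calls out of such lines in order. The event shape it expects, assistant events whose message content holds tool_use blocks, is an assumption based on the Messages API format, and `extract_tool_calls` is a hypothetical name.

```python
import json
from typing import Iterable


def extract_tool_calls(lines: Iterable[str]) -> list[dict]:
    """Collect tool_use blocks, in order, from newline-delimited JSON events.

    Assumed event shape (not verified against every CLI version):
    {"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "...", "input": {...}}]}}
    """
    calls = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        if event.get("type") != "assistant":
            continue  # skip system, user, and result events
        for block in event.get("message", {}).get("content", []):
            if block.get("type") == "tool_use":
                calls.append({"name": block.get("name"), "input": block.get("input", {})})
    return calls


sample = [
    '{"type": "system", "subtype": "init"}',
    '{"type": "assistant", "message": {"content": [{"type": "text", "text": "Reading first."},'
    ' {"type": "tool_use", "id": "t1", "name": "Read", "input": {"file_path": "app.py"}}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "id": "t2",'
    ' "name": "Bash", "input": {"command": "pytest"}}]}}',
]
for call in extract_tool_calls(sample):
    print(call["name"], call["input"])
```

The resulting ordered list is what an LLM classifier would then label against the spec steps.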

Key Concept: Prompt Independence

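Measures whether a skill or rule is followed even when the prompt doesn't explicitly support it. This is why scenarios step down from supportive to neutral to competing prompts: a skill that is followed only when the prompt asks for it depends on the prompt rather than being applied reliably.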
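One way to summarize prompt independence, offered purely as an assumption and not as the tool's documented metric, is to compare compliance across the three strictness levels. The hypothetical `prompt_independence` function below reports the drop from the supportive to the competing scenario and the weakest level's score.

```python
# Strictness levels from the scenario generator, most supportive first.
LEVELS = ("supportive", "neutral", "competing")


def prompt_independence(scores: dict[str, float]) -> dict[str, float]:
    """Summarize how compliance holds up as prompt support is withdrawn.

    scores maps each level to a compliance score in [0, 1]. "drop" is the
    supportive score minus the competing score; a small drop suggests the
    rule is followed regardless of the prompt. Both the metric and its
    name are illustrative assumptions.
    """
    for level in LEVELS:
        if not 0.0 <= scores[level] <= 1.0:
            raise ValueError(f"score for {level!r} must be in [0, 1]")
    return {
        "drop": scores["supportive"] - scores["competing"],
        "worst_level_score": min(scores[level] for level in LEVELS),
    }


print(prompt_independence({"supportive": 1.0, "neutral": 0.75, "competing": 0.25}))
```

A large drop points to a rule that holds only while the prompt reinforces it.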

Report Contents

Reports are self-contained and include:

  1. Expected behavioral sequence (auto-generated spec)
  2. Scenario prompts (what was asked at each strictness level)
  3. Compliance scores per scenario
  4. Tool call timelines with LLM classification labels
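The report lists a compliance score for each scenario. A simple definition, assumed here for illustration rather than taken from the tool, is the fraction of spec steps that at least one tool call was classified as. The function `compliance_score` and the step IDs are hypothetical.

```python
from typing import Optional


def compliance_score(spec_steps: list[str], labels: list[Optional[str]]) -> float:
    """Fraction of expected spec steps matched by at least one classified tool call.

    labels holds one LLM classification per tool call (None = matched no step).
    This scoring rule is an assumption; real reports may also weigh ordering
    or individual steps differently.
    """
    if not spec_steps:
        raise ValueError("spec must contain at least one step")
    observed = {label for label in labels if label is not None}
    return sum(step in observed for step in spec_steps) / len(spec_steps)


spec = ["search_existing", "read_docs", "implement", "run_tests"]
print(compliance_score(spec, ["search_existing", None, "implement"]))
```

This score ignores order on purpose; an ordering check over the same labels can be reported alongside it.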

Advanced (optional)

For users familiar with hooks, reports also recommend which low-compliance steps could be promoted to hooks. These recommendations are informational; the main value is the visibility into compliance itself.
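Claude Code hooks run configured commands at fixed points in the agent's lifecycle, so a step enforced by a hook no longer depends on the model remembering it. A hypothetical sketch of picking promotion candidates: compute each step's compliance rate across scenario runs and list the steps below a threshold. The function name and the 0.5 default threshold are assumptions, not values used by skill-comply.

```python
from typing import Optional


def hook_promotion_candidates(
    spec_steps: list[str],
    runs: list[list[Optional[str]]],
    threshold: float = 0.5,
) -> list[tuple[str, float]]:
    """List spec steps whose per-step compliance falls below a threshold.

    runs holds the classified tool-call labels of each scenario run. A step's
    rate is the share of runs in which it was observed at least once.
    The 0.5 default is a hypothetical value.
    """
    if not runs:
        return []
    candidates = []
    for step in spec_steps:
        hits = sum(1 for labels in runs if step in labels)
        rate = hits / len(runs)
        if rate < threshold:
            candidates.append((step, rate))
    # Lowest-compliance steps are listed first.
    return sorted(candidates, key=lambda item: item[1])


runs = [["write_test", "implement"], ["implement"], ["implement", None]]
print(hook_promotion_candidates(["write_test", "implement", "run_tests"], runs))
```

A step that is never observed (rate 0.0) is the clearest candidate, since prompting alone is not producing it.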

Info
Category: Programming & Development
Name: skill-comply
Version: v20260410
Size: 19.68 KB
Updated: 2026-04-12