Plan and implement features with precision. Granular tasks. Clear dependencies. Right tools. Zero ceremony.
┌──────────┐   ┌──────────┐   ┌─────────┐   ┌─────────┐
│ SPECIFY  │ → │  DESIGN  │ → │  TASKS  │ → │ EXECUTE │
└──────────┘   └──────────┘   └─────────┘   └─────────┘
  required      optional*      optional*     required

* Agent auto-skips when scope doesn't need it
Loading this skill's files. Reference files live under references/ in this skill's own directory (where this SKILL.md resides). Resolve them relative to the skill directory — never the workspace root — and load them through the active skill by name; never assume a fixed install path. When a step tells you to read a reference, read it completely (to EOF) before acting — never act on a partial/truncated read.
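As an illustration of the path rule above, here is a minimal Python sketch. It assumes the caller already knows the skill directory; `skill_dir`, `resolve_reference`, and `read_reference` are hypothetical names, and how the active skill exposes its location is not specified in this document.

```python
from pathlib import Path


def resolve_reference(skill_dir: Path, name: str) -> Path:
    """Resolve a reference file relative to the skill directory.

    skill_dir is the directory that contains SKILL.md (hypothetical
    argument; it is never the workspace root or a fixed install path).
    """
    base = (skill_dir / "references").resolve()
    target = (base / name).resolve()
    # Refuse names that would escape references/ (e.g. "../outside.md").
    if not target.is_relative_to(base):
        raise ValueError(f"{name!r} resolves outside {base}")
    return target


def read_reference(skill_dir: Path, name: str) -> str:
    """Read the whole file in one call, so nothing acts on a partial read."""
    return resolve_reference(skill_dir, name).read_text(encoding="utf-8")
```

Resolving against `skill_dir` rather than the current working directory is what keeps the lookup independent of where the workspace happens to be.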
Execution contract — every task, non-negotiable (holds even if you do not open the reference files):
Before Execute: read implement.md completely; if a formal tasks.md has more than 3 phases, present the sub-agent offer first (see Sub-Agent Delegation).
Complexity, not a fixed pipeline, determines the depth. Before starting any feature, assess its scope and apply only what's needed:
| Scope | What | Specify | Design | Tasks | Execute |
|---|---|---|---|---|---|
| Small | ≤3 files, one sentence | One-liner spec (inline) | Skip | Skip | Implement + verify inline |
| Medium | Clear feature, <10 tasks | Spec (brief) | Skip — design inline | Skip — tasks implicit | Implement + verify |
| Large | Multi-component feature | Full spec + requirement IDs | Architecture + components | Full breakdown + dependencies | Implement + verify per task |
| Complex | Ambiguity, new domain | Full spec + discuss gray areas | Research + architecture | Breakdown + parallel plan | Implement + interactive UAT |
Rules:
Safety valve: Even when Tasks is skipped, Execute ALWAYS starts by listing atomic steps inline (see implement.md). If that listing reveals >5 steps or complex dependencies, STOP and create a formal tasks.md — the Tasks phase was wrongly skipped.
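The safety valve can be pictured as a small check run on the inline step list. The sketch below is illustrative only: `PHASES`, `needs_formal_tasks`, and the `complex_dependencies` flag are hypothetical names, and deciding what counts as "complex dependencies" is left to the agent's judgment.

```python
# Which phases each scope runs, transcribed from the scope table.
PHASES = {
    "small": ("specify", "execute"),
    "medium": ("specify", "execute"),
    "large": ("specify", "design", "tasks", "execute"),
    "complex": ("specify", "design", "tasks", "execute"),
}

MAX_INLINE_STEPS = 5  # safety-valve threshold stated in the text


def needs_formal_tasks(
    scope: str,
    inline_steps: list[str],
    complex_dependencies: bool = False,
) -> bool:
    """Return True when work should be driven by a formal tasks.md."""
    if "tasks" in PHASES[scope]:
        return True  # the Tasks phase already runs for this scope
    # Tasks was skipped: stop if the inline listing turned out too big.
    return len(inline_steps) > MAX_INLINE_STEPS or complex_dependencies
```

For example, a Medium feature whose inline listing grows to six steps would trip the valve, while five steps with simple dependencies would not.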
.specs/
├── STATE.md              # Project memory: Decisions log (AD-NNN) + Handoff snapshot
├── LESSONS.md            # Self-improving lessons playbook (rendered by scripts/lessons.py; do not hand-edit)
├── lessons.json          # Canonical lessons state (machine-owned)
└── features/             # Feature specifications
    └── [feature]/
        ├── spec.md       # Requirements with traceable IDs
        ├── context.md    # User decisions for gray areas (only when discuss is triggered)
        ├── design.md     # Architecture & components (only for Large/Complex)
        ├── tasks.md      # Atomic tasks with verification (only for Large/Complex)
        └── validation.md # Verifier report: PASS/FAIL, per-AC evidence, sensor result, diff range
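When picking a feature back up, it can help to see which of these per-feature files already exist. The following is a small sketch under that assumption; `feature_artifacts` and `FEATURE_FILES` are hypothetical helpers, not part of the skill's scripts.

```python
from pathlib import Path

# Per-feature file names, taken from the layout above.
FEATURE_FILES = ("spec.md", "context.md", "design.md", "tasks.md", "validation.md")


def feature_artifacts(specs_root: Path, feature: str) -> dict[str, bool]:
    """Report which per-feature files exist under <specs_root>/features/<feature>/."""
    feature_dir = specs_root / "features" / feature
    return {name: (feature_dir / name).is_file() for name in FEATURE_FILES}
```

A missing `design.md` or `tasks.md` is expected for Small and Medium scopes, so an absent file is not by itself an error.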
New feature:
Resume work:
Read .specs/STATE.md — Handoff section for in-flight state, Decisions section to re-confirm active constraints — then propose the next step.
On-demand load (only what the current task needs):
- .specs/STATE.md: Decisions section (read at Design, re-read on resume); Handoff section (read on resume only)
- python3 scripts/lessons.py list --status confirmed (lessons.md); confirmed only, never candidates

Never load simultaneously:
- Target: <40k tokens total context
- Reserve: 160k+ tokens for work, reasoning, outputs
- Monitoring: Display status when >40k (see context-limits.md)
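The monitoring rule amounts to a threshold check. A minimal sketch follows, assuming the agent has some estimate of loaded tokens; `context_status` and the wording of the status line are hypothetical, and the actual display format is defined in context-limits.md.

```python
from typing import Optional

CONTEXT_TARGET = 40_000  # target ceiling for loaded context, from the text


def context_status(loaded_tokens: int) -> Optional[str]:
    """Return a status line once loaded context exceeds the target, else None."""
    if loaded_tokens <= CONTEXT_TARGET:
        return None
    over = loaded_tokens - CONTEXT_TARGET
    # Hypothetical wording; see context-limits.md for the real format.
    return f"context: {loaded_tokens:,} tokens loaded ({over:,} over the 40k target)"
```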
Trigger: >3 phases → offer one worker per phase (sequential); ≤3 phases → execute inline.
Offer-then-confirm — never auto-spawn. The user must accept before any sub-agent is dispatched.
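The trigger and the offer-then-confirm rule can be sketched as one decision function. This is an assumption-laden illustration: `choose_execution_mode` and the `ask_user` callback are hypothetical, and falling back to inline execution when the user declines is an assumption that the text does not state.

```python
from typing import Callable

INLINE_PHASE_LIMIT = 3  # at or below this, execute inline (from the trigger rule)


def choose_execution_mode(phase_count: int, ask_user: Callable[[str], bool]) -> str:
    """Return "inline" or "workers"; workers are never chosen without consent."""
    if phase_count <= INLINE_PHASE_LIMIT:
        return "inline"  # no offer is made for small plans
    accepted = ask_user(
        f"tasks.md has {phase_count} phases. "
        "Dispatch one worker per phase, sequentially?"
    )
    # Assumption: a declined offer means the work proceeds inline.
    return "workers" if accepted else "inline"
```

Routing the decision through a callback keeps the confirmation step explicit: no code path reaches "workers" without the user's answer.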
One worker per phase: Each phase worker executes all its tasks in order (implement → gate → atomic commit), then reports a compact summary (tasks done, commit hashes, test counts, deviations). Workers never spawn further sub-agents.
Verifier (always-on, never prompted): After the final task is committed, the orchestrator dispatches a fresh Verifier sub-agent automatically, regardless of phase count. Validation never requires a user prompt; it is the closing step of Execute. Author ≠ verifier: the Verifier re-derives coverage independently using evidence-or-zero; it does not inherit the author's mental model. The Verifier:

1. Performs a spec-anchored outcome check: confirms each test's asserted value matches the spec-defined expected outcome and flags spec-precision gaps.
2. Runs a discrimination sensor: injects behavior-level faults in scratch state, confirms tests kill them, and discards the mutations; surviving mutants become fix tasks.
3. Writes .specs/features/[feature]/validation.md (PASS/FAIL, per-AC evidence, sensor result, diff range).
4. Returns a compact verdict + ranked gap list to the orchestrator in chat.
5. Distills lessons: turns each grounded failure (surviving mutant, spec-precision gap, failed AC, SPEC_DEVIATION) into a reusable project-local lesson via scripts/lessons.py; a clean PASS records nothing (see lessons.md).

Gaps become fix tasks; the fix→re-verify loop is bounded to 3 iterations before escalating.
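The bounded fix→re-verify loop can be sketched as follows. The `verify` and `fix` callbacks and the `"ESCALATE"` return value are hypothetical stand-ins; the sketch assumes one iteration means one round of fixes followed by one re-verification.

```python
from typing import Callable

MAX_FIX_ITERATIONS = 3  # bound stated in the text


def verify_with_fixes(
    verify: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
) -> str:
    """Run verification, then at most three fix + re-verify rounds.

    verify() returns the current gap list (empty means PASS);
    fix(gaps) turns those gaps into fix tasks and applies them.
    """
    gaps = verify()
    for _ in range(MAX_FIX_ITERATIONS):
        if not gaps:
            return "PASS"
        fix(gaps)
        gaps = verify()
    # Out of iterations: pass only if the last re-verify came back clean.
    return "PASS" if not gaps else "ESCALATE"
```

Under these assumptions a feature that never converges costs four verifier runs and three fix rounds before it is escalated.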
Standalone fallback: Without sub-agents, run validate.md as an independent fresh-eyes pass after the final commit — including the spec-anchored check and discrimination sensor.
Full mechanics (worker payload, compact summary format, failure handling, context sizing, Verifier report format): sub-agents.md.
Feature-level (auto-sized):
| Trigger Pattern | Reference |
|---|---|
| Specify feature, define requirements | specify.md |
| Discuss feature, capture context, how should this work | discuss.md |
| Design feature, architecture | design.md |
| Break into tasks, create tasks | tasks.md |
| Implement task, build, execute | implement.md |
| Validate, verify, test, UAT, walk me through it | validate.md |
Memory:
| Trigger Pattern | Reference |
|---|---|
| Record decision, this is a project-level decision | memory.md |
| Pause work, end session, I need to stop | memory.md |
| Resume work, continue, pick up where we left off | memory.md |
| Load lessons, what have we learned, apply past lessons | lessons.md |
| Record lesson, distill lessons (auto-runs after validation) | lessons.md |
When researching, designing, or making any technical decision, follow this chain in strict order. Never skip steps.
Step 1: Codebase → check existing code, conventions, and patterns already in use
Step 2: Project docs → README, docs/, inline comments, `.specs/STATE.md` (Decisions)
Step 3: Context7 MCP → resolve library ID, then query for current API/patterns
Step 4: Web search → official docs, reputable sources, community patterns
Step 5: Flag as uncertain → "I'm not certain about X — here's my reasoning, but verify"
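The chain above behaves like an ordered fallback. Here is a minimal sketch, assuming each source is a lookup that either answers or returns `None`, and assuming the chain stops at the first source that answers (the text only fixes the order). `resolve_question` and the source callables are hypothetical.

```python
from typing import Callable, Optional

# (source name, lookup) pairs; each lookup returns an answer or None.
Source = tuple[str, Callable[[str], Optional[str]]]


def resolve_question(question: str, sources: list[Source]) -> tuple[str, str]:
    """Consult sources strictly in order; flag uncertainty if none answers."""
    for name, lookup in sources:
        answer = lookup(question)
        if answer is not None:
            return name, answer
    # Step 5: nothing answered, so say so instead of guessing.
    return "uncertain", (
        f"I'm not certain about {question}; here's my reasoning, but verify"
    )
```

Callers would pass the sources in the order codebase, project docs, Context7 MCP, web search, so a later source is consulted only when every earlier one came back empty.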
Rules:
Model guidance: After completing lightweight tasks (validation, feature-level checks), naturally mention once per session that such tasks work well with faster/cheaper models. For heavy tasks (complex design, large features), briefly note the reasoning requirements before starting.
Be conversational, not robotic. Don't interrupt workflow—add as a natural closing note. Skip if user seems experienced or has already acknowledged the tip.
Use available tools with graceful degradation. See code-analysis.md.