Spec-driven workflow enforces a single, non-negotiable rule: write the specification BEFORE you write any code. Not alongside. Not after. Before.
This is not documentation. This is a contract. A spec defines what the system MUST do, what it SHOULD do, and what it explicitly WILL NOT do. Every line of code you write traces back to a requirement in the spec. Every test traces back to an acceptance criterion. If it is not in the spec, it does not get built.
NO CODE WITHOUT AN APPROVED SPEC.
NO EXCEPTIONS. NO "QUICK PROTOTYPES." NO "I'LL DOCUMENT IT LATER."
If the spec is not written, reviewed, and approved, implementation does not begin. Period.
Every spec follows this structure. No sections are optional — if a section does not apply, write "N/A — [reason]" so reviewers know it was considered, not forgotten.
| # | Section | Key Rules |
|---|---|---|
| 1 | Title and Metadata | Author, date, status (Draft/In Review/Approved/Superseded), reviewers |
| 2 | Context | Why this feature exists. 2-4 paragraphs with evidence (metrics, tickets). |
| 3 | Functional Requirements | RFC 2119 keywords (MUST/SHOULD/MAY). Numbered FR-N. Each is atomic and testable. |
| 4 | Non-Functional Requirements | Performance, security, accessibility, scalability, reliability — all with measurable thresholds. |
| 5 | Acceptance Criteria | Given/When/Then format. Every AC references at least one FR-* or NFR-*. |
| 6 | Edge Cases | Numbered EC-N. Cover failure modes for every external dependency. |
| 7 | API Contracts | TypeScript-style interfaces. Cover success and error responses. |
| 8 | Data Models | Table format with field, type, constraints. Every entity from requirements must have a model. |
| 9 | Out of Scope | Explicit exclusions with reasons. Prevents scope creep during implementation. |
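As a quick self-check before review, the section list above can be verified mechanically. A minimal sketch, assuming sections appear as markdown headings containing these titles (matching is a loose case-insensitive substring check, so "Functional Requirements" also matches inside "Non-Functional Requirements" — spec_validator.py below is the real tool):

```python
import re

# Section titles assumed from the structure table; adjust to match
# your actual template (spec_format_guide.md is the source of truth).
REQUIRED_SECTIONS = [
    "Title and Metadata", "Context", "Functional Requirements",
    "Non-Functional Requirements", "Acceptance Criteria", "Edge Cases",
    "API Contracts", "Data Models", "Out of Scope",
]

def missing_sections(spec_text: str) -> list[str]:
    """Return required section titles that appear in no heading of spec_text.

    Loose substring matching can over-match (see lead-in); it is a
    pre-review sanity pass, not a replacement for the validator.
    """
    headings = [m.group(1).strip()
                for m in re.finditer(r"^#+\s*\d*\.?\s*(.+)$", spec_text, re.M)]
    return [s for s in REQUIRED_SECTIONS
            if not any(s.lower() in h.lower() for h in headings)]
```

If a section does not apply, keep its heading and write the "N/A — [reason]" marker under it rather than deleting it, so a check like this still passes.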
Requirement keywords follow RFC 2119:

| Keyword | Meaning |
|---|---|
| MUST | Absolute requirement. Non-conformant without it. |
| MUST NOT | Absolute prohibition. |
| SHOULD | Recommended. Omit only with documented justification. |
| MAY | Optional. Implementer's discretion. |
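A quick way to audit keyword usage during review is to group requirement lines by their RFC 2119 keyword. A minimal sketch (the keyword set comes from the table above; the one-requirement-per-line format is an assumption):

```python
import re

# Check "MUST NOT" before "MUST" so the longer keyword wins.
RFC2119_KEYWORDS = ["MUST NOT", "MUST", "SHOULD", "MAY"]

def classify_requirements(spec_text: str) -> dict[str, list[str]]:
    """Group lines containing an RFC 2119 keyword, keyed by that keyword."""
    groups: dict[str, list[str]] = {k: [] for k in RFC2119_KEYWORDS}
    for line in spec_text.splitlines():
        for kw in RFC2119_KEYWORDS:
            if re.search(rf"\b{kw}\b", line):
                groups[kw].append(line.strip())
                break  # a line is counted once, under its longest keyword
    return groups
```

Requirement lines that carry no uppercase keyword at all are worth a second look: they tend to be the untestable ones.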
See spec_format_guide.md for the complete template with section-by-section examples, good/bad requirement patterns, and feature-type templates (CRUD, Integration, Migration).
See acceptance_criteria_patterns.md for a full pattern library of Given/When/Then criteria across authentication, CRUD, search, file upload, payment, notification, and accessibility scenarios.
These rules define when an agent (human or AI) MUST stop and ask for guidance vs. when they can proceed independently.
Scope creep detected. The implementation requires something not in the spec. Even if it seems obviously needed, STOP. The spec might have excluded it deliberately.
Ambiguity exceeds 30%. If you cannot determine the correct behavior from the spec for more than 30% of a given requirement, the spec is incomplete. Do not guess.
Breaking changes required. The implementation would change an existing API contract, database schema, or public interface. Always escalate.
Security implications. Any change that touches authentication, authorization, encryption, or PII handling requires explicit approval.
Performance characteristics unknown. If a requirement says "MUST complete in < 500ms" but you have no way to measure or guarantee that, escalate before implementing a guess.
Cross-team dependencies. If the spec requires coordination with another team or service, confirm the dependency before building against it.
When you must stop, provide:
## Escalation: [Brief Title]
**Blocked on:** [requirement ID, e.g., FR-3]
**Question:** [Specific, answerable question — not "what should I do?"]
**Options considered:**
A. [Option] — Pros: [...] Cons: [...]
B. [Option] — Pros: [...] Cons: [...]
**My recommendation:** [A or B, with reasoning]
**Impact of waiting:** [What is blocked until this is resolved?]
Never escalate without a recommendation. Never present an open-ended question. Always give options.
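Agents that raise escalations programmatically can make the no-open-ended-questions rule structurally impossible to break. A sketch of a renderer for the format above (the field and type names are illustrative, not part of the workflow's tooling):

```python
from dataclasses import dataclass

@dataclass
class Option:
    label: str      # "A", "B", ...
    summary: str
    pros: str
    cons: str

def render_escalation(title: str, blocked_on: str, question: str,
                      options: list[Option], recommendation: str,
                      impact: str) -> str:
    """Render an escalation in the required format.

    Raises if no options or no recommendation are given, because
    open-ended escalations are not allowed.
    """
    if not options or not recommendation:
        raise ValueError("Escalations require options and a recommendation")
    lines = [f"## Escalation: {title}",
             f"**Blocked on:** {blocked_on}",
             f"**Question:** {question}",
             "**Options considered:**"]
    lines += [f"{o.label}. {o.summary} — Pros: {o.pros} Cons: {o.cons}"
              for o in options]
    lines += [f"**My recommendation:** {recommendation}",
              f"**Impact of waiting:** {impact}"]
    return "\n".join(lines)
```

Raising instead of emitting a partial block means a missing recommendation fails loudly at the source, not silently at review time.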
See references/bounded_autonomy_rules.md for the complete decision matrix.
**Phase 1: Gather Requirements**
Goal: Understand what needs to be built and why.
Exit criteria: You can explain the feature to someone unfamiliar with the project in 2 minutes.
**Phase 2: Write Spec**
Goal: Produce a complete spec document following The Spec Format above.
Exit criteria: The spec can be handed to a developer who was not in the requirements meeting, and they can implement the feature without asking clarifying questions.
**Phase 3: Validate Spec**
Goal: Verify the spec is complete, consistent, and implementable.
Run spec_validator.py against the spec file:
```shell
python spec_validator.py --file spec.md --strict
```
Manual validation checklist:
Exit criteria: Spec scores 80+ on validator, and all manual checklist items pass.
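In CI, the validator's --json output can gate the transition to implementation. A sketch, assuming the JSON report exposes a top-level numeric "score" field (verify the actual schema against spec_validator.py's real output before relying on it):

```python
import json
import subprocess
import sys

def passes_gate(report: dict, threshold: int = 80) -> bool:
    """Apply the 80+ exit criterion to an already-parsed validator report."""
    return report.get("score", 0) >= threshold

def spec_gate(spec_path: str, threshold: int = 80) -> bool:
    """Run spec_validator.py in strict mode and enforce the score threshold.

    Assumes --json prints a JSON object with a "score" field to stdout.
    """
    result = subprocess.run(
        [sys.executable, "spec_validator.py",
         "--file", spec_path, "--strict", "--json"],
        capture_output=True, text=True, check=True,
    )
    return passes_gate(json.loads(result.stdout), threshold)
```

Wiring this into the merge pipeline makes "the spec looks fine, let's just start" a build failure rather than a judgment call.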
**Phase 4: Generate Tests**
Goal: Extract test cases from acceptance criteria before writing implementation code.
Run test_extractor.py against the approved spec:
```shell
python test_extractor.py --file spec.md --framework pytest --output tests/
```
Exit criteria: You have a test file where every test fails with "not implemented" or equivalent.
**Phase 5: Implement**
Goal: Write code that makes failing tests pass, one acceptance criterion at a time.
Rules:
Exit criteria: All tests pass. All acceptance criteria satisfied.
**Phase 6: Self-Review**
Goal: Verify implementation matches spec before marking done.
Run through the Self-Review Checklist below. If any item fails, fix it before declaring the task complete.
Before marking any implementation as done, verify ALL of the following:
Spec-driven workflow and TDD are complementary, not competing:
```
Spec-Driven Workflow          TDD (Red-Green-Refactor)
────────────────────          ────────────────────────
Phase 1: Gather Requirements
Phase 2: Write Spec
Phase 3: Validate Spec
Phase 4: Generate Tests  ──→  RED: Tests exist and fail
Phase 5: Implement       ──→  GREEN: Minimal code to pass
Phase 6: Self-Review     ──→  REFACTOR: Clean up internals
```
The handoff: Spec-driven workflow produces the test stubs (Phase 4). TDD takes over from there. The spec tells you WHAT to test. TDD tells you HOW to implement.
Use engineering-team/tdd-guide for the implementation side: the red-green-refactor cycle itself, making stubs pass, refactoring, and coverage analysis (Phases 5-6).
Use engineering/spec-driven-workflow for everything up to the handoff: requirements gathering, spec writing, spec validation, and test stub generation (Phases 1-4).
A complete worked example (Password Reset spec with extracted test cases) is available in spec_format_guide.md. It demonstrates all 9 sections, requirement numbering, acceptance criteria, edge cases, and the corresponding pytest stubs generated by test_extractor.py.
Symptom: "I'll start coding while the spec is being reviewed." Problem: The review will surface changes. Now you have code that implements a rejected design. Rule: Implementation does not begin until spec status is "Approved."
Symptom: "The system should work well" or "The UI should be responsive." Problem: Untestable. What does "well" mean? What does "responsive" mean? Rule: Every acceptance criterion must be verifiable by a machine. If you cannot write a test for it, rewrite the criterion.
Symptom: Happy path is specified, error paths are not. Problem: Developers invent error handling on the fly, leading to inconsistent behavior. Rule: For every external dependency (API, database, file system, user input), specify at least one failure scenario.
Symptom: "Let me write the spec now that the feature is done." Problem: This is documentation, not specification. It describes what was built, not what should have been built. It cannot catch design errors because the design is already frozen. Rule: If the spec was written after the code, it is not a spec. Relabel it as documentation.
Symptom: "While I was in there, I also added..." Problem: Untested code. Unreviewed design. Potential for subtle bugs in the "bonus" feature. Rule: If it is not in the spec, it does not get built. File a new spec for additional features.
Symptom: AC-7 exists but does not reference any FR-* or NFR-*. Problem: Orphaned criteria mean either a requirement is missing or the criterion is unnecessary. Rule: Every AC-* MUST reference at least one FR-* or NFR-*.
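This traceability rule is easy to lint. A minimal sketch, assuming each criterion starts its line with its AC-\<n\> ID:

```python
import re

def orphaned_criteria(spec_text: str) -> list[str]:
    """Return AC IDs whose line references no FR-<n> or NFR-<n> ID."""
    orphans = []
    for line in spec_text.splitlines():
        m = re.match(r"\s*(AC-\d+)", line)
        # An AC line with no FR-*/NFR-* reference after the ID is orphaned.
        if m and not re.search(r"\bN?FR-\d+", line[m.end():]):
            orphans.append(m.group(1))
    return orphans
```

A non-empty result means either a requirement is missing from Section 3/4 or the criterion should be deleted — decide which before implementation starts.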
Symptom: "The spec looks fine, let's just start." Problem: Missing sections discovered during implementation cause blocking delays. Rule: Always run spec_validator.py --strict before starting implementation. Fix all warnings.
- engineering-team/tdd-guide — Red-green-refactor cycle, test generation, coverage analysis. Use after Phase 4 of this workflow.
- engineering/focused-fix — Deep-dive feature repair. When a spec-driven implementation has systemic issues, use focused-fix for diagnosis.
- engineering/rag-architect — If the feature involves retrieval or knowledge systems, use rag-architect for the technical design within the spec.
- references/spec_format_guide.md — Complete template with section-by-section explanations.
- references/bounded_autonomy_rules.md — Full decision matrix for when to stop vs. continue.
- references/acceptance_criteria_patterns.md — Pattern library for writing Given/When/Then criteria.

| Script | Purpose | Key Flags |
|---|---|---|
| spec_generator.py | Generate spec template from feature name/description | --name, --description, --format, --json |
| spec_validator.py | Validate spec completeness (0-100 score) | --file, --strict, --json |
| test_extractor.py | Extract test stubs from acceptance criteria | --file, --framework, --output, --json |
```shell
# Generate a spec template
python spec_generator.py --name "User Authentication" --description "OAuth 2.0 login flow"

# Validate a spec
python spec_validator.py --file specs/auth.md --strict

# Extract test cases
python test_extractor.py --file specs/auth.md --framework pytest --output tests/test_auth.py
```