Design multi-agent systems, generate their tool schemas, and evaluate them with three deterministic tools. The scripts are the workflow: do not freehand an architecture when the planner can score one from requirements.
When NOT to use: Claude Code Workflow-tool automations → workflow-builder; single-agent workflow scaffolds → agent-workflow-designer; multi-agent fan-out at runtime → agenthub.
| Choose | When | Watch out for |
|---|---|---|
| Single agent | One bounded task, < ~5 tools | Don't add agents you don't need |
| Supervisor | Central decomposition, specialists report back | Supervisor becomes the bottleneck |
| Pipeline | Strictly sequential stages with handoffs | Rigid order; slowest stage gates throughput |
| Hierarchical | Multiple org layers, > ~8 agents | Communication overhead per level |
| Swarm | Parallel peers, fault tolerance over predictability | Hard to debug; needs consensus rules |
The planner scores these patterns deterministically from your requirements, so run it rather than picking by feel.
All paths are relative to this skill folder. Each step's JSON output is the next step's design input.
Write a requirements JSON (copy assets/sample_system_requirements.json — keys: goal, tasks[], constraints{max_response_time, budget_per_task, concurrent_tasks}, team_size):
python3 agent_planner.py requirements.json --format json -o arch
Emits arch.json with architecture_design (pattern, agents, communication links), mermaid_diagram, and implementation_roadmap. Read architecture_design.pattern and the per-agent role list; present the mermaid diagram to the user.
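As a minimal sketch of both ends of this step, the Python below writes a requirements file with the keys listed above and pulls the pattern and agent roles out of a planner result. Every value and value type in the requirements is a made-up placeholder, and the inner shape of architecture_design (an `agents` list whose entries carry `name` and `role`) is an assumption; check assets/sample_system_requirements.json and a real arch.json before relying on it.

```python
import json
from pathlib import Path


def build_requirements() -> dict:
    """Return a requirements dict with the keys the planner expects.

    All values (and their types/units) are illustrative placeholders.
    """
    return {
        "goal": "Answer customer support tickets",
        "tasks": ["classify ticket", "draft reply", "review reply"],
        "constraints": {
            "max_response_time": 30,   # assumed unit: seconds
            "budget_per_task": 0.5,    # assumed unit: dollars
            "concurrent_tasks": 10,
        },
        "team_size": 3,
    }


def summarize_architecture(arch: dict) -> tuple[str, list[str]]:
    """Extract the chosen pattern and a 'name: role' line per agent.

    Assumes architecture_design.agents is a list of dicts with
    'name' and 'role' keys; missing keys are shown as '?'.
    """
    design = arch["architecture_design"]
    roles = [
        f'{agent.get("name", "?")}: {agent.get("role", "?")}'
        for agent in design.get("agents", [])
    ]
    return design["pattern"], roles


if __name__ == "__main__":
    # Write the input file, then summarize arch.json if the planner has run.
    Path("requirements.json").write_text(json.dumps(build_requirements(), indent=2))
    arch_path = Path("arch.json")
    if arch_path.exists():
        pattern, roles = summarize_architecture(json.loads(arch_path.read_text()))
        print(f"pattern: {pattern}")
        for line in roles:
            print(f"  {line}")
```

Run it once before the planner to create requirements.json, and again afterwards to print the summary.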
Describe each agent's tools in plain JSON (copy assets/sample_tool_descriptions.json), then:
python3 tool_schema_generator.py tool_descriptions.json --validate -o tools
Emits tools.json (tool_schemas, validation_summary) plus provider-specific tools_anthropic.json / tools_openai.json. Gate: every tool must print ✓ Valid. Fix any invalid schema before proceeding — never hand an agent an unvalidated schema.
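The gate above can be scripted. This sketch assumes the generator prints one line containing `✓ Valid` for each valid tool, as the gate wording suggests; the exact output format is an assumption, and `all_tools_valid` and `run_generator` are hypothetical helpers, not part of the tool.

```python
import subprocess


def all_tools_valid(stdout: str, expected_tools: int) -> bool:
    """True when the output has a '✓ Valid' line for every expected tool."""
    valid_lines = [line for line in stdout.splitlines() if "✓ Valid" in line]
    return len(valid_lines) == expected_tools


def run_generator(descriptions_path: str) -> str:
    """Run the command from this step and return what it printed."""
    result = subprocess.run(
        ["python3", "tool_schema_generator.py", descriptions_path,
         "--validate", "-o", "tools"],
        capture_output=True,
        text=True,
        check=False,  # inspect the output ourselves instead of raising
    )
    return result.stdout
```

A caller would pass the number of tools it described, for example `all_tools_valid(run_generator("tool_descriptions.json"), expected_tools=4)`, and stop the workflow on `False`.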
Once the system runs (or against assets/sample_execution_logs.json for a dry run):
python3 agent_evaluator.py execution_logs.json --detailed -o eval
Emits eval.json with summary, agent_metrics, bottleneck_analysis, error_analysis, cost_breakdown, sla_compliance, and optimization_recommendations, plus split files (eval_errors.json, eval_recommendations.json).
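Before consuming the report, it can help to confirm that the sections listed above are present. The sketch below assumes they are top-level keys of eval.json, which the source does not state explicitly; `missing_sections` is a hypothetical helper.

```python
import json
from pathlib import Path

# Section names as listed for eval.json; assumed to be top-level keys.
EXPECTED_SECTIONS = [
    "summary",
    "agent_metrics",
    "bottleneck_analysis",
    "error_analysis",
    "cost_breakdown",
    "sla_compliance",
    "optimization_recommendations",
]


def missing_sections(report: dict) -> list[str]:
    """Return the expected section names that are absent from the report."""
    return [name for name in EXPECTED_SECTIONS if name not in report]


if __name__ == "__main__":
    eval_path = Path("eval.json")
    if eval_path.exists():
        missing = missing_sections(json.loads(eval_path.read_text()))
        print("missing sections:", missing or "none")
```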
The design is not done until:
- tool_schema_generator.py --validate reports 0 invalid schemas.
- agent_evaluator.py on a pilot run reports 0 critical issues (the tool prints CRITICAL: N critical issues when found). If N > 0, apply the top item in eval_recommendations.json, re-run the pilot, and re-evaluate.

Compare your outputs against expected_outputs/ to confirm the schema shape you're consuming hasn't drifted.

- references/agent_architecture_patterns.md: pattern trade-offs in depth
- references/tool_design_best_practices.md: schema, idempotency, error-handling rules
- references/evaluation_methodology.md: metric definitions the evaluator implements
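The critical-issue check in the completion criteria can be scripted by parsing the evaluator's printed output. This is a sketch that assumes the line reads exactly like `CRITICAL: N critical issues` and that its absence means no critical issues were found; `critical_issue_count` is a hypothetical helper.

```python
import re

# Matches the line the evaluator is described as printing, e.g.
# "CRITICAL: 3 critical issues" (singular "issue" is tolerated too).
CRITICAL_RE = re.compile(r"CRITICAL:\s*(\d+)\s+critical issues?")


def critical_issue_count(output: str) -> int:
    """Return N from 'CRITICAL: N critical issues', or 0 if no such line."""
    match = CRITICAL_RE.search(output)
    return int(match.group(1)) if match else 0


def pilot_passes(output: str) -> bool:
    """The design gate: a pilot passes only with zero critical issues."""
    return critical_issue_count(output) == 0
```

When `pilot_passes` returns `False`, the loop described above applies: take the top item in eval_recommendations.json, re-run the pilot, and evaluate again.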