Multi-Agent System Design & Evaluation

v20260612

agent-designer

A tool for designing multi-agent systems, generating their tool schemas, and evaluating how they run. It guides users through defining system requirements, selecting an orchestration pattern (such as Supervisor, Swarm, or Pipeline), generating provider-ready tool schemas (Anthropic and OpenAI), and analyzing execution logs for performance bottlenecks, cost, and latency. The aim is robust, production-grade autonomous workflows.

Agent Multi-Agent System Architecture Design Evaluation Workflow Tool-Schema

Overview

Agent Designer — Multi-Agent System Architecture

Design multi-agent systems, generate their tool schemas, and evaluate their execution logs with three deterministic scripts. The scripts are the workflow: do not freehand an architecture when the planner can score one from your requirements.

When to use

Designing a new multi-agent system from requirements (pattern choice, roles, comms)
Generating provider-ready tool schemas (Anthropic + OpenAI formats) from plain tool descriptions
Evaluating execution logs: success rate, latency distribution, cost, bottlenecks

When NOT to use: Claude Code Workflow-tool automations → workflow-builder; single-agent workflow scaffolds → agent-workflow-designer; multi-agent fan-out at runtime → agenthub.

Pattern decision table

Choose	When	Watch out for
Single agent	One bounded task, < ~5 tools	Don't add agents you don't need
Supervisor	Central decomposition, specialists report back	Supervisor becomes the bottleneck
Pipeline	Strictly sequential stages with handoffs	Rigid order; slowest stage gates throughput
Hierarchical	Multiple org layers, > ~8 agents	Communication overhead per level
Swarm	Parallel peers, fault tolerance over predictability	Hard to debug; needs consensus rules

The planner scores these patterns against your requirements deterministically, so run it rather than picking by feel.
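To make the table's rules of thumb concrete, here is a minimal sketch that encodes them as a function. It is not the planner's actual scoring logic, which this document does not show; the `SystemShape` fields, the returned pattern names, and the order in which the rules are checked are all assumptions made for illustration. Only the thresholds (about 5 tools, about 8 agents) come from the table.

```python
from dataclasses import dataclass


@dataclass
class SystemShape:
    """Hypothetical summary of a system (not the planner's input format)."""
    agent_count: int
    tool_count: int
    strictly_sequential: bool = False
    needs_fault_tolerance: bool = False


def suggest_pattern(shape: SystemShape) -> str:
    """Return the first matching rule; the precedence order is an assumption."""
    # One bounded task with a handful of tools: stay with a single agent.
    if shape.agent_count <= 1 and shape.tool_count < 5:
        return "single_agent"
    # Many agents across layers: organise them hierarchically.
    if shape.agent_count > 8:
        return "hierarchical"
    # Fixed stage order with handoffs between stages.
    if shape.strictly_sequential:
        return "pipeline"
    # Parallel peers where resilience matters more than predictability.
    if shape.needs_fault_tolerance:
        return "swarm"
    # Otherwise: central decomposition with specialists reporting back.
    return "supervisor"


if __name__ == "__main__":
    print(suggest_pattern(SystemShape(agent_count=4, tool_count=10)))
```

Treat this as a way to sanity-check the planner's answer against the table, not as a replacement for running the planner.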

Workflow

All paths are relative to this skill folder. Each step's JSON output informs the input you prepare for the next step.

1. Design the architecture

Write a requirements JSON (copy assets/sample_system_requirements.json — keys: goal, tasks[], constraints{max_response_time, budget_per_task, concurrent_tasks}, team_size):

python3 agent_planner.py requirements.json --format json -o arch

Emits arch.json with architecture_design (pattern, agents, communication links), mermaid_diagram, and implementation_roadmap. Read architecture_design.pattern and the per-agent role list; present the mermaid diagram to the user.
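As an illustration of this step's input and output, the sketch below builds a requirements dict with the keys listed above and pulls the pattern and roles out of a loaded arch.json. Every value is a placeholder. The units of `max_response_time` and `budget_per_task`, the use of plain strings in `tasks`, and the assumption that each agent is an object with a `role` field are guesses, so check them against assets/sample_system_requirements.json and a real arch.json.

```python
def build_requirements() -> dict:
    """Build a requirements dict with the documented keys; values are placeholders."""
    return {
        "goal": "Answer customer support tickets",
        # Assumed to be plain strings; the sample file may use richer objects.
        "tasks": ["classify ticket", "draft reply", "review reply"],
        "constraints": {
            "max_response_time": 30,  # unit assumed, check the sample file
            "budget_per_task": 0.50,  # unit assumed, check the sample file
            "concurrent_tasks": 10,
        },
        "team_size": 3,
    }


def summarize_architecture(arch: dict) -> tuple[str, list[str]]:
    """Extract the pattern and per-agent roles from a loaded arch.json dict."""
    design = arch.get("architecture_design", {})
    pattern = design.get("pattern", "unknown")
    # Assumes each agent is an object with a "role" field.
    roles = [agent.get("role", "?") for agent in design.get("agents", [])]
    return pattern, roles


if __name__ == "__main__":
    example_arch = {
        "architecture_design": {
            "pattern": "supervisor",
            "agents": [{"role": "planner"}, {"role": "writer"}],
        }
    }
    print(summarize_architecture(example_arch))
```

In practice you would dump `build_requirements()` to requirements.json with `json.dump`, run the planner command shown above, and pass the result of `json.load` on arch.json to `summarize_architecture`.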

2. Generate tool schemas

Describe each agent's tools in plain JSON (copy assets/sample_tool_descriptions.json), then:

python3 tool_schema_generator.py tool_descriptions.json --validate -o tools

Emits tools.json (tool_schemas, validation_summary) plus provider-specific tools_anthropic.json / tools_openai.json. Gate: every tool must print ✓ Valid. Fix any invalid schema before proceeding — never hand an agent an unvalidated schema.
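For orientation, the sketch below shows how one JSON Schema parameter block is wrapped for each provider: Anthropic tool definitions carry it under `input_schema`, while OpenAI function-calling definitions nest it under `function.parameters`. The helper names and the `basic_problems` check are hypothetical and far weaker than the generator's `--validate`; the format of tool_descriptions.json and the generator's real output may differ.

```python
def to_anthropic(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON Schema object in the Anthropic tool-use layout."""
    return {"name": name, "description": description, "input_schema": parameters}


def to_openai(name: str, description: str, parameters: dict) -> dict:
    """Wrap the same JSON Schema in the OpenAI function-calling layout."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


def basic_problems(parameters: dict) -> list[str]:
    """Minimal sanity check (hypothetical, not the generator's validator)."""
    problems = []
    if parameters.get("type") != "object":
        problems.append("top-level type must be 'object'")
    properties = parameters.get("properties", {})
    for field in parameters.get("required", []):
        if field not in properties:
            problems.append(f"required field '{field}' is not in properties")
    return problems


if __name__ == "__main__":
    params = {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    }
    print(to_anthropic("search_docs", "Search the documentation", params))
    print(to_openai("search_docs", "Search the documentation", params))
    print(basic_problems(params))
```

Keeping a single parameter schema and wrapping it per provider means a fix to the schema reaches both output files at once.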

3. Evaluate execution logs

Once the system runs (or against assets/sample_execution_logs.json for a dry run):

python3 agent_evaluator.py execution_logs.json --detailed -o eval

Emits eval.json with summary, agent_metrics, bottleneck_analysis, error_analysis, cost_breakdown, sla_compliance, and optimization_recommendations, plus split files (eval_errors.json, eval_recommendations.json).
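The headline metrics can be illustrated with a small sketch. It is not the evaluator's implementation: the log record fields (`agent`, `success`, `latency_ms`, `cost_usd`) are hypothetical, the percentile uses the nearest-rank method as an assumed choice, and "bottleneck" is simplified to the agent with the highest mean latency. See references/evaluation_methodology.md for the definitions the evaluator actually uses.

```python
import math
from collections import defaultdict


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile on an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_runs(runs: list[dict]) -> dict:
    """Compute success rate, latency percentiles, total cost, and slowest agent."""
    if not runs:
        raise ValueError("no runs to evaluate")
    latencies = sorted(run["latency_ms"] for run in runs)
    # Group latencies by agent to find the slowest one on average.
    per_agent = defaultdict(list)
    for run in runs:
        per_agent[run["agent"]].append(run["latency_ms"])
    slowest = max(per_agent, key=lambda a: sum(per_agent[a]) / len(per_agent[a]))
    return {
        "success_rate": sum(1 for run in runs if run["success"]) / len(runs),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "total_cost_usd": sum(run["cost_usd"] for run in runs),
        "slowest_agent": slowest,
    }


if __name__ == "__main__":
    sample = [
        {"agent": "planner", "success": True, "latency_ms": 100, "cost_usd": 0.01},
        {"agent": "writer", "success": True, "latency_ms": 400, "cost_usd": 0.03},
        {"agent": "writer", "success": False, "latency_ms": 300, "cost_usd": 0.02},
        {"agent": "planner", "success": True, "latency_ms": 200, "cost_usd": 0.01},
    ]
    print(summarize_runs(sample))
```

Reporting percentiles rather than a mean keeps a few slow runs from hiding behind many fast ones, which is why a latency distribution is more useful here than an average.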

4. Verification loop

The design is not done until:

tool_schema_generator.py --validate reports 0 invalid schemas.
agent_evaluator.py on a pilot run reports 0 critical issues (the tool prints CRITICAL: N critical issues when found). If N > 0, apply the top item in eval_recommendations.json, re-run the pilot, and re-evaluate.
Compare your outputs against expected_outputs/ to confirm the schema shape you're consuming hasn't drifted.
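The three checks above could be automated along the following lines. This is a sketch under stated assumptions: the regular expression assumes the evaluator prints the line exactly as quoted above (`CRITICAL: N critical issues`), and `same_shape` is a hypothetical helper that compares nested key structure only, ignoring values and looking at just the first element of each list.

```python
import re

# Assumes the evaluator's output matches the wording quoted in this document.
CRITICAL_RE = re.compile(r"CRITICAL:\s*(\d+)\s+critical issues?")


def critical_count(evaluator_output: str) -> int:
    """Return N from a 'CRITICAL: N critical issues' line, or 0 if none is printed."""
    match = CRITICAL_RE.search(evaluator_output)
    return int(match.group(1)) if match else 0


def same_shape(actual, expected) -> bool:
    """True when two JSON values share the same nested key structure."""
    if isinstance(expected, dict):
        return (
            isinstance(actual, dict)
            and actual.keys() == expected.keys()
            and all(same_shape(actual[key], expected[key]) for key in expected)
        )
    if isinstance(expected, list):
        if not isinstance(actual, list):
            return False
        # Compare only the first element of each list as a representative.
        if not actual or not expected:
            return True
        return same_shape(actual[0], expected[0])
    # Scalars match any other scalar; values are deliberately ignored.
    return not isinstance(actual, (dict, list))


def design_is_done(invalid_schemas: int, evaluator_output: str) -> bool:
    """Both gates pass: no invalid schemas and no critical issues."""
    return invalid_schemas == 0 and critical_count(evaluator_output) == 0


if __name__ == "__main__":
    print(design_is_done(0, "CRITICAL: 2 critical issues"))
```

A shape comparison like this catches renamed or missing keys without failing on values that legitimately differ between runs, such as timestamps or costs.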

References

references/agent_architecture_patterns.md — pattern trade-offs in depth
references/tool_design_best_practices.md — schema, idempotency, error-handling rules
references/evaluation_methodology.md — metric definitions the evaluator implements

Info

Category: Development

Name: agent-designer

Version: v20260612

Size: 64.42KB

Source: alirezarezvani/claude-skills

Updated At: 2026-06-13