
Multi-Agent System Architecture Design and Development

v20260509
multi-agent-architect
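This skill focuses on designing and optimizing production-grade multi-agent system architectures. It explains in depth how to build complex AI workflows with tools such as LangGraph and LangChain, covering state management, agent routing, memory systems, and tool calling. It is suited to building robust, scalable AI systems such as supervisors and planners.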

Multi-Agent Architect & Updater Skill

Overview

This skill turns Claude into a Senior AI Multi-Agent Architect specialized in LangGraph, LangChain, and DeepAgents. It provides structured workflows for creating and updating production-grade multi-agent systems — including supervisor agents, planners, researchers, coders, and memory-backed autonomous pipelines. Use it whenever you need to design, build, debug, or scale any multi-agent AI system.

If this skill adapts material from an external GitHub repository, declare both:

  • source_repo: owner/repo
  • source_type: official or source_type: community

When to Use This Skill

  • Use when you need to create a new agent or multi-agent workflow from scratch
  • Use when working with LangGraph state graphs, nodes, edges, or conditional routing
  • Use when the user asks about agent communication, memory systems, or tool-calling pipelines
  • Use when debugging or optimizing an existing LangChain/LangGraph agent system
  • Use when architecting supervisor, planner, research, coding, or validation agent roles
  • Use when integrating DeepAgents with hierarchical planning and delegation

How It Works

Step 1: Understand the Goal

Before writing any code, clarify:

  • What is the business objective this agent system must achieve?
  • What agent roles are needed (supervisor, planner, researcher, coder, validator)?
  • What tools does each agent require?
  • What memory strategy is needed (Redis, Vector DB, LangChain Memory)?
  • What communication protocol connects agents (shared state, message passing)?
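
One way to make these questions actionable is to record the answers in a small structure and list whatever is still open before any graph code is written. The sketch below is illustrative only: `DesignBrief` and `open_questions` are hypothetical names, not part of LangGraph or LangChain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DesignBrief:
    """Hypothetical record of the Step 1 answers."""
    objective: str = ""
    roles: List[str] = field(default_factory=list)
    # An empty list is a valid answer: the role simply needs no tools.
    tools_by_role: Dict[str, List[str]] = field(default_factory=dict)
    memory_strategy: str = ""
    protocol: str = ""

    def open_questions(self) -> List[str]:
        """Return the questions that still have no answer."""
        questions = []
        if not self.objective.strip():
            questions.append("What is the business objective?")
        if not self.roles:
            questions.append("Which agent roles are needed?")
        for role in self.roles:
            if role not in self.tools_by_role:
                questions.append(f"Which tools does the {role} agent require?")
        if not self.memory_strategy.strip():
            questions.append("Which memory strategy is needed?")
        if not self.protocol.strip():
            questions.append("Which communication protocol connects the agents?")
        return questions
```

If `open_questions()` returns anything, that list is what to ask the user before generating an architecture.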

Step 2: Define the State Schema

All agents share a typed state object passed through the graph:

from typing import TypedDict

class AgentState(TypedDict):
    user_goal: str
    tasks: list[str]
    completed_tasks: list[str]
    next_agent: str
    context: dict
    step_count: int          # guards against infinite loops
    error: str | None
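
When several nodes write to the same list, LangGraph lets a state key declare a reducer through `typing.Annotated`, so that updates are combined rather than overwritten. The sketch below shows the annotation on a variant of the schema and, purely for illustration, a hypothetical `merge_update` helper that mimics how a reducer-aware graph would apply a partial update. It is not LangGraph's own merge code, and `ReducedAgentState` is an assumed name.

```python
import operator
from typing import Annotated, Optional, TypedDict, get_type_hints


class ReducedAgentState(TypedDict):
    user_goal: str
    # Annotated[..., operator.add] marks the key as "append, don't overwrite".
    completed_tasks: Annotated[list[str], operator.add]
    next_agent: str
    step_count: int
    error: Optional[str]


def merge_update(state: dict, update: dict, schema: type = ReducedAgentState) -> dict:
    """Hypothetical helper: apply a partial update, honoring declared reducers."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if callable(reducer) and key in merged:
            merged[key] = reducer(merged[key], value)  # e.g. list + list
        else:
            merged[key] = value                        # plain overwrite
    return merged
```

With this schema a node can return `{"completed_tasks": ["research"]}` and the entry is appended to whatever earlier nodes already recorded.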

Step 3: Define Agent Nodes

Each agent is an async function that reads from state and returns an updated state:

import logging
from langchain_openai import ChatOpenAI

logger = logging.getLogger(__name__)

async def research_node(state: AgentState) -> AgentState:
    logger.info("research_node: starting")
    llm = ChatOpenAI(model="gpt-4o")
    # research_tools: the list of tool objects this agent may call (defined under tools/).
    result = await llm.bind_tools(research_tools).ainvoke(state["user_goal"])
    # result.content can be empty when the model answers with a tool call instead of text.
    state["context"]["research"] = result.content
    state["next_agent"] = "coder"
    return state
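
A node does not have to mutate and return the whole state. LangGraph also accepts a dict holding only the keys that changed, which makes failures easier to surface. The sketch below illustrates that style with the LLM call injected as a plain async callable, so it runs without any model or API key. `make_research_node` and `LLMCall` are hypothetical names introduced here.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

# Hypothetical type: any async callable that maps a prompt to text.
LLMCall = Callable[[str], Awaitable[str]]


def make_research_node(call_llm: LLMCall):
    """Build a node that returns only the keys it changes."""

    async def research_node(state: dict) -> dict:
        logger.info("research_node: starting")
        try:
            findings = await call_llm(state["user_goal"])
        except Exception as exc:  # tool or API failure
            logger.exception("research_node: LLM call failed")
            return {"error": f"research failed: {exc}"}
        if not findings.strip():
            return {"error": "research returned an empty response"}
        # Copy the context so the incoming state is not mutated in place.
        context = {**state.get("context", {}), "research": findings}
        return {"context": context, "error": None}

    return research_node


async def _demo_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"notes on {prompt}"


if __name__ == "__main__":
    node = make_research_node(_demo_llm)
    print(asyncio.run(node({"user_goal": "vector databases", "context": {}})))
```

Injecting the call also makes the node straightforward to unit-test, since a stub can simulate timeouts or empty replies.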

Step 4: Build the LangGraph

Wire nodes together with edges and conditional routing:

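from langgraph.graph import StateGraph, END

def build_graph():
    """Wire the nodes together and return the compiled, runnable graph."""
    graph = StateGraph(AgentState)

    graph.add_node("supervisor", supervisor_node)
    graph.add_node("research",   research_node)
    graph.add_node("coder",      coding_node)
    graph.add_node("validator",  validation_node)
    # No ToolNode is registered here: the prebuilt ToolNode reads a "messages"
    # list from state by default, which AgentState does not define, and a node
    # with no edges may be rejected when the graph is compiled.

    graph.set_entry_point("supervisor")

    # Every value route_next can return needs an entry in this mapping,
    # including "validator", which the supervisor allowlist permits.
    graph.add_conditional_edges(
        "supervisor",
        route_next,
        {"research": "research", "coder": "coder", "validator": "validator", "end": END},
    )

    graph.add_edge("research",  "supervisor")
    graph.add_edge("coder",     "validator")
    graph.add_edge("validator", "supervisor")

    return graph.compile()

def route_next(state: AgentState) -> str:
    # Stop once the supervisor has run more than 20 times.
    if state["step_count"] > 20:
        return "end"
    return state["next_agent"]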
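
The router is the last point at which a bad routing value can be caught before the graph raises. As a sketch under that assumption, the hypothetical `safe_route` below extends the step guard in `route_next` with an error check and an allowlist, so an unexpected `next_agent` ends the run instead of failing the edge lookup. `ALLOWED_ROUTES` and `MAX_STEPS` are illustrative names.

```python
ALLOWED_ROUTES = {"research", "coder", "validator", "end"}
MAX_STEPS = 20  # same threshold as route_next


def safe_route(state: dict, max_steps: int = MAX_STEPS) -> str:
    """Return the next route, falling back to "end" on anything unexpected."""
    if state.get("error"):
        return "end"                      # a node reported a failure
    if state.get("step_count", 0) > max_steps:
        return "end"                      # loop guard
    candidate = str(state.get("next_agent", "")).strip().lower()
    return candidate if candidate in ALLOWED_ROUTES else "end"
```

Because it is a pure function of the state, it can be tested without building a graph.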

Step 5: Add Memory

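import os

from langchain_community.chat_message_histories import RedisChatMessageHistory

def get_memory(session_id: str) -> RedisChatMessageHistory:
    # One history per session_id; entries expire after an hour (ttl is in seconds).
    return RedisChatMessageHistory(
        session_id=session_id,
        url=os.environ["REDIS_URL"],  # fail fast if the variable is unset
        ttl=3600,
    )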
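
To illustrate what session scoping and a TTL buy, the sketch below models both with an in-memory stand-in rather than a real Redis client. `session_key` and `TTLStore` are hypothetical helpers, and the `agent:<session_id>:<name>` key layout is an assumption, not a convention of any library.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def session_key(session_id: str, name: str, prefix: str = "agent") -> str:
    """Build a key that cannot collide with another session's keys."""
    if not session_id or ":" in session_id:
        raise ValueError("session_id must be non-empty and must not contain ':'")
    return f"{prefix}:{session_id}:{name}"


class TTLStore:
    """In-memory stand-in for a Redis-like store; every key expires after ttl seconds."""

    def __init__(self, ttl: float = 3600, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value
```

The same two ideas carry over to Redis: one key namespace per session, and an expiry on every key so abandoned sessions do not accumulate.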

Step 6: Run the Graph

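async def run(user_goal: str, session_id: str):
    graph = build_graph()
    initial_state = AgentState(
        user_goal=user_goal,
        tasks=[],
        completed_tasks=[],
        next_agent="supervisor",
        context={},
        step_count=0,
        error=None,
    )
    # session_id is not used by the graph itself; pass it to get_memory()
    # when conversation history should be loaded or saved.
    # LangGraph stops after 25 super-steps by default, which is fewer than
    # 20 supervisor round-trips need, so raise the limit explicitly.
    return await graph.ainvoke(initial_state, config={"recursion_limit": 100})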
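
LangGraph enforces its own `recursion_limit` (25 super-steps by default) independently of `step_count`, so the two guards need to agree. The arithmetic below is a rough sketch that assumes one super-step per node execution in a sequential loop; `recursion_limit_for` is a hypothetical helper.

```python
def recursion_limit_for(max_supervisor_steps: int, longest_cycle: int, margin: int = 5) -> int:
    """Estimate a recursion_limit that lets the step_count guard fire first.

    max_supervisor_steps: the N in `step_count > N`.
    longest_cycle: nodes run per loop, counting the supervisor itself
                   (supervisor -> coder -> validator is 3).
    """
    if max_supervisor_steps < 1 or longest_cycle < 1:
        raise ValueError("both arguments must be at least 1")
    # N full cycles, plus the final supervisor visit that trips the guard.
    return max_supervisor_steps * longest_cycle + 1 + margin
```

For the graph in Step 4 this gives `recursion_limit_for(20, 3)`, and the result can be passed as `config={"recursion_limit": limit}` to `graph.ainvoke`.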

Step 7: Expose via FastAPI (optional)

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RunRequest(BaseModel):
    goal: str
    session_id: str

@app.post("/run")
async def run_agent(req: RunRequest):
    result = await run(req.goal, req.session_id)
    return {"result": result}
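
Before a request reaches `run()`, the goal and session ID can be checked so that malformed input never enters a prompt or a Redis key. The sketch below uses only the standard library. `validate_run_request`, the 4000-character cap, and the session ID pattern are assumptions chosen for illustration.

```python
import re
from typing import Tuple

MAX_GOAL_CHARS = 4000  # assumed cap; tune to the model's context budget
_SESSION_ID = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def validate_run_request(goal: str, session_id: str) -> Tuple[str, str]:
    """Return a cleaned (goal, session_id) pair or raise ValueError."""
    # Drop control characters but keep newlines and tabs.
    cleaned = "".join(ch for ch in goal if ch.isprintable() or ch in "\n\t").strip()
    if not cleaned:
        raise ValueError("goal must not be empty")
    if len(cleaned) > MAX_GOAL_CHARS:
        raise ValueError(f"goal exceeds {MAX_GOAL_CHARS} characters")
    if not _SESSION_ID.fullmatch(session_id):
        raise ValueError("session_id must be 1-64 letters, digits, '_' or '-'")
    return cleaned, session_id
```

In a FastAPI route, a `ValueError` raised here would typically be translated into a 4xx response rather than reaching the graph.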

Updating an Existing Agent

When the user wants to update or debug an existing agent, structure the response as:

## Existing Issue
[Describe the current problem]

## Root Cause
[Identify why it's happening in the architecture]

## Proposed Update
[Outline the changes at architecture level]

## Updated Code
[Generate only the changed modules]

## Migration Notes
[What breaks, what's backward-compatible]

## Performance Impact
[Latency / token / memory delta]

Standard Folder Structure

Always generate code in this layout:

multi_agent_system/
├── agents/          # One file per agent role
├── tools/           # Tool definitions and wrappers
├── memory/          # Redis, VectorDB, LangChain memory helpers
├── prompts/         # Prompt templates (one per agent)
├── workflows/       # High-level orchestration logic
├── graphs/          # LangGraph state + compiled graph definitions
├── api/             # FastAPI routes (optional)
├── configs/         # Config loader — no secrets in code
├── tests/           # Unit + integration tests per agent
└── main.py
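
The layout above can be created with a short script. This is a minimal sketch: `scaffold` is a hypothetical helper, and adding an `__init__.py` to every folder is an assumption (it makes each folder an importable package) rather than something the layout requires.

```python
import tempfile
from pathlib import Path

LAYOUT = ["agents", "tools", "memory", "prompts", "workflows",
          "graphs", "api", "configs", "tests"]


def scaffold(root: Path, package: str = "multi_agent_system") -> Path:
    """Create the standard folders plus main.py under root; safe to re-run."""
    base = Path(root) / package
    for name in LAYOUT:
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "__init__.py").touch()  # assumption: each folder is a package
    (base / "main.py").touch()
    return base


if __name__ == "__main__":
    # Demonstrate in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(sorted(p.name for p in scaffold(Path(tmp)).iterdir()))
```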

Examples

Example 1: Research + Coding Multi-Agent Workflow

# agents/research_agent.py
async def research_node(state: AgentState) -> AgentState:
    llm = ChatOpenAI(model="gpt-4o").bind_tools([web_search, rag_search])
    response = await llm.ainvoke(
        f"Research the following and return structured findings:\n{state['user_goal']}"
    )
    state["context"]["research"] = response.content
    state["next_agent"] = "coder"
    return state

# agents/coding_agent.py
async def coding_node(state: AgentState) -> AgentState:
    llm = ChatOpenAI(model="gpt-4o").bind_tools([python_repl, github_tool])
    response = await llm.ainvoke(
        f"Given this research:\n{state['context']['research']}\n\nWrite production Python code."
    )
    state["context"]["code"] = response.content
    state["next_agent"] = "validator"
    return state

Example 2: Supervisor with Dynamic Delegation

# agents/supervisor_agent.py
DELEGATION_PROMPT = """
You are a supervisor. Given the current state, decide the next agent.
Available agents: research, coder, validator, end.
Respond with ONLY the agent name.

Goal: {goal}
Completed: {completed}
Context keys available: {context}
"""

async def supervisor_node(state: AgentState) -> AgentState:
    state["step_count"] += 1
    llm = ChatOpenAI(model="gpt-4o")
    decision = await llm.ainvoke(
        DELEGATION_PROMPT.format(
            goal=state["user_goal"],
            completed=state["completed_tasks"],
            context=list(state["context"].keys()),
        )
    )
    next_agent = decision.content.strip().lower()
    # Validate against allowlist before setting
    allowed = {"research", "coder", "validator", "end"}
    state["next_agent"] = next_agent if next_agent in allowed else "end"
    return state

Example 3: DeepAgents Reflection Loop

async def reflection_node(state: AgentState) -> AgentState:
    llm = ChatOpenAI(model="gpt-4o")
    critique = await llm.ainvoke(
        f"Evaluate this output critically:\n{state['context'].get('code', '')}\n"
        "List any bugs, gaps, or improvements. Be concise."
    )
    state["context"]["critique"] = critique.content
    state["next_agent"] = "coder" if "bug" in critique.content.lower() else "end"
    return state
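
The substring test in `reflection_node` also matches a critique such as "No bugs found", which would send the code back for another pass. One hedged alternative is to ask the model to finish with an explicit verdict line and to parse only that line. The `VERDICT: PASS` / `VERDICT: REVISE` convention and the `next_agent_from_critique` helper below are hypothetical, as is the cap of three revisions.

```python
import re

_VERDICT = re.compile(r"^\s*VERDICT:\s*(PASS|REVISE)\s*$", re.IGNORECASE | re.MULTILINE)


def next_agent_from_critique(critique: str, revisions: int, max_revisions: int = 3) -> str:
    """Route on an explicit verdict line instead of a keyword match."""
    matches = _VERDICT.findall(critique)
    # No parseable verdict is treated as "not finished yet".
    verdict = matches[-1].upper() if matches else "REVISE"
    if verdict == "PASS" or revisions >= max_revisions:
        return "end"  # accepted, or the revision budget is spent
    return "coder"
```

The revision counter would need its own key in the state; it bounds the reflection loop independently of `step_count`.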

Best Practices

  • ✅ One agent = one responsibility — never combine planning + coding + testing in one node
  • ✅ Use TypedDict for all state schemas — enables type checking and graph validation
  • ✅ Bind only the tools each agent needs — reduces hallucinated tool calls
  • ✅ Always add a step_count guard to prevent infinite routing loops
  • ✅ Use async/await throughout — LangGraph supports async natively
  • ✅ Store all secrets in environment variables loaded via os.getenv()
  • ✅ Set TTLs on all Redis keys scoped to session_id
  • ✅ Log at every node entry and tool call for observability
  • ✅ Validate supervisor routing output against an allowlist of agent names
  • ❌ Don't hardcode API keys, model names, or Redis URLs
  • ❌ Don't share tool lists across agents that don't need them
  • ❌ Don't skip error handling — tool failures and empty LLM responses are common
  • ❌ Don't trust unvalidated LLM routing decisions — always check against an allowlist
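
To show what "no hardcoded keys, model names, or Redis URLs" can look like in practice, here is a minimal settings loader. `OPENAI_API_KEY` and `REDIS_URL` appear elsewhere in this document; `AGENT_MODEL`, `AGENT_MAX_STEPS`, and the `Settings` / `load_settings` names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model_name: str
    redis_url: str
    max_steps: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment and fail fast on missing secrets."""
    env = os.environ if env is None else env
    missing = [name for name in ("OPENAI_API_KEY", "REDIS_URL") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        model_name=env.get("AGENT_MODEL", "gpt-4o"),      # hypothetical variable name
        redis_url=env["REDIS_URL"],
        max_steps=int(env.get("AGENT_MAX_STEPS", "20")),  # hypothetical variable name
    )
```

Agent nodes can then take a `Settings` instance instead of calling `os.getenv()` in many places, which keeps configuration in the configs/ folder.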

Limitations

  • This skill does not replace environment-specific testing, load testing, or security review before production deployment.
  • Generated LangGraph code targets the current stable API — always verify method signatures against your installed version (pip show langgraph).
  • Stop and ask for clarification if the agent's goal, tool permissions, or routing logic is ambiguous before generating a full architecture.
  • DeepAgents integration patterns assume the library is installed and configured in the target environment.

Security & Safety Notes

  • Never expose API keys in generated code. All secrets must use environment variables:
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")   # ✅ correct
    OPENAI_API_KEY = "sk-..."                        # ❌ never do this
    
  • Always validate and sanitize user inputs before injecting them into agent prompts — treat all user input as untrusted.
  • Add a permission layer before allowing agents to execute shell commands or write to filesystems.
  • If generating a Python REPL tool node, document that it must only run in a sandboxed, isolated environment.
  • For production deployments, add rate-limit handling and exponential backoff on all LLM and external API calls.
  • Scope all Redis session keys to session_id and set a TTL to prevent memory leaks across sessions.
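
A permission layer can be as small as a wrapper that checks granted scopes before a tool function runs. The sketch below is framework-agnostic and illustrative: `guard_tool`, `ToolPermissionError`, and scope strings such as "fs:write" are hypothetical and would need adapting to whichever tool abstraction the system uses.

```python
import functools
from typing import Any, Callable, FrozenSet


class ToolPermissionError(PermissionError):
    """Raised when an agent calls a tool it has not been granted."""


def guard_tool(
    name: str,
    func: Callable[..., Any],
    granted: FrozenSet[str],
    required: FrozenSet[str],
) -> Callable[..., Any]:
    """Wrap func so it only runs when every required scope has been granted."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        missing = required - granted
        if missing:
            raise ToolPermissionError(f"tool {name!r} needs: {sorted(missing)}")
        return func(*args, **kwargs)

    return wrapper
```

Wrapping tools once, where they are registered, keeps the check out of the agent prompts and out of the model's control.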

Common Pitfalls

  • Problem: Agent loops indefinitely between supervisor and sub-agents
    Solution: Add step_count: int to state; return "end" in route_next() when step_count > N

  • Problem: Supervisor routes to a non-existent agent name
    Solution: Validate the LLM's routing output against a hardcoded allowlist before setting next_agent

  • Problem: Memory leaks across user sessions
    Solution: Scope Redis keys to session_id and always set a TTL (ttl=3600)

  • Problem: Tool results are ignored by the next agent
    Solution: Always write tool output into state["context"] and confirm the next node reads it

  • Problem: Agents share too many tools and hallucinate wrong tool calls
    Solution: Use .bind_tools([only_relevant_tools]) per agent instead of a global tool list

  • Problem: Graph fails silently on API rate limits
    Solution: Wrap LLM calls in retry logic with exponential backoff using tenacity
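
For readers who want to see the retry logic spelled out, the sketch below implements exponential backoff with only the standard library. It is a stand-in for what tenacity provides, not tenacity's API; `retry_with_backoff` and its parameters are hypothetical.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    jitter: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(), retrying on failure with delays of base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent sessions.
            await sleep(delay + random.uniform(0, delay * jitter))
    raise AssertionError("unreachable")
```

A node would wrap its model call, for example `await retry_with_backoff(lambda: llm.ainvoke(prompt))`, ideally with `retry_on` narrowed to the rate-limit and timeout exceptions of the client in use.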


Related Skills

  • @langchain-rag - When you need retrieval-augmented generation pipelines specifically
  • @fastapi-backend - When deploying agent systems as production REST APIs
  • @python-async - When deepening async/await patterns used throughout agent nodes
Info
Category: Programming & Development
Name: multi-agent-architect
Version: v20260509
Size: 12.01 KB
Updated: 2026-05-10