
AI API Reliability and Fault-Tolerance Design

v20260423
anth-reliability-patterns
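This module provides implementations of key reliability design patterns (circuit breaker, graceful degradation, idempotency) for integrating AI services such as the Claude API. It helps developers build production-grade, resilient applications that keep running when the API fails, is rate limited, or is temporarily unavailable, improving robustness and availability.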

Anthropic Reliability Patterns

Overview

Production reliability patterns for the Claude API: circuit breaker (prevents cascading failures), graceful degradation (serves fallbacks), idempotency (makes retries safe), and timeout management.

Circuit Breaker

import time
from enum import Enum

import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

class CircuitState(Enum):
    CLOSED = "closed"       # Normal operation
    OPEN = "open"           # Failing, reject requests
    HALF_OPEN = "half_open" # Testing recovery

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class ClaudeCircuitBreaker:
    # Not thread-safe; guard call() with a lock if shared across threads.
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0):
        self.state = CircuitState.CLOSED
        self.failures = 0  # Consecutive failures since the last success
        self.threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            # time.monotonic() is unaffected by wall-clock adjustments
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN  # Let a probe request through
            else:
                raise CircuitOpenError("Circuit breaker OPEN: Claude API unavailable")

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_failure_time = time.monotonic()
            # A failed probe re-opens immediately; otherwise open at the threshold
            if self.state == CircuitState.HALF_OPEN or self.failures >= self.threshold:
                self.state = CircuitState.OPEN
            raise

        # Any success closes the circuit and resets the consecutive-failure count,
        # so isolated failures spread over time never add up to an open circuit.
        self.state = CircuitState.CLOSED
        self.failures = 0
        return result

# Usage
breaker = ClaudeCircuitBreaker(failure_threshold=5, recovery_timeout=60)

def safe_claude_call(prompt: str) -> str:
    try:
        return breaker.call(
            client.messages.create,
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        ).content[0].text
    except (CircuitOpenError, anthropic.APIError):
        return "AI assistant is temporarily unavailable."

Graceful Degradation

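import hashlib

import anthropic

client = anthropic.Anthropic()
cache: dict[str, str] = {}  # Stand-in for a shared cache such as Redis

def _cache_key(prompt: str) -> str:
    # Built-in hash() is salted per process for strings, so use a stable digest
    return "claude:" + hashlib.sha256(prompt.encode()).hexdigest()

def complete_with_fallback(prompt: str) -> str:
    """Try Sonnet → Haiku → cached response → static fallback."""
    # Verify these IDs against the current model list before deploying
    models = ["claude-sonnet-4-20250514", "claude-haiku-4-5-20251001"]
    key = _cache_key(prompt)

    for model in models:
        try:
            msg = client.messages.create(
                model=model,
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
        except anthropic.APIError:
            # Base class: covers rate limits (429), other HTTP errors,
            # timeouts, and connection failures. Try the next model.
            continue
        text = msg.content[0].text
        cache[key] = text  # Remember the last good answer for later outages
        return text

    # All models failed: return cached or static response
    cached = cache.get(key)
    if cached:
        return f"[Cached response] {cached}"

    return "Our AI assistant is temporarily unavailable. Please try again in a few minutes."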
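Falling through to the next model only helps when the failure is transient. A bad API key or a malformed request will fail the same way on every model. The sketch below shows one way to make that decision from the HTTP status code. The helper name `should_try_next_model` and the status set are illustrative assumptions, not part of the SDK; with the SDK, the code is available as `status_code` on `anthropic.APIStatusError`.

```python
from typing import Optional

# Assumed set of 4xx codes that are usually transient (timeout, rate limit).
TRANSIENT_4XX = {408, 429}

def should_try_next_model(status_code: Optional[int]) -> bool:
    """Hypothetical helper: is this failure worth a fallback attempt?

    status_code is None when no HTTP response arrived at all
    (connection failure or client-side timeout).
    """
    if status_code is None:
        return True
    if status_code in TRANSIENT_4XX:
        return True
    # 5xx covers server errors, including 529 (overloaded)
    return status_code >= 500
```

With a check like this, a 401 or 400 can be surfaced immediately instead of being hidden behind a cached or static reply.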

Idempotent Requests

import hashlib
import json

import anthropic

class IdempotentClaude:
    """Client-side deduplication: identical requests reuse the first response."""

    def __init__(self):
        self.client = anthropic.Anthropic()
        self.cache: dict[str, str] = {}  # Use Redis in production

    def create_message(self, idempotency_key: str | None = None, **kwargs) -> str:
        # Generate a deterministic key from the request params if not provided.
        # Pass an explicit key when two identical prompts must count as distinct
        # requests (for example, when sampling several completions).
        if not idempotency_key:
            idempotency_key = hashlib.sha256(
                json.dumps(kwargs, sort_keys=True, default=str).encode()
            ).hexdigest()

        # Return cached result for duplicate requests
        if idempotency_key in self.cache:
            return self.cache[idempotency_key]

        msg = self.client.messages.create(**kwargs)
        result = msg.content[0].text
        self.cache[idempotency_key] = result
        return result
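The in-memory dict above never expires entries, so a long-running process would grow without bound and keep serving stale answers. The following is a minimal sketch of a time-limited cache that could stand in for it during development. `TTLCache`, its `get`/`set` methods, and the injectable `clock` parameter are all hypothetical names introduced here, and the class above would need a small adaptation to use them.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # Injectable so tests can control time
        self._store: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        # Store the value together with its expiry timestamp
        self._store[key] = (self.clock() + self.ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # Evict lazily on read
            return None
        return value
```

Redis offers the same behavior through per-key expiry, which is one reason the original comment recommends it for production.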

Timeout Configuration

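import anthropic

# Layer timeouts for defense-in-depth
client = anthropic.Anthropic(
    timeout=60.0,      # Timeout in seconds for each attempt (SDK default is 10 minutes)
    max_retries=3,     # Auto-retry connection errors, 408, 409, 429, and 5xx (SDK default is 2)
)
# Worst case, one call can take about timeout * (max_retries + 1) plus backoff delays.

# Per-request timeout override
msg = client.messages.create(
    model="claude-haiku-4-5-20251001",  # Verify against the current model list
    max_tokens=64,
    messages=[{"role": "user", "content": "Quick question"}],
    timeout=10.0  # Override for fast operations
)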
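SDK timeouts bound each attempt, not the whole operation. An application-level deadline can cap the total time a user waits across retries and fallbacks. The sketch below is one possible shape for that. `Deadline` and `timeout_for_attempt` are hypothetical names, and the clock is injectable only to make the behavior easy to test.

```python
import time
from typing import Callable

class Deadline:
    """Hypothetical helper: one overall time budget shared by several attempts."""

    def __init__(self, budget_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self) -> float:
        # Seconds left in the budget, never negative
        return max(0.0, self._expires_at - self._clock())

    def timeout_for_attempt(self, per_attempt_cap: float) -> float:
        # Each attempt gets the smaller of its own cap and what is left overall
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("overall deadline exceeded")
        return min(per_attempt_cap, left)
```

A caller would create one `Deadline` per user request and pass `deadline.timeout_for_attempt(10.0)` as the `timeout=` argument of each `client.messages.create` call.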

Reliability Checklist

  • Circuit breaker prevents cascading failures
  • Graceful degradation serves fallback responses
  • Idempotency keys prevent duplicate processing
  • Timeouts configured at SDK and application level
  • Health check probes API connectivity
  • Retry logic uses exponential backoff (SDK default)
  • Rate limit headers monitored for pre-emptive throttling
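
The last checklist item has no code above, so here is a minimal sketch of pre-emptive throttling. It assumes the `anthropic-ratelimit-*-limit` and `anthropic-ratelimit-*-remaining` response headers described in Anthropic's rate limit documentation. The function names and the 10% threshold are illustrative choices. In the SDK, headers can be read from `client.messages.with_raw_response.create(...)`.

```python
from typing import Mapping, Optional

def remaining_fraction(headers: Mapping[str, str], resource: str = "requests") -> Optional[float]:
    """Fraction of the rate limit still available, or None if unknown."""
    limit = headers.get(f"anthropic-ratelimit-{resource}-limit")
    remaining = headers.get(f"anthropic-ratelimit-{resource}-remaining")
    if limit is None or remaining is None:
        return None
    try:
        limit_n, remaining_n = int(limit), int(remaining)
    except ValueError:
        return None  # Unparseable header: treat as unknown
    if limit_n <= 0:
        return None
    return remaining_n / limit_n

def should_throttle(headers: Mapping[str, str], threshold: float = 0.1) -> bool:
    """True when requests or tokens have dropped below the threshold fraction."""
    fractions = (remaining_fraction(headers, r) for r in ("requests", "tokens"))
    return any(f is not None and f < threshold for f in fractions)
```

Plain dicts are case-sensitive, so this sketch expects lowercase header names. The headers object returned by the SDK's HTTP layer is case-insensitive.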

Next Steps

For policy guardrails, see anth-policy-guardrails.

Info
Category Programming & Development
Name anth-reliability-patterns
Version v20260423
Size 5.19KB
Updated 2026-04-28