技能 人工智能 Anthropic Claude速率限制管理

Anthropic Claude速率限制管理

v20260423
anth-rate-limits
用于管理Anthropic Claude API的速率限制和配额。它实现了自动重试、指数退避(Backoff)机制,并支持根据请求量、输入和输出令牌数(RPM/ITPM/OTPM)进行精细化的流量控制。适用于高并发、需要稳定计费和调用的生产级API集成场景。
获取技能
360 次下载
概览

Anthropic Rate Limits

Overview

The Claude API uses token-bucket rate limiting measured in three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). Limits increase automatically as you move through usage tiers.

Rate Limit Dimensions

Dimension Header Description
RPM anthropic-ratelimit-requests-limit Requests per minute
ITPM anthropic-ratelimit-tokens-limit Input tokens per minute
OTPM anthropic-ratelimit-tokens-limit Output tokens per minute

Limits are per-organization and per-model-class. Cached input tokens do NOT count toward ITPM limits.

Usage Tiers (Auto-Upgrade)

Tier Monthly Spend Key Benefit
Tier 1 (Free) $0 Evaluation access
Tier 2 $40+ Higher RPM
Tier 3 $200+ Production-grade limits
Tier 4 $2,000+ High-throughput access
Scale Custom Custom limits via sales

Check your current tier and limits at console.anthropic.com.

SDK Built-In Retry

import anthropic

# The SDK retries 429 and 5xx errors automatically (2 retries by default)
client = anthropic.Anthropic(max_retries=5)  # Increase for high-traffic apps

# Disable auto-retry for manual control
client = anthropic.Anthropic(max_retries=0)
const client = new Anthropic({ maxRetries: 5 });

Custom Rate Limiter with Header Awareness

import time
import anthropic

class RateLimitedClient:
    def __init__(self):
        self.client = anthropic.Anthropic(max_retries=0)  # We handle retries
        self.remaining_requests = 100
        self.remaining_tokens = 100000
        self.reset_at = 0.0

    def create_message(self, **kwargs):
        # Pre-check: wait if near limit
        if self.remaining_requests < 3 and time.time() < self.reset_at:
            wait = self.reset_at - time.time()
            print(f"Pre-throttle: waiting {wait:.1f}s")
            time.sleep(wait)

        for attempt in range(5):
            try:
                response = self.client.messages.create(**kwargs)

                # Update from response headers (via _response)
                headers = response._response.headers
                self.remaining_requests = int(headers.get("anthropic-ratelimit-requests-remaining", 100))
                self.remaining_tokens = int(headers.get("anthropic-ratelimit-tokens-remaining", 100000))
                reset = headers.get("anthropic-ratelimit-requests-reset")
                if reset:
                    from datetime import datetime
                    self.reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00")).timestamp()

                return response
            except anthropic.RateLimitError as e:
                retry_after = float(e.response.headers.get("retry-after", 2 ** attempt))
                print(f"429 — retry in {retry_after}s (attempt {attempt + 1})")
                time.sleep(retry_after)

        raise Exception("Exhausted rate limit retries")

Queue-Based Throughput Control

import PQueue from 'p-queue';
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Enforce 50 RPM with concurrency limit
const queue = new PQueue({
  concurrency: 10,
  interval: 60_000,
  intervalCap: 50,
});

async function rateLimitedCall(prompt: string) {
  return queue.add(() =>
    client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    })
  );
}

// Process 200 prompts without hitting limits
const results = await Promise.all(
  prompts.map(p => rateLimitedCall(p))
);

Cost-Saving: Use Batches for Bulk Work

# Message Batches API: 50% cheaper, no rate limit pressure on real-time quota
batch = client.messages.batches.create(
    requests=[
        {"custom_id": f"req-{i}", "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}]
        }}
        for i, prompt in enumerate(prompts)
    ]
)

Error Handling

Header Description Action
retry-after Seconds until next request allowed Sleep this duration exactly
anthropic-ratelimit-requests-remaining Requests left in window Throttle if < 5
anthropic-ratelimit-tokens-remaining Tokens left in window Reduce max_tokens if low
anthropic-ratelimit-requests-reset ISO timestamp of window reset Schedule retry after this time

Resources

Next Steps

For security configuration, see anth-security-basics.

信息
Category 人工智能
Name anth-rate-limits
版本 v20260423
大小 5.42KB
更新时间 2026-04-28
语言