
Bright Data Rate Limit Handling and Retry Mechanisms

v20260423
brightdata-rate-limits
This skill set addresses the rate-limit and throttling problems encountered when using Bright Data for large-scale web scraping and API data collection. It shows how to cap request concurrency with a queue, implement exponential backoff retries, and apply precise rate limiting to Bright Data's specific API trigger endpoints, keeping scraping jobs stable and reliable.

Overview

Handle Bright Data rate limits and concurrent request limits. Unlike traditional API rate limits, Bright Data limits are per-zone and based on concurrent connections and requests per second. The Web Scraper API trigger endpoint is limited to 20 requests/min and 60 requests/hour.

Prerequisites

  • Bright Data zone configured
  • Understanding of async/await patterns
  • p-queue or similar concurrency library

Instructions

Step 1: Understand Bright Data Rate Limits

Product                   | Concurrent Limit         | Per-Minute    | Notes
Residential Proxy         | Based on plan            | No hard cap   | Charged per GB
Web Unlocker              | Based on plan            | No hard cap   | Charged per request
Scraping Browser          | Based on plan (sessions) | No hard cap   | Charged per session
SERP API                  | Based on plan            | No hard cap   | Charged per search
Web Scraper API (trigger) | N/A                      | 20/min, 60/hr | Async collections
Datasets API              | N/A                      | 20/min        | Snapshot requests
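
The two trigger-style limits in the last rows of the table can be enforced client-side with a sliding-window check. A minimal sketch (the 20/min and 60/hr windows come from the table; the function itself is illustrative, not part of any Bright Data SDK):

```typescript
// Sliding-window budget check for the trigger endpoint (20/min, 60/hr).
// `history` holds epoch-ms timestamps of previous trigger calls; push
// `now` onto it after each successful trigger.
export function canTrigger(history: number[], now: number = Date.now()): boolean {
  const lastMinute = history.filter((t) => now - t < 60_000).length;
  const lastHour = history.filter((t) => now - t < 3_600_000).length;
  return lastMinute < 20 && lastHour < 60;
}
```

Checking both windows matters: 20 triggers in each of three consecutive minutes would pass the per-minute cap but exhaust the hourly budget in three minutes.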

Step 2: Implement Concurrent Request Limiter

// src/brightdata/limiter.ts
import PQueue from 'p-queue';

// Match concurrency to your Bright Data plan limits
const scrapeQueue = new PQueue({
  concurrency: 10,       // Max concurrent proxy requests
  interval: 1000,        // Per second
  intervalCap: 20,       // Max 20 requests per second
  timeout: 120000,       // Kill after 2 min
  throwOnTimeout: true,
});

export async function queuedScrape(url: string): Promise<string> {
  return scrapeQueue.add(async () => {
    // getBrightDataClient() is assumed to be defined elsewhere, e.g. an
    // axios instance preconfigured for your Bright Data zone
    const client = getBrightDataClient();
    const response = await client.get(url);
    return response.data;
  });
}

// Monitor queue health
scrapeQueue.on('active', () => {
  console.log(`Queue: ${scrapeQueue.size} waiting, ${scrapeQueue.pending} active`);
});
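
For context, the core of what p-queue contributes here — a cap on in-flight promises — can be approximated in a few lines. This is a sketch only (no interval cap, timeout, or events), not a substitute for the p-queue configuration above:

```typescript
// Run tasks with at most `limit` promises in flight at once,
// preserving result order.
export async function runLimited<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    // Each worker pulls the next unclaimed task until none remain.
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, () => worker())
  );
  return results;
}
```

p-queue adds the interval/intervalCap throttling and timeout handling on top of this pattern, which is why it is worth the dependency for production scraping.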

Step 3: Exponential Backoff for Proxy Errors

// src/brightdata/backoff.ts
export async function scrapeWithBackoff(
  url: string,
  config = { maxRetries: 5, baseDelay: 2000, maxDelay: 60000 }
): Promise<string> {
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      const client = getBrightDataClient();
      const response = await client.get(url);
      return response.data;
    } catch (error: any) {
      const status = error.response?.status;
      const luminatiError = error.response?.headers?.['x-luminati-error'];

      // Only retry on transient errors
      const retryable = [502, 503, 429].includes(status)
        || error.code === 'ETIMEDOUT'
        || luminatiError === 'ip_banned';

      if (attempt === config.maxRetries || !retryable) throw error;

      const delay = Math.min(
        config.baseDelay * Math.pow(2, attempt) + Math.random() * 1000,
        config.maxDelay
      );
      console.log(`[${luminatiError || status}] Retry ${attempt + 1} in ${delay.toFixed(0)}ms`);
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error('Unreachable');
}
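
To sanity-check the backoff progression, the delay formula can be extracted into a pure helper with the random jitter stripped out (deterministic, so each step is easy to verify):

```typescript
// Deterministic view of the backoff schedule: baseDelay * 2^attempt,
// capped at maxDelay. The jitter term in scrapeWithBackoff is omitted.
export function backoffSchedule(
  maxRetries: number,
  baseDelay: number,
  maxDelay: number
): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) =>
    Math.min(baseDelay * Math.pow(2, attempt), maxDelay)
  );
}
```

With the defaults above (5 retries, 2s base, 60s cap) the schedule is 2s, 4s, 8s, 16s, 32s; only from the sixth retry onward would the 60s cap kick in (2000 * 2^5 = 64000ms).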

Step 4: Web Scraper API Rate Limiter

// src/brightdata/trigger-limiter.ts — 20/min, 60/hr for trigger endpoint
const triggerQueue = new PQueue({
  concurrency: 1,
  interval: 60000,       // Per minute
  intervalCap: 20,       // 20 triggers per minute
});

let hourlyCount = 0;
setInterval(() => { hourlyCount = 0; }, 3600000); // Reset hourly

export async function rateLimitedTrigger(
  datasetId: string,
  urls: string[]
): Promise<any> {
  if (hourlyCount >= 55) { // Leave buffer
    throw new Error('Approaching hourly trigger limit (60/hr). Wait before triggering.');
  }

  return triggerQueue.add(async () => {
    hourlyCount++;
    const response = await fetch(
      `https://api.brightdata.com/datasets/v3/trigger?dataset_id=${datasetId}&format=json`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.BRIGHTDATA_API_TOKEN}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(urls.map(url => ({ url }))),
      }
    );
    if (!response.ok) {
      throw new Error(`Trigger failed: HTTP ${response.status}`);
    }
    return response.json();
  });
}

Step 5: Batch URLs to Minimize Triggers

// Instead of triggering per-URL, batch into single triggers
async function batchTrigger(urls: string[], batchSize = 100) {
  const batches = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    batches.push(urls.slice(i, i + batchSize));
  }

  console.log(`Triggering ${urls.length} URLs in ${batches.length} batches`);

  for (const batch of batches) {
    await rateLimitedTrigger('gd_dataset_id', batch);
  }
}
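
As a rough planning aid: sustained trigger throughput is capped at 60/hr, so total submission time is bounded below by the trigger count divided by 60. A sketch (illustrative only; it assumes triggering, not collection, is the bottleneck):

```typescript
// Lower bound on hours needed to submit all triggers, given the
// sustained cap of 60 triggers/hour on the Web Scraper API.
export function estimateTriggerHours(urlCount: number, batchSize = 100): number {
  const triggers = Math.ceil(urlCount / batchSize);
  return triggers / 60;
}
```

For example, 12,000 URLs at a batch size of 100 means 120 triggers, or at least 2 hours of submission time; doubling the batch size halves it, which is why batching aggressively pays off.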

Output

  • Concurrent request limiter matching plan limits
  • Exponential backoff handling X-Luminati error headers
  • Web Scraper API trigger rate limiter (20/min, 60/hr)
  • URL batching to minimize trigger count

Error Handling

Signal                        | Meaning                   | Action
HTTP 429                      | Concurrent limit exceeded | Queue requests with p-queue
HTTP 502 + ip_banned          | IP blocked by target      | Retry (auto-rotates IP)
HTTP 502 + target_site_blocked | Anti-bot blocked         | Switch to Scraping Browser
ETIMEDOUT                     | Connection timeout        | Retry with longer timeout
Hourly trigger limit          | 60 triggers/hr exceeded   | Batch URLs into fewer triggers
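
The table above maps directly onto code; a sketch of a classifier that returns the recommended action for each signal (the action strings are illustrative labels, not Bright Data API values):

```typescript
type Action =
  | 'queue'
  | 'retry'
  | 'switch-scraping-browser'
  | 'retry-longer-timeout'
  | 'fail';

// Map an error's HTTP status, x-luminati-error header, and Node error
// code onto the recommended action from the table.
export function classifyError(
  status: number | undefined,
  luminatiError: string | undefined,
  code?: string
): Action {
  if (status === 429) return 'queue';
  if (status === 502 && luminatiError === 'ip_banned') return 'retry';
  if (status === 502 && luminatiError === 'target_site_blocked') {
    return 'switch-scraping-browser';
  }
  if (code === 'ETIMEDOUT') return 'retry-longer-timeout';
  return 'fail'; // non-transient: surface to the caller
}
```

This pairs naturally with scrapeWithBackoff from Step 3: anything classified as 'fail' should bypass the retry loop entirely.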

Next Steps

For security configuration, see brightdata-security-basics.

Info

Category: Programming & Development
Name: brightdata-rate-limits
Version: v20260423
Size: 6.05KB
Updated: 2026-04-28