Firecrawl Incident Response Runbook

v20260423
firecrawl-incident-runbook
A comprehensive runbook providing systematic procedures for responding to Firecrawl API outages and integration failures. It guides technical teams through triage, diagnosis, and mitigation for common issues (e.g., 401, 429, 5xx errors). Includes steps for fallback implementation, evidence collection, and postmortem documentation to ensure rapid service recovery.

Overview

Rapid incident response procedures for Firecrawl integration failures. Covers API outage triage, credential issues, credit exhaustion, crawl job failures, and webhook delivery problems.

Severity Levels

Level  Definition        Response Time      Examples
P1     Complete failure  < 15 min           API returns 401/500 on all requests
P2     Degraded service  < 1 hour           High latency, partial failures, 429s
P3     Minor impact      < 4 hours          Webhook delays, some empty scrapes
P4     No user impact    Next business day  Monitoring gaps, credit warnings

Quick Triage (Run First)

set -euo pipefail
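# 1. Test Firecrawl API directly
echo "=== API Health ==="
# Keep the status code on its own line so jq only sees the JSON body
resp=$(curl -s -w '\n%{http_code}' https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","formats":["markdown"]}') || echo "Request failed (network/DNS?)"
echo "HTTP ${resp##*$'\n'}"
echo "${resp%$'\n'*}" | jq '{success, error}' || echo "Non-JSON response"

# 2. Check credit balance (failure here should not stop the remaining checks)
echo "=== Credits ==="
curl -s https://api.firecrawl.dev/v1/team/credits \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" | jq . || echo "Credit check failed"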

# 3. Check our app health
echo "=== App Health ==="
curl -sf https://api.yourapp.com/health | jq '.services.firecrawl' || echo "App unhealthy"

Decision Tree

Firecrawl API returning errors?
├─ 401: API key invalid
│   → Verify key at firecrawl.dev/app, rotate if needed
├─ 402: Credits exhausted
│   → Upgrade plan or wait for monthly reset
├─ 429: Rate limited
│   → Reduce concurrency, enable backoff, check Retry-After
├─ 500/503: Firecrawl outage
│   → Enable fallback mode, monitor firecrawl.dev status
└─ API working fine
    └─ Our integration issue
        ├─ Empty markdown → Increase waitFor, check target site
        ├─ Crawl stuck → Check job status, enforce timeout
        └─ Webhook not firing → Verify endpoint, check signature
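
The status-code branches of the tree above could be encoded as a lookup so that alerts carry the suggested first action. This is a minimal sketch: `triageAction` and the `Triage` type are illustrative names, not part of any Firecrawl SDK, and treating every 5xx like the 500/503 branch is an assumption.

```typescript
// Hypothetical helper: maps an HTTP status returned by Firecrawl
// to the cause and first action listed in the decision tree.
type Triage = { cause: string; action: string };

function triageAction(status: number): Triage {
  if (status === 401) {
    return { cause: "API key invalid", action: "Verify key at firecrawl.dev/app, rotate if needed" };
  }
  if (status === 402) {
    return { cause: "Credits exhausted", action: "Upgrade plan or wait for monthly reset" };
  }
  if (status === 429) {
    return { cause: "Rate limited", action: "Reduce concurrency, enable backoff, check Retry-After" };
  }
  // Assumption: any 5xx is handled like the 500/503 branch.
  if (status >= 500 && status <= 599) {
    return { cause: "Firecrawl outage", action: "Enable fallback mode, monitor firecrawl.dev status" };
  }
  // Anything else: the API is probably fine, so look at our own integration.
  return { cause: "API working or unclassified", action: "Investigate our integration" };
}

console.log(triageAction(429).action);
```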

Immediate Actions by Error Type

401 — Authentication Failure

set -euo pipefail
# Verify the key is set and looks right (Firecrawl keys normally start with "fc-")
echo "Key prefix: ${FIRECRAWL_API_KEY:0:5}"
echo "Key length: ${#FIRECRAWL_API_KEY}"

# Test the key with a minimal scrape; expect "true"
curl -s https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","formats":["markdown"]}' | jq .success

# If this fails: regenerate the key at firecrawl.dev/app and update all environments

402 — Credits Exhausted

set -euo pipefail
# Check balance
curl -s https://api.firecrawl.dev/v1/team/credits \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" | jq .

# Immediate: disable non-critical scraping
# Long-term: upgrade plan or implement credit budget
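
One way to approach the long-term credit budget is an in-process guard that refuses non-critical scrapes before the allowance runs out. The sketch below is only illustrative: the `CreditBudget` class, the daily limit, and the 80% cutoff for non-critical work are all assumptions, and a real deployment would likely persist the counter outside the process.

```typescript
// Hypothetical in-process credit budget; names and thresholds are assumptions.
class CreditBudget {
  private used = 0;
  private readonly dailyLimit: number;

  constructor(dailyLimit: number) {
    this.dailyLimit = dailyLimit;
  }

  // Records the spend and returns true if it fits the budget, otherwise returns false.
  trySpend(credits: number, critical = false): boolean {
    // Non-critical work stops at 80% of the limit so critical scrapes keep headroom.
    const ceiling = critical ? this.dailyLimit : this.dailyLimit * 0.8;
    if (this.used + credits > ceiling) return false;
    this.used += credits;
    return true;
  }

  remaining(): number {
    return this.dailyLimit - this.used;
  }
}

const budget = new CreditBudget(1000);
if (!budget.trySpend(1)) {
  console.warn("Credit budget reached, skipping non-critical scrape");
}
```

Call `trySpend` before each scrape and pass `critical = true` only for user-facing requests.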

429 — Rate Limited

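import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// Emergency rate limiting: run scrapes one at a time, 5s apart
const EMERGENCY_DELAY_MS = 5000;
let lastScrape: Promise<unknown> = Promise.resolve();

async function emergencyScrape(url: string) {
  const result = lastScrape.then(() => firecrawl.scrapeUrl(url, { formats: ["markdown"] }));
  // The next caller waits for this request to settle, then for the delay
  lastScrape = result.catch(() => {}).then(() => new Promise(r => setTimeout(r, EMERGENCY_DELAY_MS)));
  return result;
}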
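
The decision tree also calls for backoff and checking Retry-After. A possible shape for that is sketched below, assuming the Retry-After value arrives as a number of seconds (HTTP also allows a date form, which is not handled here) and that thrown errors expose a `statusCode` field. `retryDelayMs` and `withBackoff` are hypothetical helper names, not SDK functions.

```typescript
// Computes how long to wait before retrying a rate-limited request.
// Prefers the server's Retry-After value (seconds); otherwise uses exponential backoff.
function retryDelayMs(attempt: number, retryAfter?: string | null, baseMs = 1000, maxMs = 60_000): number {
  if (retryAfter != null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxMs);
    }
  }
  // base * 2^attempt, capped at maxMs
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retries `fn` on 429 only; any other error is rethrown immediately.
async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const lastTry = attempt + 1 >= maxAttempts;
      // Assumption: the error object carries the HTTP status as `statusCode`.
      if (error?.statusCode !== 429 || lastTry) throw error;
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs(attempt)));
    }
  }
}
```

If your HTTP client exposes the response headers, pass the Retry-After value as the second argument to `retryDelayMs`. Adding random jitter to the delay is a common refinement when many workers retry at once.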

500/503 — Firecrawl Outage

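// Enable graceful degradation
// Assumes `firecrawl` is an initialized FirecrawlApp client and
// getCachedContent() is your own cache lookup (not part of the SDK)
async function scrapeWithFallback(url: string) {
  try {
    return await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
  } catch (error: any) {
    // Only 5xx responses trigger the fallback; 4xx errors need a fix on our side
    if (error.statusCode >= 500) {
      console.error("Firecrawl unavailable, using cached content");
      return getCachedContent(url); // serve stale data
    }
    throw error;
  }
}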
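
The fallback above only helps if earlier successful scrapes were stored somewhere. As a rough illustration, `getCachedContent` could be backed by an in-memory map like the one below. The `rememberContent` helper, the `CachedPage` shape, and the 24-hour maximum age are assumptions; a production setup would more likely use Redis or disk.

```typescript
// Hypothetical in-memory cache behind getCachedContent().
type CachedPage = { markdown: string; fetchedAt: number };
type StaleHit = CachedPage & { stale: boolean };

const pageCache = new Map<string, CachedPage>();

// Call after every successful scrape so there is something to fall back to.
function rememberContent(url: string, markdown: string, now = Date.now()): void {
  pageCache.set(url, { markdown, fetchedAt: now });
}

// Returns the cached page flagged as stale, or null if nothing usable is stored.
function getCachedContent(url: string, maxAgeMs = 24 * 60 * 60 * 1000, now = Date.now()): StaleHit | null {
  const hit = pageCache.get(url);
  if (!hit || now - hit.fetchedAt > maxAgeMs) return null;
  return { ...hit, stale: true };
}
```

Flagging the result as stale lets callers show a freshness warning instead of silently serving old data.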

Communication Templates

Internal (Slack)

P[1-4] INCIDENT: Firecrawl Integration
Status: INVESTIGATING
Impact: [Describe user-facing impact]
Error: [401/402/429/500] — [brief description]
Action: [What you're doing right now]
Next update: [time]

Post-Incident

Evidence Collection

set -euo pipefail
# Collect debug bundle
dir="incident-$(date +%Y%m%d)"
mkdir -p "$dir"
curl -s https://api.firecrawl.dev/v1/team/credits \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" > "$dir/credits.json"

# Application logs (adjust the label selector to your deployment)
kubectl logs -l app=my-app --since=1h 2>/dev/null | grep -i firecrawl > "$dir/logs.txt" || true

Postmortem Template

## Incident: Firecrawl [Error Type]
Date: YYYY-MM-DD | Duration: X hours | Severity: P[1-4]

### Summary
[1-2 sentence description]

### Timeline
- HH:MM — [First alert]
- HH:MM — [Investigation started]
- HH:MM — [Root cause identified]
- HH:MM — [Resolved]

### Root Cause
[Technical explanation]

### Action Items
- [ ] [Preventive measure] — Owner — Due date

Error Handling

Issue                         Cause                Solution
Can't reach Firecrawl API     Network/DNS issue    Try from different network, check DNS
All scrapes return empty      Target site changed  Verify manually, adjust scrape options
Crawl jobs never complete     Queue backup         Cancel stuck jobs, reduce concurrency
Webhook endpoint unreachable  Deployment issue     Check HTTPS cert, DNS, firewall
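
For crawl jobs that never complete, a client-side deadline keeps callers from waiting indefinitely. The helper below is a generic sketch, not a Firecrawl SDK function, and the name `withTimeout` is an assumption. It only stops the local wait: the remote job keeps running until it is cancelled separately.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Example: a promise that never settles is cut off after 100 ms.
withTimeout(new Promise<string>(() => {}), 100, "crawl").catch((error: Error) => {
  console.error(error.message);
});
```

Wrap the crawl call in `withTimeout` with a limit that matches your largest expected crawl, and cancel the job through the API when the timeout fires.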

Next Steps

For data handling, see firecrawl-data-handling.

Info
Category Engineering
Name firecrawl-incident-runbook
Version v20260423
Size 5.82KB
Updated At 2026-04-28