firecrawl-webhooks-events · v20260311

Verify Firecrawl webhook signatures and handle crawl/scrape events, routing results to processors or queues so that async job state stays consistent.
Firecrawl Webhooks & Events

Overview

Handle Firecrawl webhooks for async crawl and scrape job notifications. Firecrawl supports native webhooks on crawl jobs via api.firecrawl.dev: when a crawl starts, scrapes a page, completes, or fails, Firecrawl POSTs an event to your configured webhook URL.

Prerequisites

  • Firecrawl API key stored in FIRECRAWL_API_KEY environment variable
  • HTTPS endpoint accessible from the internet for webhook delivery
  • Understanding of Firecrawl crawl vs scrape vs map endpoints
  • Queue system for processing large crawl results

Webhook Event Types

Event                   Trigger                  Payload
crawl.started           Crawl begins             Job ID, configuration
crawl.page              Individual page scraped  Single page markdown/HTML content
crawl.completed         Full crawl finishes      Array of page results, metadata
crawl.failed            Crawl job errors         Error message, partial results
batch_scrape.completed  Batch scrape finishes    Array of scraped pages
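The event names above can be captured in a discriminated union so each handler gets a type-checked payload. The field shapes below are assumptions derived from the table, not an official schema; verify them against Firecrawl's docs:

```typescript
// Hypothetical event typings derived from the table above; the exact
// payload shapes are assumptions, not Firecrawl's published schema.
interface PageData {
  markdown?: string;
  html?: string;
  metadata?: { sourceURL?: string; title?: string; statusCode?: number };
}

type FirecrawlEvent =
  | { type: "crawl.started"; id: string }
  | { type: "crawl.page"; id: string; data: PageData[] }
  | { type: "crawl.completed"; id: string; data: PageData[] }
  | { type: "crawl.failed"; id: string; error: string }
  | { type: "batch_scrape.completed"; id: string; data: PageData[] };

// Narrowing on `type` gives the correct payload shape in each branch.
function describeEvent(event: FirecrawlEvent): string {
  switch (event.type) {
    case "crawl.started":
      return `Crawl ${event.id} started`;
    case "crawl.failed":
      return `Crawl ${event.id} failed: ${event.error}`;
    default:
      return `${event.type} for job ${event.id}`;
  }
}
```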

Instructions

Step 1: Start a Crawl with Webhook

import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY!,
});

const crawlResult = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
  limit: 100,
  scrapeOptions: {
    formats: ["markdown", "html"],
  },
  webhook: "https://api.yourapp.com/webhooks/firecrawl",
});

console.log(`Crawl started: ${crawlResult.id}`);

Step 2: Handle Webhook Events

import express from "express";

const app = express();
app.use(express.json());

app.post("/webhooks/firecrawl", async (req, res) => {
  const { type, id, data } = req.body;

  // Acknowledge immediately so Firecrawl doesn't retry, then process async
  res.status(200).json({ received: true });

  switch (type) {
    case "crawl.completed":
      await handleCrawlComplete(id, data);
      break;
    case "crawl.page":
      await handlePageScraped(id, data);
      break;
    case "crawl.failed":
      await handleCrawlFailed(id, data);
      break;
  }
});
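The handler above accepts any POST; in production you should verify that requests actually come from Firecrawl before processing them. A minimal sketch using an HMAC-SHA256 digest over the raw body; the header name and signing scheme here are assumptions to confirm against Firecrawl's webhook documentation:

```typescript
import crypto from "node:crypto";

// Compute the expected signature for a raw request body.
// Assumes an HMAC-SHA256 hex digest keyed by a shared webhook secret.
function signPayload(rawBody: string, secret: string): string {
  return crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Constant-time comparison to avoid leaking signature bytes via timing.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = signPayload(rawBody, secret);
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(signature, "hex");
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

Wire this in before JSON parsing (for example with `express.raw()` or the `verify` callback of `express.json()`) so the exact raw bytes are available; signing a parsed-then-reserialized body will generally not match.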

async function handleCrawlComplete(jobId: string, data: any) {
  const { pages, totalPages } = data;
  console.log(`Crawl ${jobId} complete: ${totalPages} pages scraped`);

  for (const page of pages) {
    await indexPage({
      url: page.metadata?.sourceURL,
      title: page.metadata?.title,
      markdown: page.markdown,
      scrapedAt: new Date(),
    });
  }
}

Step 3: Process Individual Pages

async function handlePageScraped(jobId: string, data: any) {
  const { markdown, metadata } = data;

  // Extract structured data from scraped page
  await documentStore.upsert({
    url: metadata.sourceURL,
    title: metadata.title,
    content: markdown,
    links: metadata.links || [],
    statusCode: metadata.statusCode,
  });
}

async function handleCrawlFailed(jobId: string, data: any) {
  console.error(`Crawl ${jobId} failed: ${data.error}`);

  await alerting.send({
    severity: "high",
    message: `Firecrawl job ${jobId} failed`,
    error: data.error,
    partialResults: data.partialResults?.length || 0,
  });
}
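Webhook deliveries are typically at-least-once, so the handlers above may receive the same event twice. A minimal in-memory dedupe sketch; the key format is an assumption, and a real deployment would use Redis `SETNX` or a unique database constraint instead of process memory:

```typescript
// Track processed (jobId, eventType, pageURL) keys so retried webhook
// deliveries don't index the same page twice. In-memory only: state is
// lost on restart and not shared across processes.
const processed = new Set<string>();

function markProcessed(jobId: string, eventType: string, pageUrl = ""): boolean {
  const key = `${jobId}:${eventType}:${pageUrl}`;
  if (processed.has(key)) return false; // duplicate delivery, skip
  processed.add(key);
  return true; // first time seen, safe to process
}
```

Call it at the top of each handler, e.g. `if (!markProcessed(jobId, "crawl.page", metadata.sourceURL)) return;`.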

Step 4: Poll as Fallback

async function pollCrawlStatus(jobId: string) {
  const status = await firecrawl.checkCrawlStatus(jobId);

  if (status.status === "completed") {
    await handleCrawlComplete(jobId, {
      pages: status.data,
      totalPages: status.data.length,
    });
  } else if (status.status === "failed") {
    await handleCrawlFailed(jobId, { error: "Crawl failed (detected via polling)" });
  } else if (status.status === "scraping") {
    console.log(`Progress: ${status.completed}/${status.total} pages`);
    setTimeout(() => pollCrawlStatus(jobId), 5000); // poll again in 5 seconds
  }
}

Error Handling

Issue                   Cause                       Solution
Webhook not received    URL unreachable             Verify endpoint is publicly accessible
Partial results         Page limit reached          Increase limit or filter with includePaths
Rate limited            Too many concurrent crawls  Queue crawl jobs, respect rate limits
Timeout on large sites  Crawl takes too long        Use maxDepth and includePaths filters
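The "Rate limited" row above suggests queueing crawl jobs rather than launching them all at once. A minimal concurrency-limiter sketch; the limit value you pass should be tuned to your Firecrawl plan's actual rate limits:

```typescript
// Run async tasks with at most `limit` in flight at any time.
// Workers pull the next task index cooperatively; results keep input order.
async function runWithConcurrency<T>(
  tasks: (() => Promise<T>)[],
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // safe: index claim happens synchronously
      results[i] = await tasks[i]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

For example, to launch many crawls two at a time: `await runWithConcurrency(urls.map(u => () => firecrawl.asyncCrawlUrl(u, opts)), 2);` (where `opts` is your crawl configuration).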

Examples

Build Search Index from Crawl

async function buildSearchIndex(pages: any[]) {
  const documents = pages.map(page => ({
    id: page.metadata.sourceURL,
    content: page.markdown,
    title: page.metadata.title,
    url: page.metadata.sourceURL,
  }));

  await searchEngine.indexBatch(documents);
  console.log(`Indexed ${documents.length} pages`);
}

Resources

Next Steps

For deployment setup, see firecrawl-deploy-integration.

Output

  • Configuration files or code changes applied to the project
  • Validation report confirming correct implementation
  • Summary of changes made and their rationale