Apify SDK Upgrade and Migration Guide

v20260423
apify-upgrade-migration
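This guide walks developers through safely upgrading and migrating Apify's core packages: apify-client, apify, and crawlee. It focuses on major API changes, such as the migration from Apify SDK v2 to v3 and the key steps for handling the move of crawler functionality into crawlee. It covers best practices for version checks, dependency management, and code refactoring.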

Apify Upgrade & Migration

Overview

Guide for upgrading the apify, apify-client, and crawlee packages. The most significant breaking change covered here is the move from SDK v2 to v3, which split crawling functionality out into the separate crawlee package. This skill covers that migration plus general upgrade procedures.

Prerequisites

  • Git branch for the upgrade
  • Test suite available
  • Current versions documented

Instructions

Step 1: Check Current Versions

# Check installed versions
npm list apify apify-client crawlee 2>/dev/null

# Check latest available versions
npm view apify version
npm view apify-client version
npm view crawlee version

# Check for outdated packages
npm outdated apify apify-client crawlee
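
To satisfy the "current versions documented" prerequisite, the declared ranges can also be captured programmatically. The following is a minimal sketch, assuming a standard package.json layout; collectDeclaredVersions and TRACKED_PACKAGES are hypothetical names, not part of any Apify tooling.

```typescript
// Hypothetical helper: summarize the version ranges a manifest declares
// for the three packages this guide covers.
type PackageJson = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

const TRACKED_PACKAGES = ['apify', 'apify-client', 'crawlee'] as const;

function collectDeclaredVersions(pkg: PackageJson): Record<string, string> {
  const declared: Record<string, string> = {};
  for (const name of TRACKED_PACKAGES) {
    // Look in dependencies first, then devDependencies.
    const range = pkg.dependencies?.[name] ?? pkg.devDependencies?.[name];
    declared[name] = range ?? 'not declared';
  }
  return declared;
}

// Example with an inline manifest; a real script would parse package.json instead.
const snapshot = collectDeclaredVersions({
  dependencies: { apify: '^2.3.2', 'apify-client': '^2.6.0' },
});
console.log(snapshot);
```

Saving this output alongside the upgrade branch gives the rollback step a concrete set of versions to return to.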

Step 2: Create Upgrade Branch

git checkout -b upgrade/apify-packages

Step 3: Upgrade Packages

# Upgrade to latest
npm install apify@latest crawlee@latest apify-client@latest

# Or upgrade to specific version
npm install apify@3.2.0 crawlee@3.11.0

# Check the installed tree for dependency problems
# (ERESOLVE is reported by npm install itself, not by npm ls)
npm ls 2>&1 | grep -iE "invalid|missing|peer dep"
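
Before installing, it can help to flag which upgrades cross a major version, since under semantic versioning those are the ones expected to carry breaking changes. A minimal sketch, assuming plain x.y.z version strings with an optional range prefix; majorOf and isMajorUpgrade are hypothetical helpers.

```typescript
// Hypothetical helper: extract the major number from strings like "3.2.0", "^2.3.2" or "v3.0.0".
function majorOf(version: string): number {
  const match = /^\D*(\d+)\./.exec(version);
  if (!match) throw new Error(`Cannot parse version: ${version}`);
  return Number(match[1]);
}

// True when the target version has a higher major number than the current one.
function isMajorUpgrade(from: string, to: string): boolean {
  return majorOf(to) > majorOf(from);
}

// Example: a v2 to v3 SDK upgrade needs the migration steps; a minor bump usually does not.
console.log(isMajorUpgrade('2.3.2', '3.2.0'));
console.log(isMajorUpgrade('3.1.0', '3.2.0'));
```

A major upgrade of apify from 2.x is the signal to follow the migration section of this guide rather than a plain install.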

Step 4: Run Tests and Fix Issues

npm test
npm run build  # Catch TypeScript errors

Major Migration: Apify SDK v2 to v3 (Crawlee Split)

This is the most common migration. In v3, the crawler classes and related utilities moved to the crawlee package, while apify keeps the platform-facing Actor API.

Import Changes

// ---- BEFORE (SDK v2) ----
import Apify from 'apify';
const { CheerioCrawler, PlaywrightCrawler } = Apify;
const { log } = Apify.utils; // in v2, log lived under Apify.utils

// ---- AFTER (SDK v3 + Crawlee) ----
import { Actor } from 'apify';
import { CheerioCrawler, PlaywrightCrawler, log } from 'crawlee';

Initialization Changes

// ---- BEFORE (v2) ----
Apify.main(async () => {
  const input = await Apify.getInput();
  const dataset = await Apify.openDataset();
  await Apify.pushData({ url: 'https://example.com' });
  await Apify.setValue('OUTPUT', { done: true });
});

// ---- AFTER (v3) ----
await Actor.main(async () => {
  const input = await Actor.getInput();
  const dataset = await Actor.openDataset();
  await Actor.pushData({ url: 'https://example.com' });
  await Actor.setValue('OUTPUT', { done: true });
});

Crawler Configuration Changes

// ---- BEFORE (v2) ----
const crawler = new Apify.CheerioCrawler({
  handlePageFunction: async ({ request, $ }) => {
    // ...
  },
  handleFailedRequestFunction: async ({ request }) => {
    // ...
  },
});

// ---- AFTER (v3 / Crawlee) ----
const crawler = new CheerioCrawler({
  requestHandler: async ({ request, $ }) => {
    // renamed from handlePageFunction
  },
  failedRequestHandler: async ({ request }, error) => {
    // renamed from handleFailedRequestFunction
    // error is now second argument
  },
});
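
Because the error moves from the context object to a second argument, a rename alone is not enough for failure handlers. The sketch below shows one way to bridge the two signatures during an incremental migration, assuming the legacy handler read error from its context object. The types are simplified stand-ins rather than Crawlee's real context types, and adaptFailedHandler is a hypothetical helper.

```typescript
// Simplified stand-in types for this sketch; real projects would use crawlee's exported types.
type RequestLike = { url: string };
type LegacyFailedHandler = (ctx: { request: RequestLike; error: Error }) => Promise<void>;
type FailedRequestHandler = (ctx: { request: RequestLike }, error: Error) => Promise<void>;

// Hypothetical helper: wrap a v2-style handler so it accepts the v3 (context, error) arguments.
function adaptFailedHandler(legacy: LegacyFailedHandler): FailedRequestHandler {
  return async (ctx, error) => legacy({ ...ctx, error });
}

// A v2-style handler that still destructures error from the context.
const seen: string[] = [];
const legacyHandler: LegacyFailedHandler = async ({ request, error }) => {
  seen.push(`${request.url}: ${error.message}`);
};

// The adapted handler can be passed wherever a v3-style failedRequestHandler is expected.
const adapted = adaptFailedHandler(legacyHandler);
adapted({ request: { url: 'https://example.com' } }, new Error('timeout'))
  .then(() => console.log(seen));
```

An adapter like this is a stopgap: once all handlers are migrated, taking error as the second parameter directly is simpler.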

Proxy Configuration Changes

// ---- BEFORE (v2) ----
const proxyConfiguration = await Apify.createProxyConfiguration({
  groups: ['RESIDENTIAL'],
});

// ---- AFTER (v3) ----
const proxyConfiguration = await Actor.createProxyConfiguration({
  groups: ['RESIDENTIAL'],
});

Request Queue Changes

// ---- BEFORE (v2) ----
const requestQueue = await Apify.openRequestQueue();
await requestQueue.addRequest({ url: 'https://example.com' });

// ---- AFTER (v3) ----
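// Option A: Use enqueueLinks from the crawling context (preferred)
const crawler = new CheerioCrawler({
  requestHandler: async ({ enqueueLinks }) => {
    await enqueueLinks({ strategy: 'same-domain' });
  },
});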

// Option B: Open queue directly
const requestQueue = await Actor.openRequestQueue();
await requestQueue.addRequest({ url: 'https://example.com' });

Router Pattern (New in v3)

// v3 introduced explicit routers (replaces label-based if/else)
import { Actor } from 'apify';
import { CheerioCrawler, createCheerioRouter } from 'crawlee';

const router = createCheerioRouter();

router.addDefaultHandler(async ({ request, $, enqueueLinks }) => {
  // Handle listing pages
  await enqueueLinks({ selector: 'a.detail', label: 'DETAIL' });
});

router.addHandler('DETAIL', async ({ request, $ }) => {
  // Handle detail pages
  await Actor.pushData({ url: request.url, title: $('h1').text() });
});

const crawler = new CheerioCrawler({ requestHandler: router });
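
To see why a router can replace label-based if/else chains, the dispatch idea can be modeled in a few dependency-free lines. This is a toy illustration only, not Crawlee's implementation; MiniRouter and its types are hypothetical.

```typescript
// Toy model of label dispatch: each request carries an optional label,
// and the router picks the matching handler or falls back to the default one.
type CrawlContext = { request: { url: string; label?: string } };
type Handler = (ctx: CrawlContext) => Promise<void>;

class MiniRouter {
  private handlers = new Map<string, Handler>();
  private fallback?: Handler;

  addHandler(label: string, handler: Handler): void {
    this.handlers.set(label, handler);
  }

  addDefaultHandler(handler: Handler): void {
    this.fallback = handler;
  }

  async dispatch(ctx: CrawlContext): Promise<void> {
    const { label } = ctx.request;
    const handler = (label ? this.handlers.get(label) : undefined) ?? this.fallback;
    if (!handler) throw new Error(`No handler registered for label "${label ?? ''}"`);
    await handler(ctx);
  }
}

const visited: string[] = [];
const miniRouter = new MiniRouter();
miniRouter.addDefaultHandler(async ({ request }) => { visited.push(`LIST ${request.url}`); });
miniRouter.addHandler('DETAIL', async ({ request }) => { visited.push(`DETAIL ${request.url}`); });

Promise.all([
  miniRouter.dispatch({ request: { url: 'https://example.com/' } }),
  miniRouter.dispatch({ request: { url: 'https://example.com/item/1', label: 'DETAIL' } }),
]).then(() => console.log(visited));
```

Each page type gets its own handler, so adding a new label means registering one more function instead of extending a conditional.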

apify-client Upgrade Notes

The apify-client package has been more stable. Key changes across versions:

// v1.x → v2.x: Constructor changed
// Before
const { ApifyClient } = require('apify-client');
const client = new ApifyClient({ userId: 'xxx', token: 'yyy' });

// After (v2+): userId removed, just token
const client = new ApifyClient({ token: 'yyy' });

// Method chaining style (consistent since v2)
const run = await client.actor('username/actor').call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();

Upgrade Verification Script

// verify-upgrade.ts — run after upgrading
import { Actor } from 'apify';
import { CheerioCrawler, log } from 'crawlee';
import { ApifyClient } from 'apify-client';

async function verifyUpgrade() {
  const checks: { name: string; pass: boolean; error?: string }[] = [];

  // Check 1: Imports work
  checks.push({ name: 'Actor import', pass: typeof Actor.init === 'function' });
  checks.push({ name: 'CheerioCrawler import', pass: typeof CheerioCrawler === 'function' });
  checks.push({ name: 'ApifyClient import', pass: typeof ApifyClient === 'function' });

  // Check 2: Client connects
  try {
    const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
    const user = await client.user().get();
    checks.push({ name: 'API connection', pass: !!user.username });
  } catch (err) {
    checks.push({ name: 'API connection', pass: false, error: (err as Error).message });
  }

  // Check 3: Crawler instantiates
  try {
    const crawler = new CheerioCrawler({
      requestHandler: async () => {},
    });
    checks.push({ name: 'Crawler instantiation', pass: true });
  } catch (err) {
    checks.push({ name: 'Crawler instantiation', pass: false, error: (err as Error).message });
  }

  // Report
  console.log('\n=== Upgrade Verification ===');
  for (const check of checks) {
    const status = check.pass ? 'PASS' : 'FAIL';
    console.log(`  [${status}] ${check.name}${check.error ? ` — ${check.error}` : ''}`);
  }

  const allPassed = checks.every(c => c.pass);
  console.log(`\n${allPassed ? 'All checks passed.' : 'Some checks failed!'}`);
  process.exit(allPassed ? 0 : 1);
}

verifyUpgrade();

Rollback Procedure

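# Revert to the versions recorded before the upgrade (example versions shown)
npm install apify@3.1.0 crawlee@3.10.0 apify-client@2.9.0 --save-exact

# Or restore the manifest and lock file from main (npm ci requires them to match)
git checkout main -- package.json package-lock.json
npm ci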

# On the platform: roll back Actor build
# Console > Actor > Builds > select previous build > Set as default

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| handlePageFunction is not valid | Using v2 option names in v3 | Rename to requestHandler |
| Apify.main is not a function | v2 default export removed | Import { Actor } from apify |
| Cannot find module 'crawlee' | Crawlee not installed | npm install crawlee |
| Type errors after upgrade | Changed interfaces | Check release notes for type changes |
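
Several of these errors can be caught before running anything by scanning source files for v2 identifiers. The following is a rough sketch using plain regular expressions rather than a real parser, so it may miss aliased or dynamically built names; LEGACY_PATTERNS and findLegacyUsages are hypothetical.

```typescript
// Hypothetical lookup of v2 identifiers from this guide and the matching v3 hint.
const LEGACY_PATTERNS: { pattern: RegExp; hint: string }[] = [
  { pattern: /\bhandlePageFunction\b/, hint: 'rename to requestHandler' },
  { pattern: /\bhandleFailedRequestFunction\b/, hint: 'rename to failedRequestHandler' },
  { pattern: /\bApify\.main\b/, hint: "use Actor.main, importing { Actor } from 'apify'" },
  {
    pattern: /\bApify\.(getInput|pushData|setValue|openDataset|openRequestQueue|createProxyConfiguration)\b/,
    hint: 'call the same method on Actor',
  },
];

// Return one finding per (line, pattern) match, with 1-based line numbers.
function findLegacyUsages(source: string): { line: number; hint: string }[] {
  const findings: { line: number; hint: string }[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { pattern, hint } of LEGACY_PATTERNS) {
      if (pattern.test(text)) findings.push({ line: index + 1, hint });
    }
  });
  return findings;
}

// Example: a small v2-style snippet.
const sample = [
  'Apify.main(async () => {',
  '  const crawler = new Apify.CheerioCrawler({ handlePageFunction });',
  '});',
].join('\n');
console.log(findLegacyUsages(sample));
```

Running a scan like this over the source tree before the first test run gives a checklist of renames to work through.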

Next Steps

For CI integration during upgrades, see apify-ci-integration.

Info
Category Programming & Development
Name apify-upgrade-migration
Version v20260423
Size 7.84KB
Updated 2026-04-28