Apify Client: Running Actors and Scraping Data

v20260423
apify-hello-world
A comprehensive guide to using the Apify Client SDK. Learn the fundamental pattern of executing remote web scraping Actors (like content crawling or general scraping), managing asynchronous run states, and retrieving structured data from resulting datasets and key-value stores. Essential for integrating powerful web scraping capabilities into applications.

Apify Hello World

Overview

Run a public Actor from the Apify Store, wait for it to finish, and retrieve the scraped data. This is the call-wait-collect pattern that most Apify integrations build on.

Prerequisites

  • apify-client installed (npm install apify-client)
  • APIFY_TOKEN environment variable set
  • See apify-install-auth if not ready
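
A missing token is easier to diagnose before any request is sent. The sketch below is a hypothetical helper (requireToken is not part of apify-client) that reads APIFY_TOKEN from an environment-like object and throws a readable error when it is absent.

```typescript
// Hypothetical helper, not part of apify-client: fail fast with a clear
// message instead of hitting an authentication error from the API later.
function requireToken(env: Record<string, string | undefined>): string {
  const token = env.APIFY_TOKEN?.trim();
  if (!token) {
    throw new Error('APIFY_TOKEN is not set. See apify-install-auth.');
  }
  return token;
}

// Intended use (assumes apify-client is installed):
//   const client = new ApifyClient({ token: requireToken(process.env) });
```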

Core Pattern: Call Actor, Get Data

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// 1. Run an Actor and wait for it to finish
const run = await client.actor('apify/website-content-crawler').call({
  startUrls: [{ url: 'https://docs.apify.com/academy' }],
  maxCrawlPages: 5,
});

// 2. Retrieve results from the default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();

console.log(`Crawled ${items.length} pages:`);
items.forEach(item => {
  console.log(`  - ${item.url}: ${item.text?.substring(0, 80)}...`);
});

Instructions

Step 1: Create the Script

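Create hello-apify.ts (or .js) with the code above. The snippet uses ES module imports and top-level await, so the project must run as an ES module: set "type": "module" in package.json, or use the .mts / .mjs extension.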

Step 2: Run It

# With tsx (recommended)
npx tsx hello-apify.ts

# Or with Node.js (plain JS)
node hello-apify.js

Step 3: Understand the Output

The Actor runs on Apify's cloud infrastructure. When it finishes, the returned run object exposes:

  • run.id: unique run identifier
  • run.status: SUCCEEDED, FAILED, TIMED-OUT, or ABORTED
  • run.defaultDatasetId: ID of the dataset containing results
  • run.defaultKeyValueStoreId: ID of the default key-value store (the run's input, plus any records the Actor saves)
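
The fields above can be wrapped in two small helpers for logging and status checks. This is a minimal sketch: RunLike, isFinished, and describeRun are illustrative names, and the type covers only the four fields listed here rather than the full run object.

```typescript
// Structural type covering only the fields listed above (illustrative).
interface RunLike {
  id: string;
  status: string;
  defaultDatasetId: string;
  defaultKeyValueStoreId: string;
}

// The four terminal statuses named in this guide.
const TERMINAL_STATUSES = ['SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED'];

// True once the run has reached one of the terminal statuses.
function isFinished(run: RunLike): boolean {
  return TERMINAL_STATUSES.includes(run.status);
}

// One-line summary suitable for logs.
function describeRun(run: RunLike): string {
  return `run ${run.id} [${run.status}] dataset=${run.defaultDatasetId} kv=${run.defaultKeyValueStoreId}`;
}
```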

Popular Starter Actors

Actor ID | Purpose | Typical Input
apify/website-content-crawler | Crawl and extract text | { startUrls, maxCrawlPages }
apify/web-scraper | General-purpose scraper | { startUrls, pageFunction }
apify/cheerio-scraper | Fast HTML scraper | { startUrls, pageFunction }
apify/google-search-scraper | Google SERP results | { queries, maxPagesPerQuery }
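
For the scrapers that take a pageFunction, the input might be assembled as sketched below. This assumes pageFunction is passed as a string of JavaScript source that the Actor evaluates for each page, and that the page context exposes $ and request; buildCheerioInput is a hypothetical helper. Check the Actor's input schema in the Store before relying on this shape.

```typescript
// Assumption: the Actor receives pageFunction as source text, not as a
// function reference, and calls it with a context containing $ and request.
const pageFunction = `async function pageFunction(context) {
  const { $, request } = context;
  return { url: request.url, title: $('title').text() };
}`;

// Hypothetical helper: turn a list of URLs into a { startUrls, pageFunction } input.
function buildCheerioInput(urls: string[]) {
  return {
    startUrls: urls.map((url) => ({ url })),
    pageFunction,
  };
}

// Intended use (assumes a configured client):
//   await client.actor('apify/cheerio-scraper').call(buildCheerioInput(['https://example.com']));
```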

Synchronous vs Asynchronous Runs

// SYNCHRONOUS: .call() starts the Actor and resolves once the run finishes
const run = await client.actor('apify/web-scraper').call(input);

// ASYNCHRONOUS: .start() resolves as soon as the run is created; wait for it later
const startedRun = await client.actor('apify/web-scraper').start(input);
// ... do other work ...
const finishedRun = await client.run(startedRun.id).waitForFinish();
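
waitForFinish() covers most cases. If you want to log progress or cap the number of checks yourself, a hand-rolled polling loop is one option. The sketch below is an illustration, not the client's own mechanism: pollUntilFinished is a hypothetical helper, and the run lookup is injected as fetchRun so the loop works without network access.

```typescript
interface PolledRun {
  id: string;
  status: string;
}

const TERMINAL = new Set(['SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED']);

// Calls fetchRun repeatedly until the run reaches a terminal status,
// sleeping intervalMs between attempts and giving up after maxAttempts.
async function pollUntilFinished(
  fetchRun: () => Promise<PolledRun | undefined>,
  { intervalMs = 2000, maxAttempts = 30 } = {},
): Promise<PolledRun> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = await fetchRun();
    if (run && TERMINAL.has(run.status)) return run;
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Run did not finish after ${maxAttempts} attempts`);
}

// Intended use (assumes a configured client and a started run):
//   const finished = await pollUntilFinished(() => client.run(startedRun.id).get());
```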

Working with Results

// Get all items in a single request (fine for small datasets)
const { items } = await client.dataset(run.defaultDatasetId).listItems();

// Get items with pagination control
const page1 = await client.dataset(run.defaultDatasetId).listItems({
  limit: 100,
  offset: 0,
});

// Download entire dataset as CSV/JSON/etc.
const buffer = await client.dataset(run.defaultDatasetId).downloadItems('csv');

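// Get a record from the key-value store (the key depends on the Actor; 'screenshot' is illustrative)
// getRecord() resolves to undefined when the key does not exist
const screenshot = await client
  .keyValueStore(run.defaultKeyValueStoreId)
  .getRecord('screenshot');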
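
For larger datasets, the limit and offset options shown above can drive a loop that collects every page. The following is a minimal sketch: fetchAllItems and DatasetLike are illustrative names, and the interface models only the parts of the list response used here (items and total).

```typescript
// Structural stand-in for the object returned by client.dataset(id);
// only the pieces this loop needs are modelled.
interface Page<T> {
  items: T[];
  total: number;
}
interface DatasetLike<T> {
  listItems(options: { limit: number; offset: number }): Promise<Page<T>>;
}

// Reads the dataset page by page until every item has been collected.
async function fetchAllItems<T>(dataset: DatasetLike<T>, pageSize = 100): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  while (true) {
    const { items, total } = await dataset.listItems({ limit: pageSize, offset });
    all.push(...items);
    offset += items.length;
    // Stop when the dataset is exhausted or a page comes back empty.
    if (items.length === 0 || offset >= total) break;
  }
  return all;
}

// Intended use (assumes a configured client and a finished run):
//   const items = await fetchAllItems(client.dataset(run.defaultDatasetId), 500);
```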

Run Configuration Options

const run = await client.actor('apify/web-scraper').call(
  input,       // Actor-specific input object
  {
    memory: 1024,          // Memory in MB (128–32768, powers of 2)
    timeout: 300,          // Timeout in seconds (default: Actor's setting)
    build: 'latest',       // Which build to use
    waitSecs: 120,         // Max wait for .call() (0 = don't wait)
  }
);
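
The memory constraint noted in the comment above (128 to 32768 MB, powers of 2) can be checked locally before starting a run. A small sketch, with isValidMemoryMb as a hypothetical helper name:

```typescript
// Returns true when mb is a power of two within the 128..32768 range
// stated in this guide.
function isValidMemoryMb(mb: number): boolean {
  const isPowerOfTwo = Number.isInteger(mb) && mb > 0 && (mb & (mb - 1)) === 0;
  return isPowerOfTwo && mb >= 128 && mb <= 32768;
}
```

Calling it before .call() turns a typo such as memory: 1000 into a local error rather than a rejected request.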

Error Handling

Error | Cause | Solution
Actor not found | Wrong Actor ID | Check ID at apify.com/store
run.status === 'FAILED' | Actor crashed | Check run.statusMessage for details
run.status === 'TIMED-OUT' | Exceeded timeout | Increase timeout or reduce workload
Dataset is empty | Actor produced no output | Verify input parameters; check Actor logs
402 Payment Required | Insufficient compute units | Top up at console.apify.com/billing
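
The run-level rows of the table can be turned into a small decision helper. This sketch covers only outcomes visible on the run object and the item count; "Actor not found" and 402 are assumed to surface as thrown errors from the client rather than as run statuses, so they are not handled here. adviseOnRun is an illustrative name.

```typescript
interface RunOutcome {
  status: string;
  statusMessage?: string;
}

// Maps a finished run and its item count to the advice from the table above.
function adviseOnRun(run: RunOutcome, itemCount: number): string {
  switch (run.status) {
    case 'SUCCEEDED':
      return itemCount === 0
        ? 'Dataset is empty: verify input parameters and check the Actor logs.'
        : 'OK';
    case 'FAILED':
      return `Actor crashed: ${run.statusMessage ?? 'no status message'}`;
    case 'TIMED-OUT':
      return 'Exceeded timeout: increase timeout or reduce the workload.';
    case 'ABORTED':
      return 'Run was aborted before it finished.';
    default:
      return `Unexpected status: ${run.status}`;
  }
}
```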

Complete Example: Scrape and Save

import { ApifyClient } from 'apify-client';
import { writeFileSync } from 'fs';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

async function scrapeAndSave() {
  console.log('Starting Actor run...');

  const run = await client.actor('apify/website-content-crawler').call({
    startUrls: [{ url: 'https://example.com' }],
    maxCrawlPages: 10,
  });

  if (run.status !== 'SUCCEEDED') {
    throw new Error(`Actor run failed: ${run.status} — ${run.statusMessage}`);
  }

  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  writeFileSync('results.json', JSON.stringify(items, null, 2));
  console.log(`Saved ${items.length} items to results.json`);
}

scrapeAndSave().catch(console.error);

Next Steps

Proceed to apify-local-dev-loop for local Actor development.

Info
Category Development
Updated At 2026-04-28