Cloud platform for extracting public web data. One API key, three execution modes. All endpoints sit under https://api.hasdata.com and authenticate with x-api-key.
curl -G 'https://api.hasdata.com/scrape/google/serp' \
--data-urlencode 'q=coffee' \
-H 'x-api-key: <your-api-key>'
401 invalid key, 403 quota exhausted, 429 concurrency cap, 500 server error (retry).
Use this skill when:
| Mode | Latency | When | Endpoint |
|---|---|---|---|
| Web Scraping API | seconds | Arbitrary URL — JS rendering, CSS/AI extraction, screenshots | POST /scrape/web |
| Scraper APIs (sync) | seconds | Pre-parsed JSON for known platforms (Google, Amazon, Zillow, …) | GET /scrape/<vertical>/<resource> |
| Scraper Jobs (async) | minutes–hours | Bulk extraction, recursive crawling, webhook fan-out | POST /scrapers/<slug>/jobs |
Decision rule. Default to a Scraper API when one exists for the platform (pre-parsed JSON, no selector maintenance). Use Web Scraping for arbitrary URLs not covered by an API. Reach for a Scraper Job only when no API equivalent exists — crawler, contacts, sec-edgar, amazon-bestsellers, amazon-product-reviews — or when async fan-out + webhooks save engineering time over a paginated client loop.
{ "requestMetadata": { "id": "…", "status": "ok", "url": "…" }, "...": "endpoint-specific" }
Treat data as valid only if requestMetadata.status === "ok". HTTP 200 alone isn't enough.
/scrape/google/ai-mode for the answer + references → /scrape/web (markdown) on each reference URL → cited RAG context, no vector DB./scrape/google-maps/search returns business websites and phones; collect contact details only from public, permitted sources and apply opt-out, rate, and privacy-law constraints before any outreach use.crawler Scraper Job with outputFormat: ["markdown"] + includePaths: "/docs/.+" produces an LLM-ready corpus in one submission.knowledgeGraph, localResults, inlineShoppingResults, relatedQuestions carry pre-parsed public facts. Always check them before considering direct page access.x-api-key header on every request. Read from HASDATA_API_KEY env. Never hardcode, never log.429 and 5xx only — exponential backoff, jitter. Never retry 4xx (auth, validation).429s.body.id (integer), not jobId. Persist it immediately. Poll GET /scrapers/jobs/<id> every 10–30 s with backoff; treat webhooks as best-effort and always pair with polling. On finished the status carries data: {csv, json, xlsx} short-lived URLs — download immediately.See references/code-recipes.md for ready-to-paste Python and TypeScript clients with retry, backoff, bounded concurrency, and the full job lifecycle.
jsRendering first, enable only if the page needs it — most static pages parse fine without a headless browser.cookies parameter — cookies go through headers["Cookie"].includePaths regex is case-sensitive. /blog/.+ won't match /Blog/....data is double-wrapped. Each row is body.data[i].data; outer wraps with id, jobId, dataId, createdAt, updatedAt.requestMetadata.status === "ok" is the only success signal. HTTP 200 alone isn't enough.references/web-scraping.md — POST /scrape/web parameters, JS scenarios, AI extraction, cookie auth.references/search.md — Google SERP / Light / AI Mode / News / Shopping / Bing / Trends + pagination.references/ecommerce.md — Amazon (product, search, seller, seller-products) and Shopify.references/real-estate.md — Zillow, Redfin (bracketed filters).references/travel.md — Airbnb, Booking, Google Flights (occupancy rules, token pagination, IATA codes).references/local-business.md — Maps (search/place/reviews/photos/posts), Yelp, YellowPages.references/jobs.md — Indeed and Glassdoor.references/youtube.md — YouTube search / video / channel / transcript.references/scraper-jobs.md — async submit/poll/results, Crawler, Contacts, SEC EDGAR, webhook receiver.references/code-recipes.md — Python / TypeScript clients with retry, backoff, concurrency, polling.