# Crawl
Recursively crawl a site when one page is not enough. CRW crawl is asynchronous by design: start a job, poll it, and widen scope only after the first batch looks correct.
Start with `maxPages: 5` and `maxDepth: 1`. If the returned batch is wrong, a larger crawl only makes the mistake more expensive.

## Crawling a site with CRW
The crawl API lives at `/v1/crawl`:

```
POST /v1/crawl
GET /v1/crawl/{id}
DELETE /v1/crawl/{id}
```
Authentication:

- Hosted: send `Authorization: Bearer YOUR_API_KEY`
- Self-hosted: only required when `auth.api_keys` is configured
## Installation

There is nothing to install: CRW crawl is plain HTTP. You start the job with one request and check its status with another.
## Basic usage
Start with this request:
```json
{
  "url": "https://docs.example.com",
  "maxDepth": 1,
  "maxPages": 5,
  "formats": ["markdown"],
  "onlyMainContent": true
}
```
:::tabs
::tab{title="Python"}

```python
import requests
import time

start = requests.post(
    "https://fastcrw.com/api/v1/crawl",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://docs.example.com",
        "maxDepth": 1,
        "maxPages": 5,
        "formats": ["markdown"],
    },
)
crawl_id = start.json()["id"]

time.sleep(2)
status = requests.get(
    f"https://fastcrw.com/api/v1/crawl/{crawl_id}",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
print(status.json()["status"])
```
::tab{title="Node.js"}

```javascript
const start = await fetch("https://fastcrw.com/api/v1/crawl", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://docs.example.com",
    maxDepth: 1,
    maxPages: 5,
    formats: ["markdown"]
  })
});
const { id } = await start.json();

await new Promise((resolve) => setTimeout(resolve, 2000));
const status = await fetch(`https://fastcrw.com/api/v1/crawl/${id}`, {
  headers: { "Authorization": "Bearer YOUR_API_KEY" }
});
console.log((await status.json()).status);
```
::tab{title="cURL"}

```bash
curl -X POST https://fastcrw.com/api/v1/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "maxDepth": 1,
    "maxPages": 5,
    "formats": ["markdown"]
  }'
```
:::
## Response
Start response:
```json
{
  "success": true,
  "id": "550e8400-e29b-41d4-a716-446655440000"
}
```
Poll response:
```json
{
  "success": true,
  "status": "scraping",
  "total": 5,
  "completed": 2,
  "data": []
}
```
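For progress logging, the `total` and `completed` fields in the poll response can be reduced to a fraction. A minimal sketch (the helper name is ours, not part of the API; `total` is guarded because it may be absent or zero while link discovery is still running):

```python
def crawl_progress(poll: dict) -> float:
    """Return crawl completion as a fraction between 0.0 and 1.0.

    `total` can be missing or zero while the crawler is still
    discovering links, so guard against division by zero.
    """
    total = poll.get("total") or 0
    return poll.get("completed", 0) / total if total else 0.0
```

With the poll response above, `crawl_progress(...)` yields `0.4`.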
## Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | Starting URL |
| `maxDepth` | number | `2` | Maximum depth from the start URL |
| `maxPages` | number | `100` | Maximum number of pages to crawl |
| `limit` | number | alias | Firecrawl-compatible alias for `maxPages` |
| `max_pages` | number | alias | Snake_case alias for `maxPages` |
| `formats` | string[] | `["markdown"]` | Output formats for each page |
| `onlyMainContent` | boolean | `true` | Remove boilerplate content before conversion |
| `jsonSchema` | object | -- | Optional schema for structured extraction per page |
## Scrape options and extraction
Crawl inherits the same content-format logic as scrape:
- Start with `formats: ["markdown"]`
- Add extraction only after the first crawl batch looks correct
- Keep `onlyMainContent: true` unless you explicitly need full-page noise
If you need to debug one problematic page, go back to Scrape and validate that page in isolation first.
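When you do add extraction, `jsonSchema` rides alongside the other crawl options in the same request body. The schema below is illustrative only: the field names are ours, and the exact schema dialect `jsonSchema` accepts should be confirmed against the Scrape documentation:

```json
{
  "url": "https://docs.example.com",
  "maxDepth": 1,
  "maxPages": 5,
  "formats": ["markdown"],
  "onlyMainContent": true,
  "jsonSchema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" }
    }
  }
}
```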
## Checking job status
Poll the crawl ID until the status is `completed` or `failed`:
```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://fastcrw.com/api/v1/crawl/CRAWL_ID
```
Status response shape:
```json
{
  "success": true,
  "status": "scraping | completed | failed",
  "total": 12,
  "completed": 12,
  "data": [
    {
      "markdown": "# Page content",
      "metadata": {
        "sourceURL": "https://example.com/page"
      }
    }
  ],
  "error": "optional error"
}
```
## Cancellation and limits
Cancel a running job with:

```
DELETE /v1/crawl/{id}
```
CRW crawl stays within the same origin and should be treated as a bounded, respectful site job, not an open-ended spider.
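In Python, cancellation is a plain DELETE; a standard-library sketch that builds the request without sending it (the function name is ours):

```python
import urllib.request


def cancel_crawl_request(crawl_id: str, api_key: str) -> urllib.request.Request:
    """Build the DELETE request that cancels a running crawl job."""
    return urllib.request.Request(
        f"https://fastcrw.com/api/v1/crawl/{crawl_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# To actually send it:
# urllib.request.urlopen(cancel_crawl_request(crawl_id, api_key))
```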
## Common production patterns
- Run Map first when you are unsure about the reachable section.
- Keep `maxPages` very low on first contact with a new site.
- Poll with backoff instead of hammering the same crawl ID.
- Use extraction only after the markdown output of the first crawl batch looks correct.
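The backoff advice can be sketched as a small polling loop. Here `fetch_status` is any callable returning the poll JSON (for example, a wrapper around `requests.get(...).json()`); the delay schedule and cap are our assumptions, not service guidance:

```python
import time


def backoff_delays(base=2.0, factor=2.0, cap=30.0, tries=6):
    """Yield sleep intervals: 2s, 4s, 8s, ... capped at `cap` seconds."""
    delay = base
    for _ in range(tries):
        yield min(delay, cap)
        delay *= factor


def poll_until_done(fetch_status, delays=None):
    """Poll until the crawl reaches a terminal state, sleeping between tries."""
    for delay in (delays if delays is not None else backoff_delays()):
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(delay)
    raise TimeoutError("crawl did not finish within the polling budget")
```

Capping the delay keeps long jobs responsive while still easing pressure on the status endpoint.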
## Common mistakes
- Starting with `maxPages: 500` before validating the target
- Treating `crawl` like a synchronous route
- Assuming crawl crosses origins
- Ignoring `robots.txt` and target-side rate behavior