POST /v1/crawl

Hosted: https://api.fastcrw.com · Self-hosted: http://localhost:3000

Auth (hosted only): Authorization: Bearer $CRW_API_KEY

Self-hosted: no auth required by default. See configuration.

Request body

{
  "url": "https://example.com",
  "limit": 100,
  "includes": ["/blog/**"],
  "excludes": ["/admin/**"],
  "maxDepth": 2,
  "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": true}
}
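A minimal sketch of assembling this body in Python. The helper name `build_crawl_body` and its snake_case parameters are hypothetical; only the JSON keys come from the example above. It assumes the options other than `url` may be left out, as the Python example on this page does for `includes`, `excludes`, and `scrapeOptions`.

```python
from typing import Any, Optional


def build_crawl_body(
    url: str,
    limit: Optional[int] = None,
    includes: Optional[list[str]] = None,
    excludes: Optional[list[str]] = None,
    max_depth: Optional[int] = None,
    scrape_options: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a POST /v1/crawl body, leaving out any option that was not set.

    Hypothetical helper: the key names mirror the request example, but
    which keys are optional is an assumption, not confirmed by this page.
    """
    candidates = {
        "url": url,
        "limit": limit,
        "includes": includes,
        "excludes": excludes,
        "maxDepth": max_depth,
        "scrapeOptions": scrape_options,
    }
    # Drop unset options so the server-side defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}
```

For example, `build_crawl_body("https://example.com", limit=25, max_depth=2)` produces a dict with only the `url`, `limit`, and `maxDepth` keys, which can be passed as `json=` to `requests.post`.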

Response

{
  "success": true,
  "data": {
    "id": "01HXYZ...",
    "url": "https://api.fastcrw.com/v1/crawl/01HXYZ..."
  }
}
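The response carries both the job `id` and a ready-made status `url`. A small sketch of pulling both out follows. The function name `parse_crawl_start` is hypothetical, and treating a falsy `success` as a rejected request is an assumption, since this page only shows the success case.

```python
def parse_crawl_start(payload: dict) -> tuple[str, str]:
    """Return (job_id, status_url) from a POST /v1/crawl response body.

    Hypothetical helper. Assumes a rejected request sets "success" to a
    falsy value; the error shape is not documented on this page.
    """
    if not payload.get("success"):
        raise RuntimeError(f"crawl request was not accepted: {payload!r}")
    data = payload["data"]
    return data["id"], data["url"]
```

Keeping the returned status URL, rather than rebuilding it from the id, means the same code should work against either the hosted or the self-hosted base URL.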

GET /v1/crawl/:id — polling status

Poll the URL returned in data.url above until data.status is "completed" or "failed".

{
  "success": true,
  "data": {
    "status": "in_progress",
    "completed": 12,
    "total": 100,
    "data": [
      {"success": true, "data": {"markdown": "...", "metadata": {"sourceURL": "https://example.com/"}}}
    ]
  }
}

data.status is one of scraping or in_progress (the job is still running), completed, or failed. Each item in data.data has the same shape as a scrape response.
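One way to turn a status payload into something easier to act on is sketched below. The function name `summarize_crawl_status` and the shape of its return value are hypothetical. The progress ratio is only an illustration built from the `completed` and `total` fields shown above, and skipping items whose `success` is false is an assumption about how failed pages are reported.

```python
TERMINAL_STATUSES = {"completed", "failed"}


def summarize_crawl_status(payload: dict) -> dict:
    """Condense a GET /v1/crawl/:id response into a small summary dict.

    Hypothetical helper; the field names read here come from the
    example response, the summary keys are made up for illustration.
    """
    data = payload["data"]
    status = data["status"]
    completed = data.get("completed", 0)
    total = data.get("total", 0)
    # Collect source URLs only from pages that report success.
    urls = [
        item["data"]["metadata"]["sourceURL"]
        for item in data.get("data", [])
        if item.get("success")
    ]
    return {
        "status": status,
        "done": status in TERMINAL_STATUSES,
        "progress": completed / total if total else 0.0,
        "urls": urls,
    }
```

Applied to the example response above, the summary reports `done` as false, because `in_progress` is not a terminal status.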

Python

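# Run with: python3 crawl.py
# Requires the CRW_API_KEY environment variable and the requests package.
import os
import time

import requests

BASE_URL = "https://api.fastcrw.com"
headers = {"Authorization": f"Bearer {os.environ['CRW_API_KEY']}"}

# Start the crawl job.
resp = requests.post(
    f"{BASE_URL}/v1/crawl",
    headers=headers,
    json={"url": "https://example.com", "limit": 25, "maxDepth": 2},
    timeout=60,
)
resp.raise_for_status()  # surface HTTP errors instead of failing later on a missing key
job_id = resp.json()["data"]["id"]

# Poll every 2 seconds until the job reaches a terminal status.
while True:
    resp = requests.get(f"{BASE_URL}/v1/crawl/{job_id}", headers=headers, timeout=60)
    resp.raise_for_status()
    status = resp.json()
    if status["data"]["status"] in ("completed", "failed"):
        break
    time.sleep(2)

# Print the source URL of every page returned by the crawl.
for page in status["data"].get("data", []):
    print(page["data"]["metadata"]["sourceURL"])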
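The loop above polls indefinitely. A bounded variant might look like the sketch below. The name `wait_for_crawl`, its parameters, and the 300-second default are all hypothetical choices rather than anything the API prescribes. The status fetch is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Any, Callable


def wait_for_crawl(
    fetch_status: Callable[[], dict[str, Any]],
    interval: float = 2.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call fetch_status() until the job is completed or failed.

    Hypothetical helper. Raises TimeoutError if another sleep would
    overrun max_wait seconds; the default values are illustrative.
    """
    deadline = clock() + max_wait
    while True:
        payload = fetch_status()
        if payload["data"]["status"] in ("completed", "failed"):
            return payload
        # Give up rather than sleeping past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"crawl still running after {max_wait} seconds")
        sleep(interval)
```

With the names from the example above, a call could be written as `wait_for_crawl(lambda: requests.get(f"https://api.fastcrw.com/v1/crawl/{job_id}", headers=headers, timeout=60).json())`.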
