Base URL. Hosted: https://api.fastcrw.com · Self-hosted: http://localhost:3000
Auth (hosted only): Authorization: Bearer $CRW_API_KEY
Self-hosted: no auth is required by default. See the configuration docs to change this.
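The auth rule above can be kept in one place. This is a minimal sketch under two assumptions: a key in the environment means the hosted API, and no key means a self-hosted instance running with its default (unauthenticated) settings. crw_config is a hypothetical helper, not part of any official client.

```python
import os
from typing import Optional

HOSTED_URL = "https://api.fastcrw.com"
SELF_HOSTED_URL = "http://localhost:3000"


def crw_config(api_key: Optional[str]) -> tuple[str, dict[str, str]]:
    """Return (base_url, headers) for a request.

    Hypothetical helper: a non-empty key selects the hosted API with a Bearer
    token; no key selects a self-hosted instance with default auth settings
    (none), so no Authorization header is sent.
    """
    if api_key:
        return HOSTED_URL, {"Authorization": f"Bearer {api_key}"}
    return SELF_HOSTED_URL, {}


if __name__ == "__main__":
    base_url, headers = crw_config(os.environ.get("CRW_API_KEY"))
    print(base_url, "(authenticated)" if headers else "(no auth)")
```

A self-hosted instance with auth enabled would need a different rule.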
Start a crawl with POST /v1/crawl and a JSON body:
{
  "url": "https://example.com",
  "limit": 100,
  "includes": ["/blog/**"],
  "excludes": ["/admin/**"],
  "maxDepth": 2,
  "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": true}
}
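If you build the body in code, a small helper keeps the field names in one spot. This sketch uses only the fields shown above. crawl_body and its keyword names are hypothetical. It leaves includes/excludes out when they are not given, since the Python example below omits them and they appear to be optional.

```python
import json
from typing import Optional, Sequence


def crawl_body(
    url: str,
    limit: int = 100,
    max_depth: int = 2,
    includes: Optional[Sequence[str]] = None,
    excludes: Optional[Sequence[str]] = None,
    formats: Sequence[str] = ("markdown",),
    only_main_content: bool = True,
) -> dict:
    """Build a POST /v1/crawl body with the field names from the example.

    Hypothetical helper; path filters are only included when provided.
    """
    body = {
        "url": url,
        "limit": limit,
        "maxDepth": max_depth,
        "scrapeOptions": {
            "formats": list(formats),
            "onlyMainContent": only_main_content,
        },
    }
    if includes:
        body["includes"] = list(includes)
    if excludes:
        body["excludes"] = list(excludes)
    return body


if __name__ == "__main__":
    body = crawl_body("https://example.com", includes=["/blog/**"], excludes=["/admin/**"])
    print(json.dumps(body, indent=2))
```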
The response contains the job id and the URL to poll:
{
  "success": true,
  "data": {
    "id": "01HXYZ...",
    "url": "https://api.fastcrw.com/v1/crawl/01HXYZ..."
  }
}
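The start response carries both the job id and a ready-made poll URL. Using data.url directly avoids rebuilding the path by hand. This is a minimal sketch of reading both fields. It assumes a response with "success": false should be treated as an error. read_start_response is a hypothetical name.

```python
from typing import Any


def read_start_response(body: dict[str, Any]) -> tuple[str, str]:
    """Return (job_id, poll_url) from a POST /v1/crawl response.

    Hypothetical helper; treats "success": false as an error.
    """
    if not body.get("success"):
        raise RuntimeError(f"crawl was not started: {body!r}")
    data = body["data"]
    return data["id"], data["url"]
```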
Poll the URL returned above (GET /v1/crawl/{id}) until status is "completed" or "failed".
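A bare while True loop can spin forever if a job never reaches a terminal state. This sketch does the same polling but adds an overall deadline and stops on "completed" or "failed", the terminal status values. The following are assumptions: wait_for_crawl, the 600-second default, and the injected fetch/sleep/clock callables. The callables are injected so the loop can be tested without network access.

```python
import time
from typing import Any, Callable

TERMINAL = ("completed", "failed")


def wait_for_crawl(
    fetch: Callable[[], dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call fetch() until data.status is terminal, or raise TimeoutError.

    fetch is any zero-argument callable returning the parsed poll response,
    e.g. lambda: requests.get(poll_url, headers=headers, timeout=60).json()
    where poll_url and headers are placeholders supplied by the caller.
    """
    deadline = clock() + timeout
    while True:
        body = fetch()
        status = body["data"]["status"]
        if status in TERMINAL:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"crawl still {status!r} after {timeout}s")
        sleep(interval)
```

The fixed 2-second interval matches the example script. Exponential backoff would be a reasonable alternative for long crawls.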
Example poll response while the crawl is still running:
{
  "success": true,
  "data": {
    "status": "in_progress",
    "completed": 12,
    "total": 100,
    "data": [
      {"success": true, "data": {"markdown": "...", "metadata": {"sourceURL": "https://example.com/"}}}
    ]
  }
}
status is one of scraping / in_progress (the crawl is still running), completed, or failed. Each item in data.data mirrors the scrape response shape.
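To report progress while polling, the counters and page list can be reduced to a small summary. This sketch makes two assumptions: completed and total are page counts (the example shows them but does not define them), and items with "success": false should be skipped. summarize is a hypothetical helper.

```python
from typing import Any


def summarize(body: dict[str, Any]) -> dict[str, Any]:
    """Reduce a GET /v1/crawl/{id} response to status, progress, and page URLs."""
    data = body["data"]
    total = data.get("total") or 0
    done = data.get("completed", 0)
    pages = data.get("data", [])
    return {
        "status": data["status"],
        # Assumes completed/total are page counts; 0.0 until total is known.
        "progress": done / total if total else 0.0,
        # Each item has the scrape response shape: {"success": ..., "data": {...}}.
        "urls": [p["data"]["metadata"]["sourceURL"] for p in pages if p.get("success")],
    }
```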
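# Run with: python3 crawl.py  (requires: pip install requests)
import os
import time

import requests

BASE_URL = "https://api.fastcrw.com"  # self-hosted: "http://localhost:3000"

# The hosted API needs a Bearer token; a default self-hosted instance needs none.
api_key = os.environ.get("CRW_API_KEY")
headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}

# Start the crawl job.
resp = requests.post(
    f"{BASE_URL}/v1/crawl",
    headers=headers,
    json={"url": "https://example.com", "limit": 25, "maxDepth": 2},
    timeout=60,
)
resp.raise_for_status()
job_id = resp.json()["data"]["id"]

# Poll every 2 seconds until the job reaches a terminal status.
while True:
    resp = requests.get(f"{BASE_URL}/v1/crawl/{job_id}", headers=headers, timeout=60)
    resp.raise_for_status()
    status = resp.json()
    if status["data"]["status"] in ("completed", "failed"):
        break
    time.sleep(2)

if status["data"]["status"] == "failed":
    raise SystemExit(f"crawl {job_id} failed")

# Each item in data.data mirrors the scrape response shape.
for page in status["data"].get("data", []):
    print(page["data"]["metadata"]["sourceURL"])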
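To keep the content rather than print URLs, the final loop can write each page's markdown to disk. This sketch assumes markdown was requested via scrapeOptions.formats. The script above does not set scrapeOptions, so it is not confirmed here whether markdown is returned by default. save_pages, slug_for, and the file naming scheme are hypothetical.

```python
import re
from pathlib import Path
from typing import Any
from urllib.parse import urlparse


def slug_for(url: str) -> str:
    """Turn a sourceURL into a filesystem-safe file name (hypothetical scheme)."""
    parsed = urlparse(url)
    path = parsed.path.strip("/") or "index"
    return re.sub(r"[^A-Za-z0-9._-]+", "_", f"{parsed.netloc}_{path}") + ".md"


def save_pages(pages: list[dict[str, Any]], out_dir: Path) -> list[Path]:
    """Write the markdown of each successful page in data.data to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        doc = page.get("data", {})
        markdown = doc.get("markdown")
        if not page.get("success") or markdown is None:
            continue  # skip failed pages and pages scraped without the markdown format
        target = out_dir / slug_for(doc["metadata"]["sourceURL"])
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written
```

With the script above, pass status["data"].get("data", []) and a directory such as Path("crawl-output"). The naming scheme drops query strings, so URLs that differ only by query map to the same file.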