CRW — open-source web scraper & crawler (Firecrawl-compatible)
Docs at docs.fastcrw.com. Hosted API at
https://api.fastcrw.com. Self-hosted at
http://localhost:3000.
Five endpoints: /v1/search, /v1/scrape,
/v1/crawl, /v1/map, /v1/extract.
Quick: copy this and run
Auth (hosted only):
Authorization: Bearer $CRW_API_KEY.
Self-hosted (http://localhost:3000) needs no header.
Body:
{ "query": string, "limit": number (default 5, max 20), "lang"?: "en", "tbs"?: "qdr:d|w|m|y", "sources"?: ["web"|"news"|"images"] }
Python (stdlib only — defaults to hosted, fails loudly if key is missing)
#!/usr/bin/env python3
"""Search CRW and print top results as JSON."""
import json, os, sys, urllib.request
API_KEY = os.environ.get("CRW_API_KEY")
# Default to the hosted API. Self-hosting? Set CRW_API_URL=http://localhost:3000
BASE = os.environ.get("CRW_API_URL", "https://api.fastcrw.com")
URL = f"{BASE}/v1/search"
headers = {"Content-Type": "application/json"}
if API_KEY:
headers["Authorization"] = f"Bearer {API_KEY}"
elif BASE.startswith("https://api.fastcrw.com"):
sys.exit("error: CRW_API_KEY is required for the hosted API. "
"Get one at https://fastcrw.com (500 free credits) "
"or set CRW_API_URL=http://localhost:3000 to self-host.")
req = urllib.request.Request(
URL,
data=json.dumps({"query": "renewable energy trends 2024", "limit": 3}).encode(),
headers=headers,
method="POST",
)
with urllib.request.urlopen(req) as r:
body = json.loads(r.read()) # body == {"success": true, "data": [...]}
# Results live under body["data"]. Each row has `title`, `url`, `snippet`,
# `description`, `position`, `score`, `category` — print the LLM-ready subset.
print(json.dumps(
[{"title": x["title"], "url": x["url"], "snippet": x["snippet"]}
for x in body["data"]],
indent=2,
))
# Run with: python3 search.py (file is python3-only — no `python` alias needed)
cURL
curl -X POST https://api.fastcrw.com/v1/search \
-H "Authorization: Bearer $CRW_API_KEY" \
-H "Content-Type: application/json" \
-d '{"query":"renewable energy trends 2024","limit":3}'
CLI (one-shot LLM-ready output)
The crw CLI projects directly to the title/url/snippet
schema with --json --fields — no jq reshape
needed:
# Install: brew install us/crw/crw (or: cargo install crw-cli)
crw search "renewable energy trends 2024" --json --fields title,url,snippet --limit 3
# Available fields: title, url, description, snippet, position, score, category
# --json is shorthand for --format json (also accepts --format text|markdown)
# Self-hosted: no auth. Hosted: export CRW_API_KEY=crw_live_...
Response shape
{
"success": true,
"data": [
{"url": "https://...", "title": "...",
"description": "...", "snippet": "...",
"position": 1, "score": 9.5}
]
}
Other endpoints
POST /v1/scrape {"url":"...","formats":["markdown"]} — one URL → markdown
POST /v1/map {"url":"...","limit":50} — discover URLs
POST /v1/crawl {"url":"...","maxPages":5} — async multi-page; poll GET /v1/crawl/{id}
POST /v1/scrape with formats:["json"] + jsonOptions.schema — structured extract
Install (local setup only)
pip install crw ·
npx crw-mcp ·
brew install us/crw/crw ·
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh ·
docker run -p 3000:3000 ghcr.io/us/crw:latest
Resources