Self-Hosting
Use self-hosting when you want CRW running inside your own infrastructure. The hosted API is still the fastest first-run path; self-hosting is the better fit when network control, auth, and deployment ownership matter more than zero-setup speed.
Quick Start
The shortest self-hosted path is Docker:
docker run -p 3000:3000 ghcr.io/us/crw
Then make a local request:
curl -X POST http://localhost:3000/v1/scrape \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","formats":["markdown"]}'
What You Get
- the same core self-hosted routes:
scrape,crawl,map,mcp,health - optional auth with Bearer tokens
- optional browser-backed rendering
- your own reverse proxy, logging, rate limits, and deployment choices
Recommended Rollout
- start with
scrapeon a real target - validate one
maprequest - run a bounded
crawlwith a low page cap - add auth, TLS, and edge controls before broader access
- add JS rendering only when targets actually need it
When Self-Hosting Is The Right Choice
Choose self-hosting when you want:
- private network placement,
- direct control over operational cost,
- custom auth and routing,
- or a local MCP/server flow without depending on the hosted product.
When The Hosted API Is Better
Choose the hosted path when:
- you want your first successful request immediately,
- you do not want to manage renderer dependencies,
- or your priority is product velocity rather than infrastructure ownership.
Common Mistakes
- Starting with browser rendering before validating plain HTTP scraping
- Exposing the service publicly before adding auth, TLS, and rate limits
- Treating one happy-path scrape as enough deployment validation