MCP Server for AI Agents
CRW includes a built-in MCP (Model Context Protocol) server that gives any MCP-compatible AI assistant — Claude Code, Claude Desktop, Cursor, Windsurf, Cline, Continue.dev, OpenAI Codex CLI — 4 web scraping tools. Turn any AI coding agent into a web scraper with a single command.
Also available on the MCP Registry
Two Modes
crw-mcp supports two modes:
| Mode | When | Tools | Description |
|---|---|---|---|
| Embedded (default) | No --api-url / CRW_API_URL set |
scrape, crawl, map | Self-contained. No server needed. The scraping engine runs inside the MCP process. |
| Proxy / Cloud | --api-url / CRW_API_URL set |
scrape, crawl, map + search | Forwards tool calls to a remote CRW server. Cloud mode (fastcrw.com) adds crw_search for web search. |
Where to use what
- Use this page for the MCP model, tool list, and transport choices.
- Use MCP Client Setup for Claude Code, Codex, Cursor, Windsurf, Cline, Continue, and similar host-specific config snippets.
When MCP Helps
MCP is useful when the agent host already expects tools to be registered through a standard interface. That reduces one layer of custom glue code between the agent and your scraping service.
Typical fits:
- agentic research workflows,
- internal copilots that need current website content,
- multi-tool assistants that combine search, scrape, and synthesis,
- and developer environments such as Claude or Cursor where MCP is already the preferred integration path.
Quick Start (Embedded Mode)
No server to start, no setup. Install and add crw-mcp:
# One-line install (auto-detects OS & arch):
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | sh
# npm (zero install):
npx crw-mcp
# Python (SDK — use npx or cargo for MCP binary):
pip install crw
# Cargo:
cargo install crw-mcp
# Docker:
docker run -i ghcr.io/us/crw crw-mcp
Add to your MCP client:
# Claude Code:
claude mcp add crw -- npx crw-mcp
# OpenAI Codex CLI:
codex mcp add crw -- npx crw-mcp
That's it. The agent starts crw-mcp, which contains the full scraping engine. When the agent disconnects, the process dies.
If you want host-by-host config files instead of one-liners, jump to MCP Client Setup.
With CDP rendering (LightPanda/Chrome)
If you have a CDP-compatible browser, pass it via env vars:
claude mcp add \
-e CRW_RENDERER__LIGHTPANDA__WS_URL=ws://127.0.0.1:9222 \
crw -- npx crw-mcp
Without a CDP browser, crw-mcp uses its HTTP-only renderer (no JavaScript rendering).
Embedded mode configuration
In embedded mode, crw-mcp loads config the same way as crw-server: config.default.toml → config.local.toml → env var overrides. Env vars use CRW_ prefix with __ separator:
CRW_CRAWLER__MAX_CONCURRENCY=5
CRW_RENDERER__LIGHTPANDA__WS_URL=ws://127.0.0.1:9222
CRW_CRAWLER__USER_AGENT="MyBot/1.0"
Proxy Mode (Remote Server)
Connect to fastcrw.com or any remote CRW instance:
# Cloud server
claude mcp add \
-e CRW_API_URL=https://fastcrw.com/api \
-e CRW_API_KEY=YOUR_API_KEY \
crw -- npx crw-mcp
# Local crw-server on custom port
claude mcp add \
-e CRW_API_URL=http://localhost:4000 \
crw -- npx crw-mcp
The same CRW_API_URL + CRW_API_KEY env block also works in Codex, Claude Desktop, Cursor, Windsurf, Cline, and Continue. See MCP Client Setup for ready-to-paste config files.
Three Transport Options
| Transport | Setup | Requires |
|---|---|---|
| Stdio embedded (recommended) | claude mcp add crw -- npx crw-mcp |
Nothing |
| Stdio proxy | CRW_API_URL=... npx crw-mcp |
Remote CRW server |
| HTTP | claude mcp add --transport http crw http://localhost:3000/mcp |
crw-server running |
HTTP Transport
The crw-server has a built-in /mcp endpoint. No extra binary needed:
claude mcp add --transport http crw http://localhost:3000/mcp
CLI Options
| Flag | Env Var | Description |
|---|---|---|
--api-url |
CRW_API_URL |
Remote server URL (enables proxy mode) |
--api-key |
CRW_API_KEY |
Bearer token for remote server auth |
--config |
CRW_CONFIG |
Config file path (embedded mode only) |
| — | RUST_LOG |
Log level (default: crw_mcp=info, logs go to stderr) |
Feature Flags
| Feature | Default | Description |
|---|---|---|
embedded |
on | Self-contained scraping engine (pulls in crw-server) |
Build a slim proxy-only binary:
cargo build -p crw-mcp --no-default-features --release
Available Tools
| Tool | Description | HTTP Endpoint | Availability |
|---|---|---|---|
crw_scrape |
Scrape a URL → markdown, HTML, links | POST /v1/scrape |
All modes |
crw_crawl |
Start async crawl → returns job ID | POST /v1/crawl |
All modes |
crw_check_crawl_status |
Poll crawl status and get results | GET /v1/crawl/:id |
All modes |
crw_map |
Discover all URLs on a site | POST /v1/map |
All modes |
crw_search |
Search the web → titles, URLs, descriptions | POST /v1/search |
Cloud only |
Browser Automation (crw-browse)
crw-browse is a separate MCP server (since v0.4.0) that drives a real Chrome-family browser over CDP for stateful interaction. Use it when the agent needs to navigate multi-step flows, click, or read the DOM — cases where a one-shot scrape is not enough.
| Tool | Description |
|---|---|
goto |
Navigate the session browser to an http(s) URL and wait for load. Creates a session on first call. |
tree |
Snapshot the current page as an indented accessibility tree ([nodeId] role: name). Requires a prior goto. |
More interaction tools (click, fill_form, evaluate) are on the roadmap. Session state is automatically swept after an idle TTL.
Install and launch:
# From crates.io
cargo install crw-browse
crw-browse
# Or grab a prebuilt binary from the v0.4.0 GitHub release
# https://github.com/us/crw/releases/tag/v0.4.0
Wire it into your client the same way you do crw-mcp — it uses stdio transport and is a self-hosted binary (no cloud account required).
Best first setups
Claude Code
# Local embedded
claude mcp add crw -- npx crw-mcp
# fastcrw.com cloud
claude mcp add \
-e CRW_API_URL=https://fastcrw.com/api \
-e CRW_API_KEY=YOUR_API_KEY \
crw -- npx crw-mcp
OpenAI Codex CLI
# Local embedded
codex mcp add crw -- npx crw-mcp
For cloud mode and file-based configs, continue in MCP Client Setup.
crw_scrape
| Parameter | Type | Required | Description |
|---|---|---|---|
url |
string | yes | The URL to scrape |
formats |
string[] | no | markdown, html, links |
onlyMainContent |
boolean | no | Strip nav/footer (default: true) |
includeTags |
string[] | no | CSS selectors to keep |
excludeTags |
string[] | no | CSS selectors to remove |
crw_crawl
| Parameter | Type | Required | Description |
|---|---|---|---|
url |
string | yes | Starting URL |
maxDepth |
integer | no | Max crawl depth (default: 2) |
maxPages |
integer | no | Max pages (default: 10) |
crw_check_crawl_status
| Parameter | Type | Required | Description |
|---|---|---|---|
id |
string | yes | Job ID from crw_crawl |
crw_map
| Parameter | Type | Required | Description |
|---|---|---|---|
url |
string | yes | URL to map |
maxDepth |
integer | no | Discovery depth (default: 2) |
useSitemap |
boolean | no | Read sitemap.xml (default: true) |
crw_search (cloud only)
Available only when connected to fastcrw.com via CRW_API_URL. Not available in embedded or self-hosted mode.
| Parameter | Type | Required | Description |
|---|---|---|---|
query |
string | yes | The search query |
limit |
integer | no | Max results (default: 5) |
lang |
string | no | Language code (e.g. "en", "tr") |
country |
string | no | Country code (e.g. "us", "tr") |
scrapeOptions |
object | no | Scrape each result page (e.g. {"formats": ["markdown"]}) |
Example Agent Tool Flow
A clean MCP setup often assigns each CRW route a narrow purpose:
searchfor web discovery when you don't know the URL (cloud only),mapfor site-specific URL discovery,scrapefor single-page extraction,crawlfor bounded recursive work.
That keeps tool selection obvious for the host agent. If you expose one broad "web tool" instead, agents tend to overuse it and produce noisier traces.
A common workflow:
- The agent identifies a site or page it needs.
- It calls an MCP-exposed CRW tool.
- CRW returns scrape, map, or crawl output.
- The agent decides whether to continue exploring or move into summarization, ranking, or retrieval.
When MCP Is Better Than Direct HTTP
Choose MCP when the host environment already expects tool discovery through a shared protocol, especially in local agent runtimes or IDE workflows. Choose direct HTTP when your application already owns orchestration and just needs API access from the backend.
In other words, MCP is ideal when the caller is an agent platform. Direct HTTP is often simpler when the caller is your own service code.
Detailed host setup
Use MCP Client Setup for:
- Claude Code one-liners and HTTP transport
- Codex CLI one-liners and
~/.codex/config.toml - Claude Desktop config paths
- Cursor, Windsurf, and Cline JSON examples
- Continue YAML examples
- local embedded and fastcrw.com cloud config variants side by side
Verify Installation
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"clientInfo":{"name":"test"},"protocolVersion":"2024-11-05"}}' \
| crw-mcp 2>/dev/null
Expected:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"protocolVersion": "2024-11-05",
"capabilities": {"tools": {}},
"serverInfo": {"name": "crw-mcp", "version": "<current>"}
}
}
How It Works
Embedded mode (default):
AI Assistant → stdin (JSON-RPC 2.0) → crw-mcp [scraping engine] → Web pages
Proxy mode:
AI Assistant → stdin (JSON-RPC 2.0) → crw-mcp → HTTP → crw-server → Web pages
HTTP transport:
AI Assistant → HTTP POST (JSON-RPC 2.0) → crw-server /mcp → Web pages
In embedded mode, the scraping engine runs in-process with zero overhead. In proxy mode, tool calls are forwarded over HTTP. The HTTP transport calls crw-server functions directly.
Protocol version: 2024-11-05
Operational Notes
- Keep MCP tool descriptions tight so the agent knows when to use
mapversusscrape. - Start with read-only scraping tools before exposing anything more complex in the same MCP server.
- Log tool usage separately from downstream agent reasoning so debugging stays tractable.
Common Mistakes
- Registering ambiguous tool descriptions that do not explain when to use
mapversusscrape. - Mixing operational secrets and agent prompts in the same configuration surface.
- Assuming MCP replaces deployment or auth decisions; it only standardizes the tool interface.