Troubleshooting / FAQ
Eight common failure patterns with verified causes and fixes — from empty markdown to MCP server visibility issues.
Quick diagnosis checklist
Before diving into individual patterns:
- Check
successin the response body —falsemeans a hard failure. - Check
error_code(snake_case string) for the machine-readable cause. - Check
data.warnings[]whensuccess: true— soft issues like truncation, unsupported formats, and renderer fallbacks surface there. Anti-bot blocks are not indata.warnings[]; they always returnsuccess: false+error_code: "anti_bot"(see pattern 2). - Check
metadata.statusCode— this is the target site's HTTP status, not fastCRW's. - Check the fastCRW HTTP status separately —
401/422/429all mean different things.
1. Empty or minimal markdown
Symptom: Response is success: true but data.markdown is an empty string or fewer than 100 characters. The page renders fine in a browser.
Cause: The page is a JavaScript Single-Page Application (SPA). The HTTP-only fetcher receives the shell HTML before React/Vue/Next.js mounts content into the DOM. No JavaScript executes, so readability sees an empty container.
Fix: Set renderJs: true to force a headless browser render:
# cURL
curl -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer $CRW_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://app.example.com/dashboard",
"renderJs": true,
"waitFor": 1500
}'
# Python — pip install crw
from crw import CrwClient
client = CrwClient(api_key="crw_live_YOUR_KEY")
result = client.scrape(
"https://app.example.com/dashboard",
render_js=True,
wait_for=1500,
)
print(result["markdown"][:300])
Self-hosted: renderJs: true requires a configured CDP renderer (LightPanda or Chrome). If neither is running you will get a warning "JS rendering was requested but no renderer is available" and the engine falls back to HTTP. Check GET /health and GET /metrics/renderer-breakers.
wait_for guidance: Start with 1000–2000 ms for most SPAs. Complex hydration (Next.js App Router, React 18 concurrent mode) may need 3000–5000 ms.
2. Anti-bot block / Cloudflare challenge page
Symptom: Response contains success: false, error_code: "anti_bot", and the error message names the protection class (e.g., "Blocked by anti-bot (cloudflare): ..." or "Blocked by anti-bot (datadome): ...").
Cause: The target site fingerprints the request as non-browser traffic. Common triggers: missing User-Agent, no Sec-Fetch-* headers, a datacenter IP, or a detectable headless browser.
Fix (staged):
- Add
"stealth": trueto route through randomized headers and a pooled real-browser UA. - Switch renderer to
"chrome_proxy"to egress through residential IPs (fastcrw.com managed tier only):
# cURL — step 1: stealth headers
curl -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer $CRW_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.example-protected.com/product/123",
"renderJs": true,
"stealth": true
}'
# cURL — step 2: residential proxy Chrome tier
curl -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer $CRW_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.example-protected.com/product/123",
"renderer": "chrome_proxy",
"stealth": true,
"country": "us"
}'
# Python — residential proxy tier
from crw import CrwClient
client = CrwClient(api_key="crw_live_YOUR_KEY")
result = client.scrape(
"https://www.example-protected.com/product/123",
renderer="chrome_proxy",
stealth=True,
country="us",
)
print(result["markdown"][:300])
Note: renderer: "chrome_proxy" is only available on fastcrw.com managed infrastructure. Self-hosted deployments must supply their own residential proxy pool via proxy or proxy_list and use renderer: "chrome".
Do not retry anti-bot blocks in a tight loop — each failed attempt teaches the WAF more about your traffic pattern.
3. HTTP 422 target_unreachable
Symptom: API returns HTTP 422 with error_code: "target_unreachable" and an error like "Target unreachable: dns error: failed to lookup address" or "connection refused".
Cause: The fastCRW engine could not establish a TCP connection to the target host. Most common reasons:
- The hostname does not resolve (typo, private DNS, or the domain is down).
- The host refuses connections on port 80/443 (firewall, no HTTP server).
- The URL targets a private/reserved IP that the URL safety validator blocks (e.g.,
localhost,10.x.x.x,192.168.x.x).
Fix:
# Verify DNS resolves before calling the API
nslookup your-target.example.com
# Check the URL is reachable from outside your network
curl -I https://your-target.example.com
# Check the exact error message in the response body
curl -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer $CRW_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://your-target.example.com"}' | jq '.error'
Error code mapping (from crw-server/src/error.rs):
CrwError::TargetUnreachable → HTTP 422, error_code: "target_unreachable"
CrwError::ExtractionError → HTTP 422, error_code: "extraction_error"
If the target URL is correct and externally reachable, the issue is intermittent — retry with backoff. If it is consistent, the target host has a problem unrelated to fastCRW.
4. HTTP 401 Invalid API key
Symptom: Every request returns HTTP 401 with {"success": false, "error": "Invalid API key"} or {"success": false, "error": "Missing Authorization header"}.
Cause: The Authorization header is missing, uses the wrong format, or the key value does not match any configured key.
The server checks (from crw-server/src/middleware.rs):
- The header must be present and start with exactly
"Bearer "(capital B, space after). - The token after
Bearermust match a key in[auth].api_keys(constant-time comparison). - If
[auth].api_keysis empty, all requests pass — no auth required.
Fix:
# Correct header format
curl -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer crw_live_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Common mistakes that return 401:
# -H "Authorization: crw_live_YOUR_KEY" ← missing "Bearer "
# -H "Authorization: bearer crw_live_YOUR_KEY" ← lowercase "bearer"
# -H "Authorization: Basic ..." ← wrong scheme
from crw import CrwClient
# Key is passed in the constructor — the SDK sets the header correctly
client = CrwClient(api_key="crw_live_YOUR_KEY")
Self-hosted: If you did not set [auth].api_keys in your config, the API is open to all. If you set keys, pass one of them verbatim as the Bearer token.
5. HTTP 429 — rate limit vs. credit exhausted
Symptom: Requests return HTTP 429. But two completely different problems share this status code.
How to tell them apart:
| Condition | Body error contains |
Has Retry-After header? |
Fix |
|---|---|---|---|
| RPM rate limit | "Rate limit exceeded" or "Rate limited" |
Yes (fastcrw.com SaaS load balancer); absent on self-hosted | Respect Retry-After; use exponential backoff on self-hosted |
| Credit exhausted | "Insufficient credits" |
No | Top up or enable auto-recharge |
Rate limit (RPM):
# Honor the Retry-After header (fastcrw.com cloud)
RETRY_AFTER=$(curl -sI -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer $CRW_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}' | grep -i retry-after | awk '{print $2}')
echo "Retry after: $RETRY_AFTER seconds"
import time, requests
def scrape_with_retry(url, api_key, max_retries=5):
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
for attempt in range(max_retries):
resp = requests.post(
"https://api.fastcrw.com/v1/scrape",
headers=headers,
json={"url": url},
)
if resp.status_code == 429:
body = resp.json()
if "Insufficient credits" in body.get("error", ""):
raise RuntimeError("Credits exhausted — top up at fastcrw.com/dashboard")
# RPM rate limit — back off
wait = int(resp.headers.get("Retry-After", 2 ** attempt))
time.sleep(wait)
continue
resp.raise_for_status()
return resp.json()
raise RuntimeError("Max retries exceeded")
result = scrape_with_retry("https://example.com", "crw_live_YOUR_KEY")
Credit exhausted: Retrying on a timer does not help — the balance does not replenish on its own. Go to fastcrw.com/dashboard to top up or enable auto-recharge.
Self-hosted: 429 from a self-hosted instance is always the RPM rate limiter (rate_limit_rps in config). There is no credit system on self-hosted. No Retry-After header is emitted; use client-side exponential backoff.
6. Search returns empty results
Symptom: POST /v1/search returns HTTP 503 with error_code: "search_disabled", or it returns success: true but data.results is an empty array [].
Two separate problems:
6a. error_code: "search_disabled" (503)
Cause: The engine has no SearXNG instance configured. The search route requires [search].searxng_url in config or the CRW_SEARCH__SEARXNG_URL environment variable.
Fix (self-hosted):
# config.toml
[search]
searxng_url = "http://localhost:8080"
# Or via environment variable
export CRW_SEARCH__SEARXNG_URL=http://localhost:8080
# Quickest self-hosted setup — use the bundled docker-compose
docker compose up -d
# The bundled docker-compose.yml already wires CRW to the searxng sidecar
fastcrw.com cloud: search_disabled never occurs on the managed SaaS tier. crw_search is always available.
6b. Empty results array (no error)
Cause: SearXNG returned zero results — usually because no search engines are enabled in the SearXNG instance, the instance is rate-limited by upstream search providers, or the query matched no results.
Diagnosis:
# Hit the SearXNG UI directly to check it is working
curl "http://localhost:8080/?q=test+query&format=json" | jq '.results | length'
# Check which engines SearXNG has enabled
curl "http://localhost:8080/config" | jq '.engines[] | select(.enabled == true) | .name'
Fix: Enable engines in the SearXNG settings.yml, or use the bundled Docker image which comes pre-configured.
7. MCP server not appearing in AI client
Symptom: After adding the MCP config, the tool list in Claude Code / Claude Desktop / Cursor / Windsurf does not show crw_scrape, crw_crawl, or the other CRW tools.
Common causes and fixes:
7a. The client was not restarted
MCP servers are registered at client start. Add the config entry then fully restart the application (not just reload the window).
7b. npx is not in PATH when the client launches
GUI applications on macOS may not inherit your shell's PATH. The binary is not found and the MCP process silently fails to start.
// Workaround: use the absolute path to npx
{
"mcpServers": {
"crw": {
"command": "/usr/local/bin/npx",
"args": ["-y", "crw-mcp"]
}
}
}
Find your npx path with which npx in a terminal.
7c. Wrong config file location
| Client | Config file |
|---|---|
| Claude Code | claude mcp add crw -- npx -y crw-mcp (CLI manages the file) or ~/.claude/mcp.json |
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) |
| Cursor | .cursor/mcp.json in project root, or ~/.cursor/mcp.json (global) |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
7d. crw_search is missing but other tools appear (embedded mode)
crw_search is only advertised when a SearXNG backend is configured. In embedded mode with no SearXNG, the tool is intentionally hidden. To get crw_search, either:
- Switch to fastcrw.com cloud mode (
CRW_API_URL+CRW_API_KEY), or - Run a local SearXNG and set
CRW_SEARCH__SEARXNG_URL.
7e. Diagnosing with the JSON-RPC wire
The MCP server communicates over stdio JSON-RPC. You can test it directly:
# List available tools (paste and press Enter, then Ctrl-D)
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | npx -y crw-mcp 2>/dev/null
If this produces a JSON response with a tools array, the binary works and the problem is in the client config.
8. Credit exhausted — 402 vs 429
Symptom: The fastcrw.com SaaS returns either HTTP 402 or HTTP 429 related to credits or billing.
How the SaaS uses these codes:
| HTTP status | When | Body error |
|---|---|---|
429 |
Credits balance is zero when the request arrives | "Insufficient credits" |
402 |
Payment required (e.g., free tier cap reached, plan expired) | Various billing messages |
Note for self-hosted users: Neither 402 nor credit-429 exists on self-hosted. The only 429 on self-hosted is the RPM rate limiter (rate_limited error code). Self-hosted has no billing system.
Fix:
# Check your balance (fastcrw.com SaaS)
curl https://fastcrw.com/api/v1/account/balance \
-H "Authorization: Bearer $CRW_API_KEY" | jq '{balance, plan}'
# Python — check balance before a large crawl
import requests
resp = requests.get(
"https://fastcrw.com/api/v1/account/balance",
headers={"Authorization": f"Bearer {api_key}"},
)
data = resp.json()
print(f"Balance: {data['balance']} credits")
if data["balance"] < 100:
print("Warning: low balance — top up at fastcrw.com/dashboard")
Do not retry a credit-429 on a timer — the balance is exhausted and retrying consumes no credits (the request is rejected before any work starts) but it does waste time and may look like abuse. Replenish the balance first.
Error code reference
error_code |
HTTP | Source |
|---|---|---|
invalid_request |
400 | Bad JSON, bad URL, missing required field |
invalid_url |
— | Reserved — not emitted in practice; invalid URLs are returned as invalid_request (HTTP 400) by all server routes |
target_unreachable |
422 | DNS failure, connection refused, host down |
extraction_error |
422 | LLM extraction failed or CSS/XPath selector invalid |
http_error |
502 | Network-level error reaching the target |
timeout |
504 | Engine or upstream search timed out |
rate_limited |
429 | RPM rate limit (engine-level) |
search_disabled |
503 | /v1/search called with no SearXNG configured |
not_found |
404 | Unknown endpoint or crawl job ID does not exist |
anti_bot |
4xx/5xx | Anti-bot interstitial detected — always success: false; detect via error_code: "anti_bot" |
renderer_error |
500 | CDP browser internal error |
internal_error |
500 | Unexpected engine failure |
Source: crw-core/src/error.rs (error_code() method), crw-server/src/error.rs (HTTP status mapping), and crw-server/src/routes/scrape.rs (anti-bot detection + anti_bot code).