Changelog

This page is generated from the root CHANGELOG.md, which is maintained by release-please during releases.

The source of truth is the repository root changelog. Do not edit this docs page manually.

All notable changes to CRW are documented here.

0.22.0 (2026-07-09)

Features

cli: non-interactive cloud setup via 'crw setup --api-key' (e7cc4f9)
installer: connect to cloud when CRW_API_KEY is set (fedb5de)
server: batch scrape scale guards for 10k-URL submits (9a3e0b6)

Bug Fixes

clippy: satisfy clippy 1.97 lints (for_kv_map, manual_filter) (ed4522d)
mcp: launch server via 'npx -y crw-mcp' so fresh installs work (a574a61)
server: cancelled crawl/batch jobs reach a terminal state (fee388c)

Performance

docker: cargo-chef dependency layer for faster engine builds (603acfa)
server: raise batch URL validation concurrency to 256 (651d3c0)

0.21.3 (2026-07-08)

Bug Fixes

ci: let release doc-sync pushes bypass branch protection (7510d1e)

0.21.2 (2026-07-08)

Bug Fixes

renderer: load images/media/fonts for screenshots (2310e5c)
renderer: resolve CDP ws host to IP for Chromium 148+ rebinding guard (b197b95)

0.21.1 (2026-07-07)

Bug Fixes

scrape: detect modern Cloudflare Turnstile 200 interstitial (ce3771f), closes #350
scrape: scan full html for CF challenge markers, not an 80KB prefix (99fe2b2), closes #350

0.21.0 (2026-07-06)

⚠ BREAKING CHANGES

api: error responses now use errorCode instead of error_code. The managed API already strips the field, so managed clients are unaffected; self-hosted clients parsing error_code should read errorCode.

Features

cli: add crw bench FRAMES harness (fb78bcb)
cli: bench A/B flags, concurrency, crash-safe writes (7eb059e)
core: add evidence & provenance primitives (4a638a0)
core: expose sourceHash on scrape responses (b0d89f9)
extract: prompt-based extraction and full meta-tag metadata (6453204)
proxy: retry with default country on residential CONNECT tunnel failure (8aafdc8)
search: thread per-request country to result-page scraping (be3b0d1)
server: add /firecrawl/* compat namespace, own /v1 as native API (d9cb3b8)

Bug Fixes

api: rename error_code response field to errorCode (e4df682)
cli: repair clap arg conflict that panicked debug builds (f2a6b77)
docs: add redirect stubs for pre-flatten legacy doc URLs (a1f29dc)
extract: unify untrusted-content fencing; nonce-fence the change judge (192b9d7)
renderer: cap full-page screenshot height to avoid OOM (c5f555d), closes #161
renderer: don't browser-render thin pages that ship no JS (f42f14a)
scrape: report anti-bot/challenge pages as blocked, not success (d06f051)
search: charset-aware scrape decode + per-result error + neutral answer warnings (a9870e2)

0.20.0 (2026-07-01)

Features

map: escalate anti-bot-gated sitemaps through the JS renderer (2634a7e)
map: fix URL discovery on hard sites + configurable discovery limit (f5fe1f4)

Bug Fixes

renderer: flat 1-credit cost for every renderer (35a604c)

0.19.0 (2026-06-28)

Features

renderer: proxy-retry on origin rate-limit (429) (da7ecda)
renderer: relaxed-TLS fallback for cert-broken origins (13d39f6)

Performance

engine: offload HTML extraction off the async reactor (1683153)

0.18.3 (2026-06-23)

Bug Fixes

renderer: strip "Mozilla" UA prefix for lightpanda (it rejects it) (fda6139)

0.18.2 (2026-06-23)

Bug Fixes

renderer: send a modern UA on the CDP path (setUserAgentOverride) (ffcf564)

0.18.1 (2026-06-23)

Bug Fixes

npm: ship bin/install.js + bin/agents.js in the crw-mcp package (437dbc7)

0.18.0 (2026-06-23)

Features

extract: optional reasoning_effort config field (e677190)
install: auto-launch crw setup after install + conversion-tuned copy (5cc35f9)
mcp: add install command that wires skill + MCP for all agents (4b41621)

Bug Fixes

ci: rebase-retry docs-sync push + sync 0.17.0 changelog (597aa90)
install: default to the crw CLI + drop the musl/glibc gate (9958273)
install: drop the GB figure from local, surface 500 credits everywhere (f997aa0)
mcp: clarify crawl/parse jsonSchema arg for MCP clients (91be697)
mcp: static musl linux binaries + correct claude mcp add docs (75333d6)
server: make crw- model-prefix boot guard opt-in (5cb867c)
stealth: bump UA + Sec-Ch-Ua from Chrome 131 to 150 (695863c)

0.17.0 (2026-06-21)

Features

renderer: add opt-in Camoufox stealth renderer tier (744beda)
renderer: conditional hedge + event-driven readiness for p90 (9065007)
scrape: add screenshot output format via CDP capture (61e03e7), closes #161
sdk: add Research API methods to TS + Python SDKs (3a2f710)
search: add Firecrawl-compatible research API engine layer (ba1a87c)
search: overlap query-expansion scrape with original (C1) (4f6147e)
skills: add crw agent skill set (0ddbe01)
skills: publish crw-research agent skill + docs install command (4638715)

Bug Fixes

docs: render :::tabs and :::callouts in prerendered pages (f3a495a)
map: render SPA shells during URL discovery (0ec4bf9), closes #166
mcp,sdk: drop phantom search country param, export CrwApiError (58b8e5c)
pdf: bound sandbox child address space to prevent false pdf_too_large (06acb83)
proxy: normalize empty CRW_CRAWLER__PROXY to None (#154) (b3d0fe9)
scrape: capture screenshot outside the nav-budget race (4021b50)
search: resolve arXiv inspect via Semantic Scholar (70126c6)

Performance

search: research concurrency 4->8, cache cap 20k->3k (e3a6ac3)

0.16.0 (2026-06-14)

Features

mcp: optimize MCP server for context, weight, and conformance (aac7999)
proxy: add proxy list + rotation primitives and HTTP-path rotation (776e9fb)
proxy: rotate the JS/Chrome (CDP) path per request (422ac09)
proxy: v2 BYOP plumbing + honest docs + verification harness (0983ba3)

Bug Fixes

extract: stop doubling /v1 in structured-extraction chat URL (d8b8ebc)
extract: unify Anthropic structured URL, stop /v1/messages doubling (90ce3dd)
proxy: accept snake_case proxy_list alias on v1 ScrapeRequest/CrawlRequest (a8e7b71)
proxy: CLI crawl/map --proxy reaches the JS/CDP tier (round-4 review) (6ee7175)
proxy: resolve review findings (IP-leak/correctness hardening) (72e0486)
proxy: route /map discovery through the rotator (round-2 review) (5ee06cf)
proxy: route crawl robots/sitemap egress through the rotator (4835e7d)

0.15.2 (2026-06-12)

Bug Fixes

mcp: npm launcher downloads binary when platform pkg missing (4ae1aa6)

0.15.1 (2026-06-11)

Bug Fixes

ci: poll npm in verify_npm_sdk + expose ./package.json export (ed185dd)

0.15.0 (2026-06-10)

⚠ BREAKING CHANGES

CrwClient() with no API key now targets the cloud and raises if unauthenticated, instead of running locally. Set CRW_LOCAL=1 for the previous zero-config local behavior.

Features

add TypeScript SDK (crw-sdk) (1dd96f7)
cloud-first default + full client SDK parity (21e819e)
fold langchain/crewai adapters into crw.integrations extras (cd49120)
release: publish crw-cli (and crw-browse) to crates.io (b3d8004)
search: accept native SearXNG categories as passthrough (b3bcc5d)

Bug Fixes

antibot: detect Google rate-limit/bot-wall pages served with HTTP 200 (08bb46e)
ci: compile TS tests to JS so the SDK suite runs on Node 18/20 (e1b7546)
extract: avoid doubling chat/completions in structured base_url (3816bb6)
map: fold hyphen/underscore param spellings, add compare actions (f34ae12), closes #128
npm: declare node>=18 engines on crw-mcp packages (3c953fe)
renderer: reap CDP target + context on PoolGuard cancellation drop (0eebadc)

0.14.0 (2026-06-08)

Features

onboarding: cloud-default messaging + api.fastcrw.com base URL (4ccc92c)
pdf: PDF→markdown via pdf-inspector with Firecrawl-compatible parsers + /v2/parse (196b153)
search: query-relevance rerank, list answers, multi-round latency guard (ebafe83)

Bug Fixes

accept string env vars for auth.api_keys (534f932)
pdf: vendor test fixture into each crate (preflight: no cross-crate include) (2c7bfe9)
release: scope out-of-crate include check to src/ only (cd61fbf)
release: verify apt/homebrew by artifact, not flaky status-ack (7a572fd)

0.13.4 (2026-06-07)

Bug Fixes

docker: install aarch64 libc headers for cross-compile + CI guard (8cd31ac)

0.13.3 (2026-06-06)

Bug Fixes

release: stop verify-publish reporting false failures (d57c8c6)

Performance

docker: cross-compile arm64 instead of QEMU (2h -> ~3min) (a7cab42)

0.13.2 (2026-06-06)

Bug Fixes

server: embed openapi spec inside the crate so it can publish (e606cb9)

0.13.1 (2026-06-06)

Bug Fixes

release: correct publish tier ordering + guard topology (c84deed)

0.13.0 (2026-06-06)

Features

search: deterministic Wikidata entity-relation lookup (W3) (aa96e3e)

Bug Fixes

release: sync Cargo.lock internal crate versions to 0.12.1 (b5fc8a5)

0.12.1 (2026-06-05)

Bug Fixes

release: bump internal dep pins and track them in release config (0139ec2)
release: sync Cargo.lock with bumped internal dep versions (5c89d60)

0.12.0 (2026-06-05)

Features

answer: gated moat-hardening abstention (answer_guarded) (7ef7f32)
mcp: emit structuredContent for crw_search; bump protocol to 2025-06-18 (0cd9a4f), closes #89
search: diagnose search config and name unreachable host (#90) (25f9441)
search: pin SearXNG infoboxes/answers as structured sources (W0) (554f18c)

Bug Fixes

search: use resolvable searxng host in docker config (#90) (d966021)

0.11.0 (2026-06-03)

Features

api: serve /openapi.json and /openapi-3.0.json from crw-server (3dc79b4)
docs: ship OpenAPI spec, SKILL.md, and agent-shell (2145c97)
docs: wave 1 — API surface unblock for AI agent citations (9b28090)
docs: wave 2 — 15-page glossary cluster for AIO citations (14cd7e1)
extract: wave 2 cache token telemetry + DeepSeek provider tag fix (1b4f1aa)
mcp,cli,docs: snippet alias + agent-shell polish for benchmark wins (3472d13)
monitor: add feature-gated self-host crw-monitor mode (M6) (ff732f3)
monitor: add stateless change-tracking diff engine + LLM judge (a078081)
monitor: stateless change-tracking diff engine + LLM judge + self-host monitor (dc432ce)
renderer: add chrome_proxy tier + antibot-driven failover (0e37e30)
search: adaptive multi-round evidence-scout retrieval (gated) (9dd3224)
search: commit-policy answer prompt to cut over-abstention (#71) (cac7b29)
search: gated calibrated-answer path to cut over-abstention (#78) (a53225d)
search: gated page-2 fallback for thin reranked answer pools (#77) (2a7f0ac)
search: gated synthesis temperature/seed for deterministic eval (#81) (52a8c58)
search: multi-query expansion for the answer path (gated, default off) (#73) (4e387af)
search: multi-variant query expansion for recall (gated) (7bd093f)
search: passage-level relevance gate for the answer path (gated, off) (#75) (43ac625)
search: RRF re-rank + junk/coverage/geo filter + query cleaning for answer path (32efee6)
search: RRF re-rank + junk/coverage/geo filter + query cleaning for answer path (682fa1c)
search: snippet-fallback (Pattern A) + bake calibrated-answer durable (#79) (de1b1f1)
search: wave 4 R1+R2 — aggregated llmUsage + per-leg max_tokens (v0.11.0) (3570f82)
search: wave 4 R1+R2 + bump 0.11.0 (9ebd23d)
v2: add Firecrawl /v2 API surface (f793ec0)

Bug Fixes

antibot: strip inline data-URIs before classifier deep scan (6f1cbd2)
browse: address outbound hardening review feedback (a3c2076)
browse: harden outbound URL handling (c370972)
cli: swap SearXNG default to 127.0.0.1 + actionable error hint (618d41b)
docker: include OpenAPI specs in crw-api build context (f70c7af)
docker: include OpenAPI specs in crw-api build context (0238138)
docs: add 2 missing Firecrawl-shape shims caught by sapient (22b3d54)
extract: accept deepseek/openai-compatible providers in structured extract (2f0d86d)
search: default reranker to the proven lexical core (#69) (06585b6)
search: point engine at searxng-internal alias (search_rpc-only) (#70) (110aee7)
v2: batch status mutates state in place (no O(n^2) snapshot copy) (c03d359)
v2: emit invalidURLs (not invalidUrls) on batch start (611c54d)

Performance

search: calibrated answer top_n 8->5 (4x faster, accuracy holds) (#80) (751f8ee)

0.10.0 (2026-05-20)

Features

detector: add vendor-specific anti-bot block markers (c88c508)
renderer: add chrome_proxy as 4th fallback tier (b4da4f7)
renderer: per-request country via CDP proxy auth (11b4d32)

Bug Fixes

release: harden npm publish + fix mcp-registry verifier (9d4076f)
renderer: detect CloudFront/WAF 403 as bot-wall (7e058b2)
renderer: escalate JS tier on 4xx/5xx and vendor-detected blocks (648c372)

0.9.1 (2026-05-16)

Bug Fixes

release: sync crw-cli internal dep versions with workspace (26c528e)

0.9.0 (2026-05-16)

Features

cli: add AI extraction flags and crw setup --reset (912eea0)

0.8.3 (2026-05-15)

Features

cli: two-phase auto-fallback for crw <url> scrape (a871e54)
setup: make config.toml the canonical source for crw setup (b07c154)

Miscellaneous

release 0.8.2 (38ae764)
release 0.8.3 (efba1b3)

0.8.2 (2026-05-15)

Features

cli: two-phase auto-fallback for crw <url> scrape (a871e54)
setup: make config.toml the canonical source for crw setup (b07c154)

Miscellaneous

release 0.8.2 (38ae764)

0.8.2 (2026-05-14)

Bug Fixes

release: move crw-cli to unpublished and update dep versions (7f121f6)

0.8.1 (2026-05-14)

Bug Fixes

cli: mark crw-cli as publish=false to fix release (3104cc5)

0.8.0 (2026-05-14)

Features

cli: add interactive setup wizard (a5613b9)

0.7.1 (2026-05-12)

Bug Fixes

bump stale internal version pins to 0.7.0 (#48) (0bec22a)

0.7.0 (2026-05-12)

Features

LLM summary and search answer (BYOK) (#45) (ffcc2a5)

0.6.4 (2026-05-12)

Features

renderer: add bounded browser-context pool for Chrome tier (#43) (69b4861)

0.6.3 (2026-05-12)

Features

map: drop action URLs and strip tracking params (closes #40) (#41) (6d9ed39)

0.6.2 (2026-05-10)

Features

search: add /v1/search endpoint backed by bundled SearXNG sidecar (f4bd7f4)

Bug Fixes

antibot: drop bare 'captcha'/'access denied' markers — false positives (fae6c09)
crawl: drop redundant .into_iter() for clippy 1.95 (#39) (fb4032b)
map: WordPress sitemap-index timeout (closes #33) (c3dfd6c)
release: register crw-search crate in release manifest (9074761)
search: codex iteration-1 hardening — error mapping, resource bounds, container (5acba7b)
search: codex iteration-2 — error-body cap, per-source row budget, doc (a440d6e)
search: codex iteration-3 — predicate-based well-formed filter (4b4df3a)
search: use real SearXNG image tag and add fallback secret_key (be1f403)

0.6.1 (2026-05-09)

Features

metrics: cdp_pending_requests, cdp_live_connections, (b5f7bec)
renderer: live-connection registry + 60s telemetry sampler (b5f7bec)
renderer: target lifecycle metric + leaked detection (b5f7bec)
server: /ready endpoint with deep status code (b5f7bec)

Bug Fixes

release: bulletproof publish pipeline and drop pdf feature (8fcf2f6)
renderer: invalidate cached chrome WS URL on connect failure (b5f7bec)

0.6.0 (2026-05-09)

Features

extract: scale recall to 63.74% on 1000-URL benchmark (5b85555)
renderer: add browserless/chromium opt-in stealth profile (+2.5pt) (d2414c9)
renderer: chrome-stealth wiring + CDP discovery improvements (6b2e77c)
server,core,crawl: plumb tier timeouts and recall pipeline (7cbee43)

Miscellaneous

release 0.6.0 (bd03a35)

0.5.0 (2026-05-04)

Features

core: add deadline module and request/renderer config scaffolding (5a4e69a)
core: thread end-to-end Deadline through scrape pipeline (5991986)
crawl: key per-domain rate limiter by eTLD+1 (39c7954)
crawl: per-host concurrency cap on the eTLD+1 limiter (274f462)
renderer: add browserless/chromium opt-in stealth profile (236f626)
renderer: chrome nav-budget cap + truncated/deadline_exceeded flags (c57cef8)
renderer: chrome request-paused interception pump (T27) (13fcaa4)
renderer: leak-through fallback when global breaker open & host clean (86a9e36)
renderer: outcome-aware breaker + extraction and stealth fixes (86dd10f)
renderer: own per-eTLD+1 host limiter in FallbackRenderer (0577516)
renderer: recover FC-wins URLs to reach 92% bench coverage (ba12424)

Bug Fixes

compose: auto-restart and bound memory for renderer containers (dd610cc)
core: emit meaningful Timeout value when deadline already expired (607bb27)
crawl: prioritize anti-bot detection over placeholder warning (05aa933)
escalate to JS renderer on HTTP failure and empty markdown (9fc7934)
mcp: apply per-endpoint timeouts to proxy client (741f1b2)
renderer: enforce Deadline in HttpFetcher via tokio::time::timeout (b1c4058)
renderer: keep larger thin-result HTML when stitching attempts (8147236)
renderer: rescue 39 bench failures via UA, retry, and thin-content escalation (ddacb49)
server: classify anti-bot challenges as anti_bot, not no-markdown (3ece4dd)

Performance

renderer: drop fixed 2s JS wait, rely on SPA selector poll (cb043f7)
renderer: tighten tier timeouts and bump LP retry threshold (3f93d60)
renderer: widen breaker tolerance to 20 failures / 10s cooldown (6525a84)

Miscellaneous

release 0.5.0 (3987de1)

0.4.2 (2026-04-29)

Features

core: add render decision types and prometheus metrics scaffold (e08682b)
renderer: add per-host renderer preference cache (21e41d1)
renderer: track HTTP routing and warn on pinned-renderer failure (3208d27)
renderer: wire host preferences, circuit breakers, and CF detection (0c53c64)

Bug Fixes

core,renderer: surface render metadata and harden host normalization (ee4130b)
renderer: correct failure classification and routing decisions (4d684bd)
renderer: probe lifecycle, RAII guard, breaker counter (02044f5)

0.4.1 (2026-04-28)

Features

add per-request renderer field for scrape and crawl APIs (#29) (f1e0b63)
crw-browse: add interactive browser MCP server with phase-2 tools (e78879d)
honor renderer mode and force_js in config (fixes #28) (b76e473)

Bug Fixes

detect failed JS renders and fail over to next renderer (fca8fd5)
docs: use absolute logo paths in site.config.js (c5c9321)
docs: use absolute paths for logo and favicon assets (cdb1451)

0.4.0 (2026-04-22)

Features

add crw-browse MCP server, SOCKS5 proxy, extract mcp-proto (9a53753)

Miscellaneous

release 0.4.0 (e15fc74)

0.3.6 (2026-04-21)

Features

ci: add Google Indexing API notification for docs changes (3b5a340)
docs: generate static HTML pages for SEO indexability (7b321c0)

Bug Fixes

ci: trigger release workflow after release-please creates tag (27f2b67)
mcp: bump npm optionalDependencies from 0.3.0 to 0.3.5 (0e363e0)
renderer: detect loading placeholders and poll for content stability (d3b642b)

0.3.5 (2026-04-09)

Features

mcp: add crw_search tool for cloud/proxy mode (7fe4a8e)

0.3.4 (2026-04-09)

Bug Fixes

config: env var overrides not applied due to missing prefix_separator (71c7ae5), closes #18

0.3.3 (2026-04-09)

Features

add APT/Debian package distribution (c34b8e9)
renderer: spawn all available browsers for multi-renderer fallback (f546437)

0.3.2 (2026-04-08)

Bug Fixes

cli: auto-prepend https:// when no scheme provided (1050606)

0.3.1 (2026-04-08)

Features

add llms.txt, SKILL.md, MCP init command, and docs UI improvements (1b22d19)
add one-line install script with auto platform detection (6354f79)
docs: add dark mode logo support and improve docs UI (047df7b)
docs: align design with SaaS site and update branding (631d07c)
docs: unify docs into docs.fastcrw.com with Mintlify-style design (4994998)
docs: update URLs, dark mode, syntax highlighting, and benchmarks (0678cdf)
release all 3 binaries, CLI auto-browser, README overhaul (aa2950d)
update README banner with new logo (bcba1ad)

Bug Fixes

crawl HTTP polling bug + SDK test suite + docs (#16) (b6d8983)
remove internal implementation detail from roadmap (a5013f0)

0.3.0 (2026-04-02)

Features

add search() method to Python SDK and docs (591e3fe)

0.2.2 (2026-04-02)

Bug Fixes

renderer: escalate to JS renderer on HTTP 401/403 responses (f515caa)
use GitHub latest release instead of pinned version for binary download (4afcb1a)

0.2.1 (2026-03-28)

Bug Fixes

make crw-mcp npm wrapper executable (576a9eb)
use latest tag in server.json OCI identifier (7ec3b82)

0.2.0 (2026-03-28)

Features

add MCP Registry support for official server discovery (154b9f5)

0.1.2 (2026-03-27)

Bug Fixes

vendor pdf-inspector as crw-pdf for crates.io publishability (3f7681d)

0.1.1 (2026-03-26)

Bug Fixes

skip already-published crates without masking real errors (010649c)

0.1.0 (2026-03-26)

Features

add PDF extraction support via pdf-inspector (06dd5bf)

0.0.14 (2026-03-25)

Features

mcp: auto-download LightPanda binary for zero-config JS rendering (41f443b)
mcp: auto-spawn headless Chrome for JS rendering in embedded mode (9a6b0ae)

Bug Fixes

ci: move crw-mcp to Tier 4 in release workflow and add workflow_dispatch (d7584a8)

0.0.13 (2026-03-24)

Features

mcp: add embedded mode — self-contained MCP server, no crw-server needed (75e5450)

Bug Fixes

ci: switch release-please to simple type for Rust workspace support (51cd420)

v0.0.12

Readability drill-down — when <main> or <article> wraps >90% of body, the extractor now searches inside for narrower content elements (.main-page-content, .article-content, .entry-content, etc.) instead of discarding. Fixes MDN pages returning 35 chars and StackOverflow returning only the question
Base64 image stripping — data: URI images are removed in both HTML cleaning (lol_html) and markdown post-processing (regex safety net). Eliminates massive base64 blobs from Reddit and similar sites
Select/dropdown removal — <select> elements removed in onlyMainContent mode; dropdown/city-selector/location-selector noise patterns added. Fixes Hürriyet city dropdown leaking into content
Extended scored selectors — added .main-page-content, .js-post-body, .s-prose, #question, .page-content, #page-content, [role="article"] for better MDN, StackOverflow, and generic site coverage
Smarter fallback chain — when primary extraction produces too-short markdown, both fallbacks (cleaned HTML and basic clean) are tried and the longer result is picked, instead of short-circuiting on non-empty but insufficient content

v0.0.11

Stealth anti-bot bypass — automatic stealth JS injection via Page.addScriptToEvaluateOnNewDocument before every CDP navigation. Spoofs navigator.webdriver, Chrome runtime object, plugins array, languages, permissions API, iframe contentWindow, and toString() proxy to bypass Cloudflare, PerimeterX, and other bot detection systems
Cloudflare challenge auto-retry — detects Cloudflare JS challenge pages ("Just a moment", cf-browser-verification, challenge-platform) after page load and polls up to 3 times at 3-second intervals for non-interactive challenges to auto-resolve
HTTP → CDP auto-escalation — FallbackRenderer::fetch() in auto mode now checks HTTP responses for anti-bot challenge signatures and automatically escalates to JS rendering when detected, instead of returning the challenge HTML
Chrome failover in Docker — full automatic failover chain: HTTP → LightPanda → Chrome. Added chromedp/headless-shell as a Docker Compose sidecar service with 2GB shared memory. If LightPanda crashes on complex SPAs (React, Angular), Chrome handles the render
Chrome WS URL auto-discovery — CDP renderer resolves Chrome DevTools WebSocket URL via the /json/version HTTP endpoint with Host: localhost header (required for chromedp/headless-shell's socat proxy). Uses OnceCell for lazy one-time resolution
Proxy configuration docs — expanded proxy config comments with examples for HTTP, SOCKS5, and residential proxy providers (IPRoyal, Oxylabs, Smartproxy)
Raw string delimiter fix — fixed markdown.rs test that used r#"..."# with a string containing "#, changed to r##"..."##

v0.0.10 / v0.0.9

Crawl cancel endpoint — DELETE /v1/crawl/{id} cancels a running crawl job via AbortHandle and returns { success: true }
API rate limiting — token-bucket rate limiter (configurable rate_limit_rps, default 10). Returns 429 with error_code: "rate_limited" when exceeded
Machine-readable error codes — all error responses now include an error_code field (e.g. "invalid_url", "http_error", "rate_limited", "not_found")
Map response envelope — /v1/map now returns { success, data: { links } } instead of { success, links } for consistency with other endpoints
Fenced code blocks — indented code blocks (4-space) are post-processed into fenced (```) blocks for better LLM/RAG compatibility
Sphinx footer cleanup — "footer" added to exact-token noise patterns, catching <div class="footer"> in Sphinx/documentation sites
renderedWith: "http" — HTTP-only fetches now report rendered_with: "http" in metadata instead of null
405 JSON responses — all routes now have .fallback(method_not_allowed) returning structured JSON with error_code: "method_not_allowed" instead of empty bodies
Anchor link cleanup — empty anchor links ([](#id), [¶](#id)) and pilcrow/section signs stripped from Markdown output
role="contentinfo" cleanup — elements with ARIA roles contentinfo, navigation, banner, complementary removed during cleaning
Tiny chunk merging — topic chunking merges heading-only chunks (<50 chars) with the next chunk to improve RAG embedding quality

v0.0.8

Wikipedia / MediaWiki onlyMainContent fix — onlyMainContent: true now correctly extracts article text from Wikipedia pages (~49% size reduction). Previously the <html> element's class="vector-toc-available" matched the "toc" noise pattern via substring, removing the entire page
3-tier noise pattern matching — noise class/id matching now uses substring (long patterns), exact-token (short/ambiguous: toc, share, social, comment, related), and prefix (ad-, ads-) matching to avoid false positives
Structural element guard — noise handler never removes <html>, <head>, <body>, or <main> elements
Re-clean after readability — readability output is re-cleaned to strip residual noise (infobox, navbox, catlinks) that survives inside broad containers
Wikipedia-aware readability — added .mw-parser-output, #mw-content-text, #bodyContent to scored selectors; priority/scored selectors that wrap >90% of body are skipped
BYOK LLM extraction — per-request llmApiKey, llmProvider, llmModel fields for bring-your-own-key structured extraction without server config
JSON format validation — formats: ["json"] without jsonSchema now returns a 400 error instead of a warning
Block detection skip — pages >50 KB skip interstitial/block detection (no more false "blocked by anti-bot" on Wikipedia)
Null byte URL rejection — URLs with %00 or null bytes rejected at validation
Request timeout — default timeout bumped from 60s to 120s
Dockerfile fix — corrected cargo build flags, added config.docker.toml

v0.0.7

success: false on 4xx targets — scraping a 403/404/429 target with minimal body now correctly returns success: false with error details, instead of success: true with a warning. Targets with real content (custom error pages) still return success: true with a warning
JS renderer fallback warning — when renderJs: true is requested but no CDP renderer is available, the response now includes rendered_with: "http_only_fallback" and a warning instead of silently falling back
CDP health check — is_available() now runs a real Browser.getVersion command instead of just testing the WebSocket connection
Specific error messages — unknown formats now return descriptive errors (e.g., "Unknown format 'extract'. Valid formats: ...") instead of generic 422
"extract" format alias — formats: ["extract"] and formats: ["llm-extract"] are now accepted as aliases for "json" (Firecrawl compatibility)
Chunk dedup by default — deduplication is now enabled by default for all chunking strategies; separator-only chunks (---, ***) are filtered out
Chunk relevance scores — chunks now return { content, score, index } objects instead of plain strings when a query is provided
Map timeout — /v1/map accepts a timeout parameter (default 120s, max 300s) to prevent 502s on large sites
Stealth + JS rendering fix — stealth: true with renderJs: true no longer bypasses CDP; the shared renderer is used with stealth headers injected
BM25 NaN guard — prevents NaN scores when all chunks are empty

v0.0.6

Crate READMEs on crates.io — all 7 crates now have detailed README documentation visible on their crates.io pages, with usage examples, API docs, and installation instructions

v0.0.5

crw-cli now on crates.io — install the standalone CLI with cargo install crw-cli and scrape URLs without running a server
Parallelized release workflow — crate publishing uses tiered parallelism, cutting release time by ~2.25 minutes
CLI and MCP install docs — README now includes cargo install instructions for both crw-cli and crw-mcp

v0.0.4

Hardened rendering and warning semantics — improved reliability of the rendering pipeline and warning detection logic
XPath output escaping — XPath extraction results are now properly escaped to prevent injection
Broadened status warnings — expanded HTTP status code range that triggers warning metadata
Capped interstitial scan — bounded interstitial page detection to avoid excessive scanning
Clippy cleanup — simplified status code checks for cleaner, idiomatic Rust

v0.0.3

Warning-aware target handling — 4xx and anti-bot targets now return success: true with warning and metadata.statusCode
More reliable JS rendering — CDP navigation now waits for real page lifecycle completion before applying waitFor
Stealth decompression fix — gzip and brotli responses decode cleanly instead of leaking garbled binary payloads
Crawl compatibility — limit, maxPages, and max_pages now normalize to the same crawl cap
XPath and chunking fixes — XPath returns all matches, chunk overlap/dedupe is supported, and scorer rank order is preserved

v0.0.2

CSS selector & XPath — target specific DOM elements before Markdown conversion (cssSelector, xpath)
Chunking strategies — split content into topic, sentence, or regex-delimited chunks for RAG pipelines (chunkStrategy)
BM25 & cosine filtering — rank chunks by relevance to a query and return top-K results (filterMode, topK)
Better Markdown — switched to htmd (Turndown.js port): tables, code block languages, nested lists all render correctly
Stealth mode — rotate User-Agent from a built-in Chrome/Firefox/Safari pool and inject 12 browser-like headers (stealth: true)
Per-request proxy — override the global proxy on a per-request basis (proxy: "http://...")
Rate limit jitter — randomized delay between requests to avoid uniform traffic fingerprinting
crw-server setup — one-command JS rendering setup: downloads LightPanda, creates config.local.toml

v0.0.1

Firecrawl-compatible REST API — /v1/scrape, /v1/crawl, /v1/map with identical request/response format
6 output formats — markdown, HTML, cleaned HTML, raw HTML, plain text, links, structured JSON
LLM structured extraction — JSON schema in, validated structured data out (Anthropic tool_use + OpenAI function calling)
JS rendering — auto-detect SPAs via heuristics, render via LightPanda, Playwright, or Chrome (CDP)
BFS crawler — async crawl with rate limiting, robots.txt, sitemap support, concurrent jobs
MCP server — built-in stdio + HTTP transport for Claude Code and Claude Desktop
SSRF protection — private IPs, cloud metadata, IPv6, dangerous URI filtering
Docker ready — multi-stage build with LightPanda sidecar