Core Endpoint

Search

Search the web first, then optionally scrape the results you care about. Works out of the box on self-hosted CRW via the bundled SearXNG sidecar — no third-party API key needed. Free, self-hostable alternative to Tavily / Serper / Brave Search.

Best for: unknown URLs

Self-hosted: bundled SearXNG sidecar

Hosted: fastcrw.com (managed)

Start with: search only, then add scraping

Try it in the Playground View Scrape

Try it in the Playground

Start with result discovery only

Use a small query and limit: 5 first. Add scrapeOptions only when you already know you need page content from those search results.

Open Playground Get API Key

**Self-hosted users**: `docker compose up` boots a SearXNG sidecar automatically (reachable inside the Compose network as `searxng:8080`). `/v1/search` is live on `http://localhost:3000` with no extra setup. To point at an existing SearXNG instance instead, set `CRW_SEARCH__SEARXNG_URL=http://your-host:8080` and remove the `searxng` service from your compose file. To disable search entirely, set `[search].enabled = false` — the route returns a clear `search_disabled` error (HTTP 503). See the [Docker → Search (SearXNG)](/docker) section for the full setup, the `SEARXNG_BASE_URL` vs `searxng_url` distinction, and cold-start timing.

Searching the web with CRW

/v1/search

POST http://localhost:3000/v1/search          # self-hosted
POST https://api.fastcrw.com/v1/search        # hosted

Authentication:

Self-hosted: no auth by default (add a reverse proxy / API key middleware if you expose it publicly)
Hosted: send Authorization: Bearer YOUR_API_KEY

Installation

Like the rest of the CRW API, search is HTTP-first. Use cURL or your existing HTTP client.

Basic usage

Start with this request:

{
  "query": "web scraping tools",
  "limit": 5
}

import requests

# Self-hosted
resp = requests.post(
    "http://localhost:3000/v1/search",
    json={"query": "web scraping tools", "limit": 5},
)

# Or hosted (with API key)
# resp = requests.post(
#     "https://api.fastcrw.com/v1/search",
#     headers={"Authorization": "Bearer YOUR_API_KEY"},
#     json={"query": "web scraping tools", "limit": 5},
# )

# Each result row has: title, url, snippet, description, position, score.
# `snippet` is the LLM-ready summary line (Firecrawl-compatible name);
# `description` is the same value, kept as an alias.
for item in resp.json()["data"]:
    print(item["title"], item["url"], item["snippet"])

const resp = await fetch("http://localhost:3000/v1/search", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "web scraping tools", limit: 5 })
});

const body = await resp.json();
console.log(body.data);

# Self-hosted (no auth)
curl -X POST http://localhost:3000/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "web scraping tools", "limit": 5}'

# Hosted
curl -X POST https://api.fastcrw.com/v1/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "web scraping tools", "limit": 5}'

Response

{
  "success": true,
  "data": [
    {
      "url": "https://example.com/article",
      "title": "Article Title",
      "snippet": "A summary line from the search result...",
      "description": "A summary line from the search result...",
      "position": 1,
      "score": 9.5
    }
  ]
}

snippet is the LLM-ready summary line — name kept identical to Firecrawl so existing pipelines work unchanged. description is the same value under the SearXNG-native name. Pick whichever; both are emitted.

That is the flat response shape used when sources is not set.

Parameters

Field	Type	Default	Description
`query`	string	required	Search query (1–2000 chars)
`limit`	number	`5`	Maximum results per source (max `20`)
`lang`	string	--	Result language hint such as `"en"` or `"tr"`
`tbs`	string	--	Recency filter: `qdr:h`, `qdr:d`, `qdr:w`, `qdr:m`, `qdr:y`
`sources`	string[]	--	Result groups such as `"web"`, `"news"`, `"images"`
`categories`	string[]	--	Curated filters (`"github"`, `"research"`, `"pdf"`) plus any native SearXNG category (`"science"`, `"it"`, `"news"`, `"files"`, …) passed straight through. Max 5 entries. See Curated vs. passthrough categories
`scrapeOptions`	object	--	Scrape each result URL after search
`summarizeResults`	boolean	`false`	When `true`, each scraped result is summarized by the LLM and the digest appears in `result.summary`. Needs LLM config (per-request key or server). Fan-out is bounded by `[extraction.llm].max_concurrency`.
`answer`	boolean	`false`	When `true`, after scraping the top results crw synthesizes a single answer over them. The answer + `citations` land on the response wrapper.
`answerTopN`	number	`5` (max `10`)	Number of top-scoring results to feed into the answer pipeline
`maxCharsPerSource`	number	`8192`	Per-source byte cap on markdown fed into the answer prompt. Clamped to 32 KB server-side.
`maxContentChars`	number	`[extraction.llm].max_html_bytes` (100 KB)	Per-result byte cap on markdown sent to the per-result summarizer (`summarizeResults`). Clamped to 200 KB server-side. Independent from `maxCharsPerSource`.
`summaryPrompt`	string	--	Style/tone/language directive appended to the per-result summary prompt. Capped at 500 chars.
`answerPrompt`	string	--	Style/tone/language directive appended to the answer-synthesis prompt. Capped at 500 chars. Cannot override the "answer using ONLY provided sources" rule or the citation discipline.
`answerTemperature`	number	provider default	Sampling temperature for the answer-synthesis LLM call. Set `0` for deterministic/benchmark runs.
`queryExpandVariants`	number	server config	Number of diverse query rewrites fetched and unioned when query expansion is enabled. Overrides `[search].query_expand_variants` for this request.
`multiRound`	boolean	server config	When `true`, fires an adaptive evidence-scout round if the first-round answer abstains. Overrides `[search].multi_round` for this request.
`answerListFormat`	boolean	server config	When `true` (and the query has list intent such as "best/top X"), renders the answer as a ranked list instead of prose. `false` forces prose. Overrides `[search].answer_list_format`.
`llmApiKey`	string	--	Per-request LLM API key
`llmProvider`	string	server default	`anthropic`, `openai`, `deepseek`, `azure`, or `openai-compatible`
`llmModel`	string	server default	Model override
`baseUrl`	string	--	OpenAI-compatible endpoint base (e.g. DeepSeek, Azure)

scrapeOptions:

Field	Type	Default	Description
`formats`	string[]	`["markdown"]`	Allowed: `markdown`, `html`, `rawHtml`, `links`. `plainText` and `json` (extract) are not supported on `/v1/search` — use `/v1/scrape` for those
`onlyMainContent`	boolean	`true`	Keep content focused on the main body

Search result types

Without sources, CRW returns a flat list:

{
  "success": true,
  "data": [
    {
      "url": "https://example.com/article",
      "title": "Article Title",
      "snippet": "Search summary line...",
      "description": "Search summary line...",
      "position": 1,
      "score": 9.5
    }
  ]
}

With sources, CRW returns grouped results:

{
  "success": true,
  "data": {
    "web": [{ "url": "...", "title": "...", "snippet": "...", "description": "..." }],
    "news": [{ "url": "...", "title": "...", "snippet": "...", "description": "...", "publishedDate": "2026-04-02T14:00:00" }],
    "images": [{ "url": "...", "imageUrl": "...", "thumbnailUrl": "..." }]
  }
}

Search with content scraping

When you need more than result snippets, add scrapeOptions:

{
  "query": "web scraping tools",
  "limit": 3,
  "scrapeOptions": {
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}

That enriches eligible results with scraped page content. It is powerful, but it is also the moment search becomes more expensive, so keep it off until you need it.

LLM-assisted search

CRW can turn a search-with-scrape into either per-result summaries or a single synthesized answer (or both).

Per-result summaries (`summarizeResults`)

{
  "query": "what is tokio rust",
  "limit": 3,
  "scrapeOptions": { "formats": ["markdown"] },
  "summarizeResults": true,
  "summaryPrompt": "Respond in Turkish in one sentence per result.",
  "maxContentChars": 20000,
  "llmApiKey": "sk-...",
  "llmProvider": "openai",
  "llmModel": "gpt-4o-mini"
}

Each scraped result that produced markdown gets a result.summary field. Per-result failures attach a warning on the response but do not fail the whole request. Fan-out is bounded by [extraction.llm].max_concurrency (default 4).

Synthesized answer (`answer`)

{
  "query": "what is tokio rust",
  "limit": 3,
  "answer": true,
  "answerTopN": 3,
  "answerPrompt": "Respond in Turkish in exactly two sentences.",
  "scrapeOptions": { "formats": ["markdown"] },
  "llmApiKey": "sk-...",
  "llmProvider": "openai",
  "llmModel": "gpt-4o-mini"
}

The response wrapper carries:

{
  "success": true,
  "data": {
    "results": [ /* normal flat or grouped search results */ ],
    "answer": "Tokio is a Rust runtime…",
    "citations": [
      { "url": "https://...", "title": "...", "position": 0 }
    ],
    "llmUsage": { "inputTokens": 3420, "outputTokens": 96, "totalTokens": 3516, "estimatedCostUsd": 0.0008, "model": "gpt-4o-mini", "provider": "openai" },
    "warnings": []
  }
}

Citation discipline:

source_id returned by the model must map to a source actually in the input list. Fabricated ids are dropped.
position is clamped to [0, sources.len()).
The list is deduped on (source_id, position) and capped at 20 entries.

If the answer call fails (rate limit, network error, etc.), answer is null, any successful per-result summaries are still returned, and warnings explains what went wrong. CRW does not throw away partial work.

Caller-supplied directives

summaryPrompt and answerPrompt let you steer language/tone/format without weakening the safety wrapper:

They are appended below the hardcoded system prompt, not in place of it.
The wrapper explicitly tells the model to ignore directive contents that try to replace the task (fixed-string outputs, refusals, citation-skip, prompt leaks).
Each directive is truncated to 500 chars server-side.

Where the key comes from

Same per-request key pattern as /v1/scrape: send llmApiKey / llmProvider / llmModel / baseUrl in the request body, or configure [extraction.llm] in config.toml.

Freshness, sources, and categories

Use tbs when freshness matters more than broad recall.
Use sources when you want different result groups such as web, news, or images.
Use categories to narrow the query domain without rewriting the query itself.

Good default: add one narrowing control at a time so you can see which one actually improved the results.

Curated vs. passthrough categories

categories accepts two kinds of values, and you can mix them freely (up to 5 entries):

Value	Kind	What CRW does
`github`	curated	Switches to the engines in `[search].github_engines` (default: `github`).
`research`	curated	Switches to the engines in `[search].research_engines` (default: `arxiv`, `crossref`, `google scholar`, `semantic scholar`).
`pdf`	curated	Appends `filetype:pdf` to the query (not an engine switch).
anything else	passthrough	Forwarded verbatim to SearXNG's native `categories` parameter — `science`, `it`, `news`, `files`, `images`, `map`, `music`, `social media`, …

The curated names (github/research/pdf) are Firecrawl-compatible and behave exactly as before. Passthrough values are the additive part: CRW does not maintain its own engine list for them — it hands the category string to SearXNG, which already knows the engine→category routing from its own settings.yml. That means new categories work without any CRW code or config change, and your self-hosted SearXNG governs exactly which engines each category hits.

{
  "query": "crispr base editing",
  "categories": ["science"],
  "limit": 5
}

{
  "query": "rust async runtime",
  "categories": ["research", "it"]
}

In the second example, research still drives CRW's curated academic engines while it is forwarded to SearXNG as a native category — both apply to the same query.

Which passthrough categories actually return results depends on the engines your SearXNG instance enables. The bundled sidecar uses `use_default_settings: true` (see `config/searxng/settings.yml`), so all of upstream SearXNG's default categories are available. An unknown or disabled category is silently ignored by SearXNG (it falls back to `general`) rather than erroring. The list of categories a given instance exposes is documented under [SearXNG → Configured Engines](https://docs.searxng.org/user/configured_engines.html).

SearXNG query parameters CRW sends

For reference (and when debugging a self-hosted SearXNG directly), this is how the public request fields map onto the SearXNG /search query parameters CRW emits:

SearXNG param	Sourced from	Notes
`q`	`query`	Cleaned (leading filler stripped); `pdf` category appends `filetype:pdf`.
`categories`	`sources` + passthrough `categories`	Comma-joined union, de-duplicated. `web`→`general`, `news`→`news`, `images`→`images`, plus any passthrough value.
`engines`	curated `categories`	Comma-joined engines for `github`/`research` (from config). Omitted when no curated category is set.
`language`	`lang`	Defaults to `en` when omitted/empty so results aren't locale-mixed.
`time_range`	`tbs`	`qdr:h`/`qdr:d`→`day`, `qdr:w`→`week`, `qdr:m`→`month`, `qdr:y`→`year`.
`format`	—	Always `json` (the sidecar enables the JSON formatter; HTML stays on for debugging).

pageno and safesearch are available on the low-level crw search CLI but are not exposed on /v1/search. Full upstream reference: SearXNG Search API.

Self-hosting the SearXNG sidecar

The default docker-compose.yml ships a hardened SearXNG container:

Read-only root filesystem with sized tmpfs scratch
All Linux capabilities dropped, no-new-privileges
mem_limit, pids_limit set
Pinned upstream image tag (we never run :latest)
Config mounted read-only from config/searxng/settings.yml

It is mere-aggregation under AGPL — you are running an unmodified upstream SearXNG image with config mounted at runtime, so no §13 corresponding-source obligations attach to the image itself. If you redistribute your CRW deployment publicly, AGPL §13 still requires you to offer the corresponding source of CRW (which is already on GitHub) to your users.

Common production patterns

Start with search only, then add scrapeOptions after you verify result quality.
Use sources: ["news"] or tbs when freshness matters more than broad recall.
Use categories: ["github"] or ["research"] to narrow noisy queries.
Keep limit low on the first pass so the result quality is easy to inspect.

Common mistakes

Adding scrapeOptions to every search before you know you need page content
Confusing sources with categories
Treating qdr:h as truly hourly precision; SearXNG collapses it to day
Sending plainText or json in scrapeOptions.formats — use /v1/scrape for those

When to use something else

Use Scrape when you already know the exact URL
Use Map for site-specific discovery
Use Crawl when you need a bounded multi-page job on one site

CRW — open-source web scraper & crawler (Firecrawl-compatible)

Quick: copy this and run

Python (stdlib only — defaults to hosted, fails loudly if key is missing)

cURL

CLI (one-shot LLM-ready output)

Response shape

Other endpoints

Install (local setup only)

Resources

Search

Searching the web with CRW

/v1/search

Installation

Basic usage

Response

Parameters

Search result types

Search with content scraping

LLM-assisted search

Per-result summaries (`summarizeResults`)

Synthesized answer (`answer`)

Caller-supplied directives

Where the key comes from

Freshness, sources, and categories

Curated vs. passthrough categories

SearXNG query parameters CRW sends

Self-hosting the SearXNG sidecar

Common production patterns

Common mistakes

When to use something else

Search

Searching the web with CRW

/v1/search

Installation

Basic usage

Response

Parameters

Search result types

Search with content scraping

LLM-assisted search

Per-result summaries (summarizeResults)

Synthesized answer (answer)

Caller-supplied directives

Where the key comes from

Freshness, sources, and categories

Curated vs. passthrough categories

SearXNG query parameters CRW sends

Self-hosting the SearXNG sidecar

Common production patterns

Common mistakes

When to use something else

Per-result summaries (`summarizeResults`)

Synthesized answer (`answer`)