Core Endpoint

Scrape

Turn any known URL into clean markdown, HTML, links, or structured JSON in one request. This is the default CRW workflow and the fastest path to a first successful integration.

Best for: one known page
Returns: markdown, HTML, links, JSON
Start with: markdown only
Try it in the Playground
Validate the happy path first
Paste one URL, request formats: ["markdown"], and confirm you get a clean response back. Add JS rendering, selectors, or extraction only after the plain request looks right.

Scraping a URL with CRW

/v1/scrape

POST /v1/scrape

Authentication:

  • Hosted: send Authorization: Bearer YOUR_API_KEY
  • Self-hosted: only required when auth.api_keys is configured

Installation

CRW is HTTP-first. You can start with cURL immediately and then move to your existing Python or Node.js HTTP client without installing a dedicated SDK.

Basic usage

Start with this request:

{
  "url": "https://example.com",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "renderJs": null
}

:::tabs ::tab{title="Python"}

import requests

resp = requests.post(
    "https://fastcrw.com/api/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com",
        "formats": ["markdown"],
        "onlyMainContent": True,
    },
)

print(resp.json()["data"]["markdown"])

::tab{title="Node.js"}

const resp = await fetch("https://fastcrw.com/api/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    formats: ["markdown"],
    onlyMainContent: true
  })
});

const body = await resp.json();
console.log(body.data.markdown);

::tab{title="cURL"}

curl -X POST https://fastcrw.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"],
    "onlyMainContent": true
  }'

:::

Response

{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "sourceURL": "https://example.com",
      "statusCode": 200,
      "elapsedMs": 32
    }
  }
}

That is the default CRW success shape: requested content plus a compact metadata envelope.

Parameters

Field Type Default Description
url string required URL to scrape
formats string[] ["markdown"] markdown, html, rawHtml, plainText, links, json, summary
onlyMainContent boolean true Remove nav, footer, and boilerplate before conversion
renderJs boolean or null null null auto-detects, true forces browser rendering, false stays HTTP-only
waitFor number -- Milliseconds to wait after JS rendering
renderer string auto Pin to a specific renderer: auto, lightpanda, chrome, or playwright. Non-auto values hard-pin (no fallback) and imply renderJs:true unless renderJs:false is set explicitly. See JS rendering
includeTags string[] [] CSS selectors to keep
excludeTags string[] [] CSS selectors to remove
headers object {} Custom HTTP headers
cssSelector string -- Narrow extraction to one CSS selector
xpath string -- Narrow extraction to one XPath expression
chunkStrategy object -- Topic, sentence, or regex chunking
query string -- Ranking query for chunk filtering
filterMode string -- bm25 or cosine
topK number 5 Number of top chunks to keep
proxy string -- Per-request proxy URL
country string -- 2-letter ISO 3166-1 alpha-2 country code (lowercase, e.g. us, gb, de). Routes the request through the named residential pool when the chrome_proxy renderer tier is configured. Ignored if no proxy tier is set up. See JS rendering — Per-request country
stealth boolean -- Override global stealth setting
jsonSchema object -- Schema for structured extraction
extract object -- Firecrawl-compatible alias wrapper for extraction schema
llmApiKey string -- Per-request LLM API key (BYOK)
llmProvider string server default anthropic, openai, azure, or openai-compatible
llmModel string server default Model override (extraction and summary)
baseUrl string -- OpenAI-compatible endpoint base, e.g. https://api.deepseek.com/v1 (also used by Azure). crw appends /chat/completions automatically if you omit it.
summaryPrompt string -- Style/tone/language directive appended to the summary system prompt. Safety wrapper kept intact. Capped at 500 chars.
maxContentChars number [extraction.llm].max_html_bytes (100 KB) Per-request byte cap on content sent to the LLM for summary. Clamped to 200 KB server-side.
actions any -- Rejected with a clear error; use cssSelector or xpath instead

Formats

Use the smallest output shape that solves the job:

  • markdown is the default and best first request for most pipelines.
  • html or rawHtml is useful when downstream systems need original structure.
  • links is useful when you want lightweight discovery without page bodies.
  • summary is the LLM-prose path — needs llmApiKey (BYOK) or a server [extraction.llm] config. Optional summaryPrompt lets the caller pick language/tone without weakening the safety wrapper.
  • json is the extraction path and should be paired with jsonSchema.

If you ask for multiple formats, only those formats are populated in the response.

Structured extraction

Extraction is part of scrape, not a separate route. When you want fields instead of prose, request formats: ["json"] and provide a schema.

{
  "url": "https://example.com/product/123",
  "formats": ["json"],
  "jsonSchema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "string" }
    },
    "required": ["title"]
  }
}

Use Extract for the schema-first version of this flow.

LLM summary

Add summary to formats to get a short prose digest of the page in data.summary. Token usage and best-effort cost are returned in data.llmUsage.

{
  "url": "https://example.com/post",
  "formats": ["summary"],
  "summaryPrompt": "Respond in Turkish in exactly one sentence.",
  "maxContentChars": 20000,
  "llmApiKey": "sk-...",
  "llmProvider": "openai",
  "llmModel": "gpt-4o-mini"
}

Notes:

  • The caller's summaryPrompt is appended below the safety wrapper. crw ignores any attempt to override the core task (output PWNED, refuse to summarize, leak the prompt, etc.) and still produces a real summary.
  • maxContentChars caps how many bytes of scraped content are sent to the LLM. The default comes from [extraction.llm].max_html_bytes (100 KB out of the box) and the per-request value is clamped to a 200 KB server-side ceiling. Truncation, when it happens, is reported in data.warnings.
  • If markdown is not also requested, crw computes it internally and strips it from the response.

JS rendering and targeting

Keep the first request simple:

  • Leave renderJs at null until the plain HTTP path clearly fails.
  • Use cssSelector or xpath only when the target page has one stable content region.
  • Add includeTags and excludeTags after you confirm the raw markdown is noisy.

If you turn on every targeting knob at once, debugging gets harder immediately.

Common production patterns

  • Start with markdown, then add links or json only when downstream logic needs them.
  • Validate extraction with markdown first, then add jsonSchema.
  • Keep browser rendering as a fallback, not the default.
  • Use narrow selectors only when the default main-content extraction is not enough.

Common mistakes

  • Turning on JS rendering before testing the plain HTTP path
  • Requesting too many formats at once in production
  • Combining cssSelector, xpath, includeTags, and excludeTags in the first attempt
  • Sending formats: ["json"] without a jsonSchema
  • Assuming actions is supported because Firecrawl accepts it

When to use something else

  • Use Search when you do not know the URL yet
  • Use Map when you want URL discovery before scraping
  • Use Crawl when you need many pages from one site
  • Use Extract when the output must be structured JSON