Scrape
Turn any known URL into clean markdown, HTML, links, or structured JSON in one request. This is the default CRW workflow and the fastest path to a first successful integration.
Start with `formats: ["markdown"]` and confirm you get a clean response back. Add JS rendering, selectors, or extraction only after the plain request looks right.
Scraping a URL with CRW
POST /v1/scrape
Authentication:
- Hosted: send `Authorization: Bearer YOUR_API_KEY`
- Self-hosted: only required when `auth.api_keys` is configured
Installation
CRW is HTTP-first. You can start with cURL immediately and then move to your existing Python or Node.js HTTP client without installing a dedicated SDK.
Basic usage
Start with this request:
```json
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "renderJs": null
}
```
:::tabs
::tab{title="Python"}

```python
import requests

resp = requests.post(
    "https://fastcrw.com/api/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com",
        "formats": ["markdown"],
        "onlyMainContent": True,
    },
)
print(resp.json()["data"]["markdown"])
```
::tab{title="Node.js"}

```javascript
const resp = await fetch("https://fastcrw.com/api/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    formats: ["markdown"],
    onlyMainContent: true
  })
});
const body = await resp.json();
console.log(body.data.markdown);
```
::tab{title="cURL"}

```bash
curl -X POST https://fastcrw.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"],
    "onlyMainContent": true
  }'
```
:::
Response
```json
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "sourceURL": "https://example.com",
      "statusCode": 200,
      "elapsedMs": 32
    }
  }
}
```
That is the default CRW success shape: requested content plus a compact metadata envelope.
Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | URL to scrape |
| `formats` | string[] | `["markdown"]` | `markdown`, `html`, `rawHtml`, `plainText`, `links`, `json` |
| `onlyMainContent` | boolean | `true` | Remove nav, footer, and boilerplate before conversion |
| `renderJs` | boolean or null | `null` | `null` auto-detects, `true` forces browser rendering, `false` stays HTTP-only |
| `waitFor` | number | -- | Milliseconds to wait after JS rendering |
| `includeTags` | string[] | `[]` | CSS selectors to keep |
| `excludeTags` | string[] | `[]` | CSS selectors to remove |
| `headers` | object | `{}` | Custom HTTP headers |
| `cssSelector` | string | -- | Narrow extraction to one CSS selector |
| `xpath` | string | -- | Narrow extraction to one XPath expression |
| `chunkStrategy` | object | -- | Topic, sentence, or regex chunking |
| `query` | string | -- | Ranking query for chunk filtering |
| `filterMode` | string | -- | `bm25` or `cosine` |
| `topK` | number | `5` | Number of top chunks to keep |
| `proxy` | string | -- | Per-request proxy URL |
| `stealth` | boolean | -- | Override global stealth setting |
| `jsonSchema` | object | -- | Schema for structured extraction |
| `extract` | object | -- | Firecrawl-compatible alias wrapper for extraction schema |
| `llmApiKey` | string | -- | Per-request LLM API key |
| `llmProvider` | string | server default | `anthropic` or `openai` |
| `llmModel` | string | server default | Model override for extraction |
| `actions` | any | -- | Rejected with a clear error; use `cssSelector` or `xpath` instead |
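The chunking and ranking parameters combine into a retrieval-style request: chunk the page, rank chunks against a query, and keep only the top matches. The exact `chunkStrategy` object shape below is illustrative, since the table only names the strategy families:

```json
{
  "url": "https://example.com/docs/long-page",
  "formats": ["markdown"],
  "chunkStrategy": { "type": "sentence" },
  "query": "rate limits",
  "filterMode": "bm25",
  "topK": 3
}
```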
Formats
Use the smallest output shape that solves the job:
- `markdown` is the default and the best first request for most pipelines.
- `html` or `rawHtml` is useful when downstream systems need the original structure.
- `links` is useful when you want lightweight discovery without page bodies.
- `json` is the extraction path and should be paired with `jsonSchema`.
If you ask for multiple formats, only those formats are populated in the response.
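For example, a discovery pass that wants both the cleaned text and outbound links can request two formats in one call (illustrative request body):

```json
{
  "url": "https://example.com",
  "formats": ["markdown", "links"],
  "onlyMainContent": true
}
```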
Structured extraction
Extraction is part of scrape, not a separate route. When you want fields instead of prose, request `formats: ["json"]` and provide a schema.

```json
{
  "url": "https://example.com/product/123",
  "formats": ["json"],
  "jsonSchema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "string" }
    },
    "required": ["title"]
  }
}
```
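Assuming the product page exposes those fields, a successful response follows the standard success envelope, with the extracted object returned alongside the metadata. The `json` key placement and field values here are illustrative, not a documented guarantee:

```json
{
  "success": true,
  "data": {
    "json": {
      "title": "Example Product",
      "price": "$19.99"
    },
    "metadata": {
      "sourceURL": "https://example.com/product/123",
      "statusCode": 200
    }
  }
}
```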
Use Extract for the schema-first version of this flow.
JS rendering and targeting
Keep the first request simple:
- Leave `renderJs` at `null` until the plain HTTP path clearly fails.
- Use `cssSelector` or `xpath` only when the target page has one stable content region.
- Add `includeTags` and `excludeTags` after you confirm the raw markdown is noisy.
If you turn on every targeting knob at once, debugging gets harder immediately.
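When the plain path does fail, the escalated request usually needs only two or three extra knobs. The URL, wait time, and selector below are placeholders:

```json
{
  "url": "https://example.com/spa",
  "formats": ["markdown"],
  "renderJs": true,
  "waitFor": 2000,
  "cssSelector": "main article"
}
```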
Common production patterns
- Start with `markdown`, then add `links` or `json` only when downstream logic needs them.
- Validate extraction with markdown first, then add `jsonSchema`.
- Keep browser rendering as a fallback, not the default.
- Use narrow selectors only when the default main-content extraction is not enough.
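The "rendering as a fallback" pattern can be sketched as a small client helper. This is not an SDK function; the empty-markdown retry heuristic and the 2000 ms wait are assumptions for illustration:

```python
API_URL = "https://fastcrw.com/api/v1/scrape"

def scrape_with_fallback(url, api_key, post=None):
    """Try the plain HTTP path first; force browser rendering only
    when the first response has no usable markdown. The empty-markdown
    retry heuristic is an assumption, not documented CRW behavior."""
    if post is None:  # deferred import so a fake can be injected in tests
        import requests
        post = requests.post
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"url": url, "formats": ["markdown"], "onlyMainContent": True}
    body = post(API_URL, headers=headers, json=payload).json()
    if body.get("data", {}).get("markdown", "").strip():
        return body  # plain path worked; no browser needed
    # Escalate: force JS rendering instead of defaulting to it.
    payload.update({"renderJs": True, "waitFor": 2000})
    return post(API_URL, headers=headers, json=payload).json()
```

Injecting `post` keeps the helper testable without network access and makes the escalation decision explicit in one place.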
Common mistakes
- Turning on JS rendering before testing the plain HTTP path
- Requesting too many formats at once in production
- Combining `cssSelector`, `xpath`, `includeTags`, and `excludeTags` in the first attempt
- Sending `formats: ["json"]` without a `jsonSchema`
- Assuming `actions` is supported because Firecrawl accepts it