v2 API Reference
Complete reference for every /v2/* route. v2 is the default surface for the TypeScript SDK and exposes object-format fields, paginated crawl status, batch scraping, file parsing, and a compatibility alias for extract.
https://api.fastcrw.comhttp://localhost:3000Routes at a glance
| Method | Route | Purpose |
|---|---|---|
POST |
/v2/scrape |
Scrape one URL synchronously |
GET |
/v2/scrape/{job_id} |
Stub — always 404 (scrape is synchronous) |
POST |
/v2/crawl |
Start an async recursive crawl |
GET |
/v2/crawl/active |
List in-progress crawl job IDs |
GET |
/v2/crawl/{id} |
Poll crawl status and paginated results |
DELETE |
/v2/crawl/{id} |
Cancel a running crawl |
GET |
/v2/crawl/{id}/errors |
Fetch per-URL errors for a crawl job |
POST |
/v2/map |
Discover URLs, returns link objects |
POST |
/v2/search |
Web search with optional per-result scrape |
POST |
/v2/parse |
Upload a PDF, get markdown or structured JSON |
POST |
/v2/batch/scrape |
Start an async batch scrape over a URL list |
GET |
/v2/batch/scrape/{id} |
Poll batch status and paginated results |
DELETE |
/v2/batch/scrape/{id} |
Cancel a running batch |
GET |
/v2/batch/scrape/{id}/errors |
Fetch per-URL errors for a batch job |
POST |
/v2/extract |
DEPRECATED — async multi-URL LLM extraction |
GET |
/v2/extract/{id} |
DEPRECATED — poll extract job status |
GET |
/v2/capabilities |
Alias of /v1/capabilities |
v2 vs v1: key differences
| Feature | v1 | v2 |
|---|---|---|
formats field |
string[] only |
string[] or {"type":"...", "schema":...}[] |
| Map response | links: string[] |
links: [{url, title?, description?}[]} |
| Crawl/batch status | flat | paginated with next cursor |
| Document shape | engine-internal | V2Document with metadata.proxyUsed, cacheState, creditsUsed, scrapeId |
| Crawl status strings | varies | "scraping" | "completed" | "failed" |
scrapeOptions in crawl |
not present | nested object accepted |
| File parsing | not present | POST /v2/parse multipart |
| Batch scrape | not present | POST /v2/batch/scrape |
Authentication
Authorization: Bearer YOUR_API_KEY
- Hosted API (
https://api.fastcrw.com): always required. - Self-hosted: only required when
auth.api_keysis configured. /healthis always public.
POST /v2/scrape
Scrape one URL synchronously. Returns immediately with a V2Document.
Request body
| Field | Type | Default | Description |
|---|---|---|---|
url |
string |
required | Target page URL |
formats |
(string | FormatObject)[] |
["markdown"] |
Output formats — see Formats |
onlyMainContent |
boolean |
true |
Strip nav, footer, sidebar |
includeTags |
string[] |
[] |
Restrict to these HTML tags |
excludeTags |
string[] |
[] |
Remove these HTML tags |
waitFor |
number |
— | Milliseconds to wait after JS load |
headers |
Record<string, string> |
{} |
Custom HTTP headers forwarded to the target |
location |
{ country?: string, languages?: string[] } |
— | Proxy egress country (2-letter ISO) and Accept-Language hint |
proxy |
string |
"auto" |
"auto" or "stealth" (residential Chrome tier) |
proxyList |
string[] |
[] |
BYOP proxy URLs (rotated per proxyRotation) |
proxyRotation |
"round_robin" | "random" |
— | Rotation strategy for proxyList |
timeout |
number |
server default | Request deadline in milliseconds |
renderer |
"auto" | "lightpanda" | "chrome" | "chrome_proxy" | "playwright" |
— | Pin a renderer tier |
parsers |
ParserSpec[] |
— | Document parser directives (e.g. ["pdf"]) |
llmApiKey |
string |
— | BYOK LLM API key (required for summary / json if no server key) |
llmProvider |
"anthropic" | "openai" | "deepseek" | "azure" | "openai-compatible" |
— | BYOK LLM provider |
llmModel |
string |
— | BYOK LLM model name |
summaryPrompt |
string |
— | Custom prompt for the summary format |
Response
{
"success": true,
"data": { "...V2Document..." },
"warning": "optional — formats not yet supported by this engine"
}
See V2Document Response Shape below.
Example
curl -X POST https://api.fastcrw.com/v2/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"formats": ["markdown", "links"]
}'
Structured extraction with an object format:
curl -X POST https://api.fastcrw.com/v2/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://news.ycombinator.com",
"formats": [
{ "type": "json", "schema": { "type": "object", "properties": { "title": { "type": "string" } } } }
]
}'
TypeScript SDK
import { CrwClient } from "crw-sdk";
const crw = new CrwClient({ apiKey: process.env.CRW_API_KEY });
// Simple markdown scrape
const result = await crw.scrape("https://example.com", {
formats: ["markdown"],
});
console.log(result.markdown);
// Structured extraction
const data = await crw.scrape("https://example.com/product", {
formats: ["json"],
jsonSchema: {
type: "object",
properties: { title: { type: "string" }, price: { type: "string" } },
},
});
Note: the TypeScript SDK's scrape() method calls /v1/scrape under the hood; use batchScrape() or direct fetch to reach the v2 routes explicitly. The parseFile() method calls POST /v2/parse.
GET /v2/scrape/{job_id}
Always returns HTTP 404. CRW scrape is synchronous — there is no deferred job to poll. The endpoint exists only for SDK compatibility; use POST /v2/scrape and read the response directly.
POST /v2/crawl
Start an asynchronous recursive crawl. Returns a job ID immediately; poll GET /v2/crawl/{id} until status is "completed" or "failed".
Request body
| Field | Type | Default | Description |
|---|---|---|---|
url |
string |
required | Seed URL |
limit |
number |
— | Maximum pages to crawl |
maxDiscoveryDepth |
number |
— | Maximum link-follow depth from the seed |
scrapeOptions |
object |
— | Per-page scrape settings (formats, onlyMainContent, waitFor) |
renderer |
RequestedRenderer |
— | Pin a renderer tier for all pages |
country |
string |
— | 2-letter ISO country for proxy egress |
proxyList |
string[] |
[] |
BYOP proxy pool |
proxyRotation |
"round_robin" | "random" |
— | Rotation strategy for proxyList |
Start response
{
"success": true,
"id": "550e8400-e29b-41d4-a716-446655440000",
"url": "https://api.fastcrw.com/v2/crawl/550e8400-e29b-41d4-a716-446655440000"
}
Example
# Start the crawl
curl -X POST https://api.fastcrw.com/v2/crawl \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"limit": 50,
"maxDiscoveryDepth": 2,
"scrapeOptions": { "formats": ["markdown"], "onlyMainContent": true }
}'
GET /v2/crawl/{id}
Poll crawl status and retrieve completed documents. Supports cursor-based pagination for large crawls.
Query parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
skip |
number |
0 |
Zero-based document offset |
limit |
number |
100 |
Maximum documents per page (soft 10 MB byte cap also applies) |
Status response
{
"success": true,
"status": "scraping",
"total": 50,
"completed": 12,
"creditsUsed": 12,
"expiresAt": "2026-06-16T10:00:00.000Z",
"next": "https://api.fastcrw.com/v2/crawl/JOB_ID?skip=100",
"data": [ { "...V2Document..." } ]
}
status values: "scraping" (in progress) | "completed" | "failed".
next is null once the job is "completed" and there are no further pages. While the job is still running, next is always present so clients keep polling forward.
Polling pattern
# Poll until completed
JOB_ID="550e8400-e29b-41d4-a716-446655440000"
while true; do
RESP=$(curl -s https://api.fastcrw.com/v2/crawl/$JOB_ID \
-H "Authorization: Bearer YOUR_API_KEY")
STATUS=$(echo $RESP | jq -r '.status')
echo "Status: $STATUS, completed: $(echo $RESP | jq '.completed')"
if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then break; fi
sleep 3
done
DELETE /v2/crawl/{id}
Cancel a running crawl. Returns an error if the job has already finished.
{ "success": true, "status": "cancelled", "message": "Crawl job <id> cancelled" }
GET /v2/crawl/active
List the IDs of all currently in-progress crawl jobs on this engine instance.
{ "success": true, "crawls": ["550e8400-...", "6ba7b810-..."] }
GET /v2/crawl/{id}/errors
Return per-URL errors accumulated during a crawl.
{
"success": true,
"errors": [
{ "id": "550e8400-...", "error": "fetch timeout for https://example.com/slow-page" }
],
"robotsBlocked": []
}
POST /v2/map
Discover URLs under a domain without scraping their content. The key v2 change from v1: links is an array of objects ({url, title?, description?}), not bare strings.
Request body
| Field | Type | Default | Description |
|---|---|---|---|
url |
string |
required | Base URL to map |
limit |
number |
— | Maximum links to return |
includePaths |
string[] |
[] |
Substring filters — only keep matching URLs |
excludePaths |
string[] |
[] |
Substring filters — remove matching URLs |
search |
string |
— | Substring search filter applied after discovery |
sitemap |
"include" | "only" | "skip" |
"include" |
Sitemap strategy |
maxDiscoveryDepth |
number |
server default | Link-follow depth |
timeout |
number |
120000 |
Milliseconds (capped at 300 000) |
Response
{
"success": true,
"links": [
{ "url": "https://example.com/about", "title": null, "description": null },
{ "url": "https://example.com/pricing", "title": null, "description": null }
]
}
Note: title and description are always null in the current engine version (they are reserved for future sitemap-sourced enrichment).
Example
curl -X POST https://api.fastcrw.com/v2/map \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"limit": 200,
"includePaths": ["/api", "/reference"]
}'
POST /v2/search
Web search returning grouped results (web, news, images). Reuses the same engine as /v1/search with a different response envelope.
Request body
Accepts the same fields as POST /v1/search. scrapeOptions.formats may be objects (v2 style) or strings (v1 style) — the engine normalizes them automatically.
Response
{
"success": true,
"data": {
"web": [
{
"url": "https://example.com/article",
"title": "Article Title",
"description": "Search snippet",
"position": 1
}
],
"news": null,
"images": null
},
"creditsUsed": 0,
"id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
}
web, news, images are omitted when the source did not return results for that category.
Example
curl -X POST https://api.fastcrw.com/v2/search \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"query": "fastCRW web scraper", "limit": 5}'
POST /v2/parse
Upload a PDF file and receive its content as markdown, plain text, or structured JSON. Accepts multipart/form-data.
- Maximum upload size: 50 MB
- Only PDF is supported. Non-PDF uploads receive HTTP 400.
Multipart fields
| Field | Type | Required | Description |
|---|---|---|---|
file |
binary | yes | PDF file bytes |
options |
JSON string | no | Serialized ParseOptions (see below) |
ParseOptions (JSON string in the options field)
| Field | Type | Default | Description |
|---|---|---|---|
formats |
(string | FormatObject)[] |
["markdown"] |
Output formats |
jsonSchema |
object |
— | JSON Schema for structured extraction |
parsers |
ParserSpec[] |
— | Parser directives |
summaryPrompt |
string |
— | Custom prompt for summary format |
maxContentChars |
number |
— | Truncate extracted text before LLM steps |
Response
Same { success, data, warning? } envelope as POST /v2/scrape. The data.metadata.sourceFilename field carries the original filename; data.metadata.numPages carries the page count.
{
"success": true,
"data": {
"markdown": "# Document Title\n\n...",
"metadata": {
"sourceURL": "upload://report.pdf",
"url": "upload://report.pdf",
"statusCode": 200,
"proxyUsed": "basic",
"cacheState": "miss",
"concurrencyLimited": false,
"creditsUsed": 1,
"scrapeId": "a3bb189e-8bf9-3888-9912-ace4e6543002",
"numPages": 12,
"sourceFilename": "report.pdf"
}
}
}
Example
curl -X POST https://api.fastcrw.com/v2/parse \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@report.pdf" \
-F 'options={"formats":["markdown"]}'
TypeScript SDK
import { readFileSync } from "fs";
import { CrwClient } from "crw-sdk";
const crw = new CrwClient({ apiKey: process.env.CRW_API_KEY });
const pdf = readFileSync("report.pdf");
const result = await crw.parseFile(pdf, { filename: "report.pdf" });
console.log(result.markdown);
POST /v2/batch/scrape
Start an async job that scrapes a list of URLs with the same scrape options. The job uses the same crawl-job machinery as /v2/crawl; the status envelope is identical.
Request body
| Field | Type | Default | Description |
|---|---|---|---|
urls |
string[] |
required | URLs to scrape (at least 1) |
formats |
(string | FormatObject)[] |
["markdown"] |
Per-page output formats |
ignoreInvalidURLs |
boolean |
true |
Skip invalid URLs instead of failing |
| (any scrape option) | — | — | Other v2 scrape fields (onlyMainContent, waitFor, etc.) applied to every page |
Start response
{
"success": true,
"id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"url": "https://api.fastcrw.com/v2/batch/scrape/7c9e6679-...",
"invalidURLs": []
}
Example
# Start the batch
curl -X POST https://api.fastcrw.com/v2/batch/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/page-1",
"https://example.com/page-2",
"https://example.com/page-3"
],
"formats": ["markdown"],
"onlyMainContent": true
}'
TypeScript SDK
import { CrwClient } from "crw-sdk";
const crw = new CrwClient({ apiKey: process.env.CRW_API_KEY });
const results = await crw.batchScrape(
["https://example.com/a", "https://example.com/b"],
{ formats: ["markdown"] }
);
// results is V2Document[]
GET /v2/batch/scrape/{id}
Poll batch status. Identical envelope to GET /v2/crawl/{id} — same skip/limit query parameters and same next cursor pattern.
DELETE /v2/batch/scrape/{id}
Cancel a running batch. Delegates to the same handler as DELETE /v2/crawl/{id}.
GET /v2/batch/scrape/{id}/errors
Fetch per-URL errors. Same response shape as GET /v2/crawl/{id}/errors.
Batch start → poll → collect pattern
POST /v2/batch/scrape → { id }
GET /v2/batch/scrape/{id} → { status, data[], next }
GET /v2/batch/scrape/{id}?skip=100 → { status, data[], next }
... (follow `next` until null and status = "completed")
The SDK's batchScrape() handles this loop automatically. When driving it manually:
- Start the job, record
id. - Poll
GET /v2/batch/scrape/{id}every 2–5 seconds. - Accumulate
data[]from each page. - If
nextis non-null, follow it; ifnextis null andstatusis"completed", stop. - On
status: "failed", inspect/errors.
POST /v2/extract (DEPRECATED)
Deprecated. Use
POST /v2/scrapewithformats: [{"type": "json", "schema": {...}}]instead./v2/extractwill remain available but the engine emits awarningsentry on every response advising the replacement.
Async multi-URL LLM extraction. Starts a job that scrapes each URL with formats: ["json"] and the supplied schema, then merges per-URL JSON objects.
Request body
| Field | Type | Default | Description |
|---|---|---|---|
urls |
string[] |
required | URLs to extract from |
prompt |
string |
— | Free-text extraction instruction (no schema required) |
schema |
object |
— | JSON Schema for structured output |
systemPrompt |
string |
— | System-level instruction prepended to the extraction prompt |
Start response
{
"success": true,
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"urlTrace": [],
"warnings": ["/v2/extract is deprecated. Use /v2/scrape with formats including a 'json' format object."],
"replacement": "/v2/scrape"
}
TypeScript SDK
import { CrwClient } from "crw-sdk";
const crw = new CrwClient({ apiKey: process.env.CRW_API_KEY });
// Deprecated path — prefer batchScrape with a json format object
const result = await crw.extract({
urls: ["https://example.com/product"],
schema: { type: "object", properties: { title: { type: "string" } } },
});
GET /v2/extract/{id} (DEPRECATED)
Poll an extract job.
{
"success": true,
"status": "completed",
"data": { "title": "Example Product" },
"expiresAt": "2026-06-16T10:00:00.000Z",
"creditsUsed": 1,
"tokensUsed": 412
}
status values: "processing" (in progress) | "completed" | "failed".
V2Document response shape
Every v2 scrape and batch/crawl document follows this shape. Fields are omitted (not null) when not requested.
{
"markdown": "# Page Title\n\n...",
"html": "<h1>Page Title</h1>...",
"rawHtml": "<!doctype html>...",
"links": ["https://example.com/about"],
"json": { "title": "Page Title" },
"summary": "A one-paragraph summary...",
"changeTracking": { "...ChangeTrackingResult..." },
"warning": "optional per-document warning",
"metadata": {
"title": "Page Title",
"description": "Page meta description",
"language": "en",
"sourceURL": "https://example.com",
"url": "https://example.com",
"statusCode": 200,
"contentType": "text/html; charset=utf-8",
"proxyUsed": "basic",
"cacheState": "miss",
"concurrencyLimited": false,
"creditsUsed": 1,
"scrapeId": "a3bb189e-8bf9-3888-9912-ace4e6543002"
}
}
Note on
numPagesandsourceFilename: These fields carryskip_serializing_if = "Option::is_none"in the engine source, so they are omitted entirely from web-scrape responses — they never appear asnull. They only appear in/v2/parsePDF upload responses, and only when the value is actually known.
V2Document vs v1 ScrapeData
| Field | v1 | v2 |
|---|---|---|
metadata.proxyUsed |
not present | "basic" or "stealth" |
metadata.cacheState |
not present | always "miss" (no cache yet) |
metadata.concurrencyLimited |
not present | always false |
metadata.creditsUsed |
not present | integer (≥ 1) |
metadata.scrapeId |
not present | per-document UUID |
metadata.numPages |
not present | page count for PDFs |
metadata.sourceFilename |
not present | filename for /v2/parse uploads |
links |
string[] flat |
string[] inside Document (Map uses objects) |
proxyUsed values
"basic"— default path (lightweight or chrome renderer)."stealth"— residential Chrome proxy tier, activated byproxy: "stealth".
cacheState
Always "miss" in the current engine. The field exists for Firecrawl SDK compatibility.
V2 formats
v2 formats accepts a mix of bare strings and typed objects.
Supported format strings
| Value | Description |
|---|---|
"markdown" |
Cleaned markdown (default) |
"html" |
Cleaned HTML |
"rawHtml" |
Raw full HTML |
"plainText" |
Plain text — note: the engine computes plain text internally, but V2Document has no plainText field; requesting this format currently produces no visible output (the value is silently dropped in the v2 serialization layer). |
"links" |
Array of discovered URLs |
"json" |
Structured JSON (requires schema or LLM key) |
"summary" |
LLM-generated summary |
"changeTracking" |
Diff against a previous snapshot |
Object format for JSON extraction
{
"type": "json",
"schema": {
"type": "object",
"properties": { "title": { "type": "string" } }
}
}
Object format for changeTracking
{
"type": "changeTracking",
"modes": ["gitDiff"],
"tag": "optional-snapshot-tag"
}
Unsupported formats (graceful warning)
The following formats are recognized but not yet produced by this engine. The request succeeds with the other requested formats; a warning field explains what was skipped:
screenshot, images, attributes, branding, audio, query.
V2CrawlStatus (shared by crawl and batch)
{
"success": true,
"status": "scraping",
"total": 50,
"completed": 12,
"creditsUsed": 12,
"expiresAt": "2026-06-16T10:00:00.000Z",
"next": "https://api.fastcrw.com/v2/crawl/JOB_ID?skip=100",
"data": [ { "...V2Document..." } ],
"error": null
}
| Field | Description |
|---|---|
status |
"scraping" | "completed" | "failed" |
total |
Estimated total pages (grows as new URLs are discovered) |
completed |
Pages fully scraped so far |
creditsUsed |
Sum of per-page credit costs |
expiresAt |
RFC3339 UTC timestamp when the job record expires |
next |
URL for the next result page; null when there are no more pages |
data |
This page's V2Documents (≤ limit documents, soft 10 MB cap) |
error |
Set on "failed" status |
SDK support
The TypeScript SDK (crw-sdk) uses v2 by default for the routes it covers:
| SDK method | Underlying route |
|---|---|
crw.parseFile() |
POST /v2/parse |
crw.batchScrape() |
POST /v2/batch/scrape + poll |
crw.extract() |
POST /v2/extract + poll (deprecated) |
Other methods (scrape, crawl, map, search) call v1 routes. Use direct fetch or pass apiUrl to target /v2/scrape explicitly.
The Python SDK exposes the same surface. Pass base_url="https://api.fastcrw.com" to the client constructor.
Common mistakes
Forgetting to poll. POST /v2/crawl and POST /v2/batch/scrape return a job ID, not results. The results live in GET .../{ id }.
Expecting next to be absent on the last page while the job is running. next is always emitted while status is "scraping", even when all buffered pages have been returned. Stop only when status is "completed" (or "failed") AND next is null.
Sending formats: ["screenshot"]. Screenshots are not yet produced. The request succeeds but warning explains the skip. Use formats: ["markdown"] instead.
Using /v2/extract. It works, but the engine warns you to use POST /v2/scrape with a json format object. Switch over to avoid future breakage.
Missing Content-Type: application/json. All JSON endpoints require it. /v2/parse is multipart — do not set Content-Type manually; let the HTTP client set it with the boundary.
What to read next
- Output Formats — full format reference
- Scrape — v1 scrape guide (same engine)
- Crawl — v1 crawl guide
- Extract — structured extraction patterns
- Map — URL discovery
- Search — web search
- Response Shapes — v1 shapes
- Error Codes