Extract
Return structured JSON from a known page. In CRW, extraction is a scrape mode: use formats: ["json"] and provide a schema that describes the fields you want back.
POST /v1/scrape
Authentication:
- Hosted: send Authorization: Bearer YOUR_API_KEY
- Self-hosted: only required when auth.api_keys is configured
Endpoint
Extraction uses the same HTTP route as scrape, so you can use the same client code and only change the payload.
Basic usage
Start with this request:
{
"url": "https://example.com/product/123",
"formats": ["json"],
"jsonSchema": {
"type": "object",
"properties": {
"title": { "type": "string" },
"price": { "type": "string" },
"availability": { "type": "string" }
},
"required": ["title"]
}
}
:::tabs
::tab{title="Python"}
import requests
resp = requests.post(
"https://fastcrw.com/api/v1/scrape",
headers={"Authorization": "Bearer YOUR_API_KEY"},
json={
"url": "https://example.com/product/123",
"formats": ["json"],
"jsonSchema": {
"type": "object",
"properties": {
"title": {"type": "string"},
"price": {"type": "string"},
"availability": {"type": "string"},
},
"required": ["title"],
},
},
)
print(resp.json()["data"]["json"])
::tab{title="Node.js"}
const resp = await fetch("https://fastcrw.com/api/v1/scrape", {
method: "POST",
headers: {
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json"
},
body: JSON.stringify({
url: "https://example.com/product/123",
formats: ["json"],
jsonSchema: {
type: "object",
properties: {
title: { type: "string" },
price: { type: "string" },
availability: { type: "string" }
},
required: ["title"]
}
})
});
const body = await resp.json();
console.log(body.data.json);
::tab{title="cURL"}
curl -X POST https://fastcrw.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url":"https://example.com/product/123",
"formats":["json"],
"jsonSchema":{
"type":"object",
"properties":{
"title":{"type":"string"},
"price":{"type":"string"},
"availability":{"type":"string"}
},
"required":["title"]
}
}'
:::
Response
{
"success": true,
"data": {
"json": {
"title": "Widget",
"price": "$19.99",
"availability": "In stock"
},
"metadata": {
"sourceURL": "https://example.com/product/123",
"statusCode": 200,
"elapsedMs": 846
}
}
}
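Before reading data.json, check the success flag. The helper below is a minimal sketch of handling the response shape shown above; the "error" field name used in the failure branch is an assumption, so check your deployment's actual failure payload.

```python
def parse_extraction(body: dict) -> dict:
    """Return data.json from a /v1/scrape extraction response."""
    if not body.get("success"):
        # "error" as the failure field is an assumption, not a documented key.
        raise RuntimeError(f"extraction failed: {body.get('error', 'unknown error')}")
    return body["data"]["json"]
```

Call it with the decoded response, e.g. `parse_extraction(resp.json())`.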
Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| url | string | required | Target page URL |
| formats | string[] | required | Use ["json"] for the canonical extraction path |
| jsonSchema | object | required | JSON schema describing the fields you want back |
| extract | object | -- | Firecrawl-compatible wrapper; extract.schema is accepted |
| llmApiKey | string | -- | Per-request BYOK credential |
| llmProvider | string | server default | anthropic or openai |
| llmModel | string | server default | Extraction model override |
| onlyMainContent | boolean | true | Keep extraction focused on the main content block |
| cssSelector | string | -- | Narrow the page before extraction |
| xpath | string | -- | Narrow the page before extraction |
:::note
Firecrawl-compatible aliases formats: ["extract"] and formats: ["llm-extract"] are accepted, but ["json"] is the canonical CRW shape.
:::
:::warning
On self-hosted deployments, JSON extraction requires either [extraction.llm] in config.toml or a per-request llmApiKey. Without one of those, CRW returns an extraction error instead of guessed output.
:::
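For self-hosted deployments, the server-side default looks roughly like the fragment below. The section name [extraction.llm] comes from the warning above, but the individual key names (provider, model, api_key) are illustrative assumptions; consult your deployment's configuration reference for the exact spelling.

```toml
# Hypothetical sketch -- key names are assumptions, section name is documented.
[extraction.llm]
provider = "anthropic"   # or "openai"
model = "MODEL_NAME"     # extraction model to use by default
api_key = "YOUR_PROVIDER_KEY"
```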
Schema design
The easiest extraction schema is a small one:
- ask for the fields you actually need,
- keep field types simple on the first pass,
- avoid optional fields until the required ones are stable.
Large schemas are harder to debug because you cannot tell whether the issue is the page, the schema, or both.
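One way to follow the advice above is to generate first-pass schemas from a plain field list, so every field starts as a string and only the essentials are required. This helper is an illustrative sketch, not part of CRW:

```python
def minimal_schema(required_fields, optional_fields=()):
    """Build a flat extraction schema where every field is a plain string."""
    props = {name: {"type": "string"}
             for name in list(required_fields) + list(optional_fields)}
    return {"type": "object", "properties": props, "required": list(required_fields)}
```

For example, `minimal_schema(["title"], ["price", "availability"])` reproduces the product schema from Basic usage; widening a field's type or adding nesting can wait until that version returns stable results.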
Extraction quality
Use this workflow:
- scrape the page as markdown,
- verify the target content is present,
- add a minimal schema,
- expand the schema only after the first JSON result looks right.
If the underlying page scrape is weak, the JSON extraction will also be weak.
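The workflow above can be sketched as a single preflight helper. It takes the HTTP client as a parameter (`requests.post` or any compatible callable), scrapes the page as markdown, and only runs the schema extraction if a known piece of target content is present. Treating "markdown" as a valid entry in formats follows from the workflow above, and the error handling is illustrative:

```python
def extract_with_preflight(post, url, schema, needle, api_key):
    """Scrape as markdown first; run JSON extraction only if `needle` appears."""
    endpoint = "https://fastcrw.com/api/v1/scrape"
    headers = {"Authorization": f"Bearer {api_key}"}
    # Step 1: markdown scrape to confirm the target content is actually there.
    preflight = post(endpoint, headers=headers,
                     json={"url": url, "formats": ["markdown"]}).json()
    if needle not in preflight.get("data", {}).get("markdown", ""):
        raise ValueError(f"preflight failed: {needle!r} not in markdown scrape")
    # Step 2: the content is present, so run the schema-based extraction.
    result = post(endpoint, headers=headers,
                  json={"url": url, "formats": ["json"], "jsonSchema": schema}).json()
    return result["data"]["json"]
```

With `requests`, call it as `extract_with_preflight(requests.post, url, schema, "Widget", "YOUR_API_KEY")`.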
BYOK and provider control
Use llmApiKey, llmProvider, and llmModel only when you need per-request control. For most users, server defaults are the right starting point once [extraction.llm] is configured.
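When you do need per-request control, the BYOK fields sit alongside the normal extraction payload. The field names below come from the parameters table; the key and model values are placeholders:

```json
{
  "url": "https://example.com/product/123",
  "formats": ["json"],
  "jsonSchema": {
    "type": "object",
    "properties": { "title": { "type": "string" } },
    "required": ["title"]
  },
  "llmApiKey": "YOUR_PROVIDER_KEY",
  "llmProvider": "anthropic",
  "llmModel": "MODEL_NAME"
}
```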
Common production patterns
- Validate the target with markdown first, then add extraction.
- Keep the schema as small as possible on the first pass.
- Narrow the page with cssSelector only when the default extraction is too noisy.
- Use BYOK only when you need per-request provider separation.
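Narrowing with cssSelector looks like the payload below. The selector value is hypothetical; it depends entirely on the target page's markup:

```json
{
  "url": "https://example.com/product/123",
  "formats": ["json"],
  "cssSelector": ".product-detail",
  "jsonSchema": {
    "type": "object",
    "properties": { "title": { "type": "string" } },
    "required": ["title"]
  }
}
```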
Common mistakes
- Sending formats: ["json"] without a schema
- Designing a schema that assumes more structure than the page really contains
- Debugging extraction before confirming the underlying scrape succeeded
- Treating extraction as a separate route instead of a scrape mode