Recipe: Scrape a JS-heavy SPA
Goal: Go from blank output to clean markdown on a React/Next.js/Vue SPA using the diagnostic loop: plain scrape → inspect renderedWith → add renderJs + waitFor → pin a renderer.
Target URL used in this recipe: https://vercel.com/changelog — a Next.js SPA that ships an empty <div id="__next"> shell to plain HTTP fetchers.
Step 1 — Plain scrape (baseline)
Always start without JS rendering. If the page is server-side rendered you save cost and latency.
curl -s -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://vercel.com/changelog",
"formats": ["markdown"]
}'from crw import CrwClient
client = CrwClient() # reads CRW_API_KEY from env
result = client.scrape(
"https://vercel.com/changelog",
formats=["markdown"],
)
markdown = result.get("markdown", "")
rendered_with = result["metadata"]["renderedWith"]
print(f"renderedWith: {rendered_with}")
print(f"markdown length: {len(markdown)} chars")
print(markdown[:300] or "(empty)")Expected output (SPA shell — nothing useful):
{
"success": true,
"data": {
"markdown": "",
"metadata": {
"title": "Changelog – Vercel",
"sourceURL": "https://vercel.com/changelog",
"statusCode": 200,
"renderedWith": "http",
"elapsedMs": 180
}
}
}
renderedWith: "http" confirms no browser was used. The empty markdown is the SPA shell — all content is injected by JavaScript after load.
Step 2 — Add renderJs: true and a waitFor
Force browser rendering and give the page time to hydrate.
curl -s -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://vercel.com/changelog",
"formats": ["markdown"],
"renderJs": true,
"waitFor": 2000
}'from crw import CrwClient
client = CrwClient()
result = client.scrape(
"https://vercel.com/changelog",
formats=["markdown"],
render_js=True,
wait_for=2000,
)
markdown = result.get("markdown", "")
rendered_with = result["metadata"]["renderedWith"]
print(f"renderedWith: {rendered_with}")
print(f"markdown length: {len(markdown)} chars")
print(markdown[:500])Expected output (hydrated content):
{
"success": true,
"data": {
"markdown": "# Changelog\n\n## June 2026\n\n### Build output compression...",
"metadata": {
"title": "Changelog – Vercel",
"sourceURL": "https://vercel.com/changelog",
"statusCode": 200,
"renderedWith": "lightpanda",
"elapsedMs": 2340
}
}
}
renderedWith is now "lightpanda" (or "chrome" depending on your server config) — the browser rendered the page and the markdown has real content.
Choosing a waitFor value
| Page type | Recommended waitFor |
|---|---|
| Light hydration (mostly SSR) | 500–1000 ms |
| Typical React/Next.js SPA | 2000 ms |
| Heavy client-side data fetch | 3000–5000 ms |
Start low and increase only when content is still missing. Long waits increase latency and cost; they do not fix bot-wall blocks.
Step 3 — Pin a renderer
Once you know the site works with a specific renderer, hard-pin it to skip the auto-detect chain. Valid values for renderer:
| Value | Renderer used |
|---|---|
"auto" (or omit) |
Auto-detect chain — lightpanda first, falls back to chrome |
"lightpanda" |
LightPanda — fastest, best for SPAs without anti-bot |
"chrome" |
Full headless Chrome — heavier, handles more complex JS |
"chrome_proxy" |
Headless Chrome via residential proxy — for geo-blocked or heavily protected sites |
"playwright" |
Playwright driver — use when CDP direct is unavailable |
Pinning a non-auto renderer implies renderJs: true. If you also set renderJs: false explicitly, the pin is ignored and HTTP-only is used.
curl -s -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://vercel.com/changelog",
"formats": ["markdown"],
"renderer": "lightpanda",
"waitFor": 2000
}'curl -s -X POST https://api.fastcrw.com/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://vercel.com/changelog",
"formats": ["markdown"],
"renderer": "chrome",
"waitFor": 2000
}'from crw import CrwClient
client = CrwClient()
# Pin to lightpanda (fast, no fallback)
result = client.scrape(
"https://vercel.com/changelog",
formats=["markdown"],
renderer="lightpanda",
wait_for=2000,
)
rendered_with = result["metadata"]["renderedWith"]
print(f"renderedWith: {rendered_with}") # "lightpanda"
print(result["markdown"][:400])Expected response when pinning "chrome":
{
"success": true,
"data": {
"markdown": "# Changelog\n\n## June 2026\n\n### Build output compression...",
"metadata": {
"title": "Changelog – Vercel",
"sourceURL": "https://vercel.com/changelog",
"statusCode": 200,
"renderedWith": "chrome",
"elapsedMs": 3120
}
}
}
If the pinned renderer is not configured on your server, the API returns HTTP 400:
{
"success": false,
"error": "renderer 'chrome' not available; configured renderers: [lightpanda]. Update server config or omit the 'renderer' field.",
"error_code": "invalid_request"
}
Full diagnostic script (Python)
Paste and run this to walk through all three steps automatically:
import sys
from crw import CrwClient
TARGET = "https://vercel.com/changelog"
client = CrwClient() # reads CRW_API_KEY from env
def check(label: str, result: dict) -> None:
md = result.get("markdown", "")
rendered_with = result["metadata"].get("renderedWith", "?")
elapsed = result["metadata"].get("elapsedMs", 0)
print(f"\n{'='*60}")
print(f"[{label}] renderedWith={rendered_with} elapsed={elapsed}ms")
print(f" markdown length: {len(md)} chars")
if md:
print(f" preview: {md[:120].strip()!r}")
else:
print(" (empty — page needs JS rendering)")
# Step 1: plain HTTP
r1 = client.scrape(TARGET, formats=["markdown"])
check("Step 1 — plain HTTP", r1)
# Step 2: renderJs + waitFor
r2 = client.scrape(TARGET, formats=["markdown"], render_js=True, wait_for=2000)
check("Step 2 — renderJs=True, waitFor=2000", r2)
# Step 3: pin renderer explicitly
r3 = client.scrape(TARGET, formats=["markdown"], renderer="lightpanda", wait_for=2000)
check("Step 3 — renderer=lightpanda (pinned)", r3)
if len(r3.get("markdown", "")) > 200:
print("\nDone — content looks good with pinned renderer.")
else:
print("\nStill thin. Try renderer='chrome' or increase waitFor.", file=sys.stderr)
What each field controls
| Field | Wire name | Type | Effect |
|---|---|---|---|
render_js |
renderJs |
bool | null |
null = auto-detect, true = force browser, false = HTTP only |
renderer |
renderer |
string |
Pin to "auto", "lightpanda", "chrome", "chrome_proxy", or "playwright" |
wait_for |
waitFor |
integer (ms) |
Wait after load before extracting HTML |
metadata.renderedWith |
— | response field | Which renderer actually ran ("http", "lightpanda", "chrome", "chrome_proxy") |
Troubleshooting
renderedWith is "http" even with renderJs: true
No JS renderer is configured. Check your deployment or use the cloud endpoint (https://api.fastcrw.com).
Markdown is still empty after rendering
The page may be behind an anti-bot wall. Look for a warnings array in the response — it will name the vendor (Cloudflare, Akamai, etc.). Switch to renderer: "chrome_proxy" with a country code to route through a residential IP.
error_code: "invalid_request" when pinning
The pinned renderer is not running on your server. The error message lists what is available. Either configure the renderer in config.toml or switch to "auto".
Content is a loading spinner, not real data
Increase waitFor. Start at 2000, try 3000, then 5000. Beyond 5 seconds the page is likely blocked or requires authentication, not just slow.
elapsedMs is very high
Only pin a browser renderer when HTTP-only clearly fails. Plain HTTP fetches take ~50–300 ms; a browser fetch with waitFor: 2000 takes 2–5 s minimum.