spa-reader-mcp
MCP server that renders JavaScript SPA pages and extracts LLM-ready Markdown.
Traditional web scrapers fail on Single Page Applications because the content is rendered by JavaScript after the initial page load. spa-reader-mcp solves this by launching a headless Chromium browser via Playwright, waiting for the network to go idle (and optionally for a given selector to appear), then extracting clean Markdown with Mozilla's Readability and Turndown, ready for LLM consumption.
Features
- spa_read: Render any SPA page and extract article content as clean Markdown with optional YAML frontmatter
- spa_screenshot: Capture full or viewport-sized PNG screenshots of rendered pages
- Singleton browser: Reuses a single Chromium instance across requests for fast, low-overhead rendering
- SSRF protection: Blocks private/loopback IP ranges and restricts URL schemes to http/https
- Selector injection prevention: Rejects Playwright-specific selector syntax (>>, nth=, text=, has-text, :has)
- Content truncation: Caps output at 100KB with clean line-boundary truncation
Requirements
- Node.js >= 20
- Chromium browser for Playwright:
npx playwright install chromium
Installation
npx (recommended, zero install)
No global install needed — configure directly in your MCP client (see MCP Configuration).
Global install
npm install -g spa-reader-mcp
npx playwright install chromium
From source
git clone https://github.com/XXO47OXX/spa-reader-mcp.git
cd spa-reader-mcp
pnpm install
pnpm build
npx playwright install chromium
MCP Configuration
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "spa-reader": {
      "command": "npx",
      "args": ["-y", "spa-reader-mcp"]
    }
  }
}
Claude Code
claude mcp add spa-reader -- npx -y spa-reader-mcp
Tools
spa_read
Render a JavaScript SPA page and extract its content as LLM-ready Markdown.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| url | string | Yes | — | The URL of the SPA page to read |
| waitForSelector | string | No | — | CSS selector to wait for before extraction |
| waitTimeout | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| includeMetadata | boolean | No | true | Include title/author/excerpt as YAML frontmatter |
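The defaults and ranges in the table above can be illustrated with a small argument normalizer. This is a sketch only: the function name `normalizeSpaReadArgs`, the `SpaReadArgs` type, and the choice to reject (rather than clamp) out-of-range timeouts are assumptions, not the server's actual code.

```typescript
// Hypothetical normalizer for spa_read arguments; names and error messages are illustrative.
interface SpaReadArgs {
  url: string;
  waitForSelector: string | undefined;
  waitTimeout: number;
  includeMetadata: boolean;
}

const DEFAULT_WAIT_TIMEOUT_MS = 30_000;
const MIN_WAIT_TIMEOUT_MS = 1_000;
const MAX_WAIT_TIMEOUT_MS = 120_000;

function normalizeSpaReadArgs(raw: Record<string, unknown>): SpaReadArgs {
  const url = raw["url"];
  if (typeof url !== "string" || url.length === 0) {
    throw new TypeError("url is required and must be a non-empty string");
  }

  const waitForSelector = raw["waitForSelector"];
  if (waitForSelector !== undefined && typeof waitForSelector !== "string") {
    throw new TypeError("waitForSelector must be a string");
  }

  // A missing (or null) value falls back to the documented default of 30000 ms.
  const waitTimeout = raw["waitTimeout"] ?? DEFAULT_WAIT_TIMEOUT_MS;
  if (
    typeof waitTimeout !== "number" ||
    !Number.isFinite(waitTimeout) ||
    waitTimeout < MIN_WAIT_TIMEOUT_MS ||
    waitTimeout > MAX_WAIT_TIMEOUT_MS
  ) {
    // Assumption: out-of-range values are rejected; the real server may clamp instead.
    throw new RangeError("waitTimeout must be between 1000 and 120000 ms");
  }

  // Frontmatter is included unless the caller explicitly opts out.
  const includeMetadata = raw["includeMetadata"] ?? true;
  if (typeof includeMetadata !== "boolean") {
    throw new TypeError("includeMetadata must be a boolean");
  }

  return { url, waitForSelector, waitTimeout, includeMetadata };
}
```

Calling `normalizeSpaReadArgs({ url: "https://example.com/blog/rsc" })` yields a `waitTimeout` of 30000 and `includeMetadata` set to true, matching the Default column.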
Example output:
---
title: "Understanding React Server Components"
author: "Dan Abramov"
excerpt: "A deep dive into how RSC works under the hood"
source: "https://example.com/blog/rsc"
---
## Introduction
React Server Components allow you to...
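A frontmatter block like the one in the example output could be produced roughly as follows. The `ArticleMeta` type and `buildFrontmatter` function are hypothetical names for illustration; the sketch assumes empty fields are omitted and relies on the fact that a JSON string is also a valid YAML double-quoted scalar.

```typescript
// Hypothetical metadata shape; the field names mirror the example output above.
interface ArticleMeta {
  title?: string;
  author?: string;
  excerpt?: string;
  source: string;
}

function buildFrontmatter(meta: ArticleMeta): string {
  const lines: string[] = ["---"];
  for (const key of ["title", "author", "excerpt", "source"] as const) {
    const value = meta[key];
    // Assumption: missing or empty fields are skipped rather than emitted as empty strings.
    if (value) {
      // JSON.stringify escapes quotes and newlines, producing a safe YAML double-quoted scalar.
      lines.push(`${key}: ${JSON.stringify(value)}`);
    }
  }
  lines.push("---");
  return lines.join("\n") + "\n";
}
```

Quoting every value avoids YAML surprises when a title contains a colon or a leading special character.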
spa_screenshot
Take a PNG screenshot of a JavaScript SPA page after rendering.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| url | string | Yes | — | The URL to screenshot |
| waitForSelector | string | No | — | CSS selector to wait for before capturing |
| waitTimeout | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| width | number | No | 1280 | Viewport width in pixels (320–3840) |
| height | number | No | 720 | Viewport height in pixels (240–2160) |
| fullPage | boolean | No | false | Capture full scrollable page |
Returns the screenshot as a base64-encoded PNG image.
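On the client side, the base64 payload can be decoded and sanity-checked before it is saved or displayed. The sketch below is an assumption about how a consumer might handle the result, not part of the server; `decodePngBase64` and `readPngSize` are hypothetical helpers. It relies only on the standard PNG layout: an 8-byte signature followed by the IHDR chunk, whose width and height are big-endian 32-bit integers at byte offsets 16 and 20.

```typescript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decode a base64 string and verify that it starts with the PNG signature.
function decodePngBase64(data: string): Buffer {
  const bytes = Buffer.from(data, "base64");
  if (bytes.length < 24 || !bytes.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error("payload is not a PNG image");
  }
  return bytes;
}

// Read the pixel dimensions from the IHDR chunk (always the first chunk in a PNG).
function readPngSize(png: Buffer): { width: number; height: number } {
  return { width: png.readUInt32BE(16), height: png.readUInt32BE(20) };
}
```

With `fullPage` left at false, the decoded width and height would be expected to match the requested viewport, unless the browser context applies a device scale factor other than 1.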
Architecture
URL
→ Playwright Chromium (headless, singleton)
→ Per-request BrowserContext (isolated cookies/storage)
→ Page navigation + networkidle + optional selector wait
→ Raw HTML
→ Mozilla Readability (article extraction)
→ Turndown (HTML → Markdown)
→ YAML frontmatter + truncation
→ LLM-ready Markdown
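The truncation step at the end of this pipeline can be sketched as below. The function name `truncateAtLineBoundary` is hypothetical, and the sketch assumes that "100KB" means 102400 UTF-8 bytes; the actual limit and unit used by the server may differ.

```typescript
// Assumption: 100KB is interpreted as 100 * 1024 bytes of UTF-8.
const MAX_OUTPUT_BYTES = 100 * 1024;

// Keep whole lines until adding the next one would exceed the byte budget.
function truncateAtLineBoundary(markdown: string, maxBytes: number = MAX_OUTPUT_BYTES): string {
  if (Buffer.byteLength(markdown, "utf8") <= maxBytes) {
    return markdown; // already within the limit, return unchanged
  }
  const kept: string[] = [];
  let used = 0;
  for (const line of markdown.split("\n")) {
    const cost = Buffer.byteLength(line, "utf8") + 1; // +1 for the newline separator
    if (used + cost > maxBytes) break;
    kept.push(line);
    used += cost;
  }
  // If the very first line is already too long, this returns an empty string.
  return kept.join("\n");
}
```

Cutting on a line boundary keeps the output from ending in the middle of a Markdown construct such as a link or a table row.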
Key design decisions:
- Singleton browser: A single Chromium instance is launched on first request and reused. This avoids the ~2s cold-start penalty on subsequent calls.
- Per-request BrowserContext: Each request gets an isolated BrowserContext with its own cookies and storage, preventing cross-request data leakage.
- Readability fallback: If Mozilla Readability determines the page isn't article-like, the extractor falls back to converting the full <body> HTML.
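The first two decisions above (one shared browser, one isolated context per request) can be sketched as a lazy singleton. This is an illustrative pattern, not the project's code: `createBrowserPool`, `BrowserLike`, and `ContextLike` are hypothetical names, and the small interfaces stand in for Playwright's `Browser` and `BrowserContext` so the sketch runs without Playwright installed.

```typescript
// Minimal structural stand-ins for Playwright's BrowserContext and Browser.
interface ContextLike {
  close(): Promise<void>;
}
interface BrowserLike<C extends ContextLike> {
  newContext(): Promise<C>;
}

function createBrowserPool<C extends ContextLike>(launch: () => Promise<BrowserLike<C>>) {
  // Caching the promise (not the browser) ensures concurrent first calls share one launch.
  let browserPromise: Promise<BrowserLike<C>> | undefined;

  async function withContext<T>(task: (context: C) => Promise<T>): Promise<T> {
    if (!browserPromise) {
      browserPromise = launch().catch((err: unknown) => {
        browserPromise = undefined; // allow a retry if the launch itself failed
        throw err;
      });
    }
    const browser = await browserPromise;
    const context = await browser.newContext(); // fresh cookies and storage per request
    try {
      return await task(context);
    } finally {
      await context.close(); // always release the context, even when the task throws
    }
  }

  return { withContext };
}

// With Playwright, the launcher would typically be:
//   createBrowserPool(() => chromium.launch({ headless: true }))
```

Closing the context in a `finally` block is what keeps one request's cookies and storage from outliving the request, while the browser process itself stays warm for the next call.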
Security
| Protection | Details |
|------------|---------|
| SSRF prevention | Blocks localhost, 127.x.x.x, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, ::1, fe80:, 169.254.x.x, 0.0.0.0 |
| Scheme whitelist | Only http: and https: URLs are allowed |
| Selector injection | Rejects Playwright engine syntax: >>, nth=, text=, has-text, :has() |
| Content truncation | Output capped at 100KB with clean line-boundary cut |
| Test bypass | Set SPA_READER_ALLOW_PRIVATE=1 to allow private IPs (for local development/testing only) |
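The URL and selector checks in the table can be approximated with the sketch below. `assertSafeUrl`, `isPrivateHost`, and `assertPlainCssSelector` are hypothetical names, and the logic is an assumption that mirrors only the ranges and tokens listed above. Note that this sketch inspects the literal hostname only, so it does not catch a public DNS name that resolves to a private address.

```typescript
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);
const FORBIDDEN_SELECTOR_TOKENS = [">>", "nth=", "text=", "has-text", ":has"];

// True when the hostname literally names one of the blocked ranges from the table.
function isPrivateHost(hostname: string): boolean {
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  if (host === "localhost" || host === "::1" || host === "0.0.0.0") return true;
  if (host.startsWith("fe80:")) return true; // IPv6 link-local
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 ||
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Parse the URL, enforce the scheme whitelist, then reject private hosts.
function assertSafeUrl(rawUrl: string, allowPrivate: boolean = false): URL {
  const url = new URL(rawUrl); // throws TypeError on malformed input
  if (!ALLOWED_SCHEMES.has(url.protocol)) {
    throw new Error(`scheme not allowed: ${url.protocol}`);
  }
  if (!allowPrivate && isPrivateHost(url.hostname)) {
    throw new Error(`private or loopback host blocked: ${url.hostname}`);
  }
  return url;
}

// Reject selectors containing Playwright engine syntax; plain CSS passes through.
function assertPlainCssSelector(selector: string): string {
  const hit = FORBIDDEN_SELECTOR_TOKENS.find((token) => selector.includes(token));
  if (hit !== undefined) {
    throw new Error(`selector contains forbidden token: ${hit}`);
  }
  return selector;
}
```

A caller could pass `process.env.SPA_READER_ALLOW_PRIVATE === "1"` as the second argument of `assertSafeUrl` to reproduce the test bypass described in the table.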
Development
# Install dependencies
pnpm install
# Build
pnpm build
# Run tests
pnpm test
# Type check
pnpm lint
License
MIT — Copyright (c) 2026 XXO47OXX