spa-reader-mcp

MCP server that renders JavaScript SPA pages and extracts LLM-ready Markdown.

Scrapers that only fetch raw HTML often come back empty on Single Page Applications (SPAs), because the content is rendered by JavaScript after the initial page load. spa-reader-mcp solves this by launching a headless Chromium browser via Playwright, waiting for the network to go idle (and, optionally, for a specific CSS selector to appear), then extracting clean Markdown with Mozilla's Readability and Turndown, ready for LLM consumption.

Features

- `spa_read`: Render any SPA page and extract article content as clean Markdown with optional YAML frontmatter
- `spa_screenshot`: Capture full-page or viewport-sized PNG screenshots of rendered pages
- Singleton browser: Reuses a single Chromium instance across requests for fast, low-overhead rendering
- SSRF protection: Blocks private and loopback IP ranges and restricts URL schemes to `http`/`https`
- Selector injection prevention: Rejects Playwright-specific selector syntax (`>>`, `nth=`, `text=`, `has-text`, `:has`)
- Content truncation: Caps output at 100KB, cutting cleanly at a line boundary
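The content-truncation feature can be pictured with the minimal TypeScript sketch below. It is not the project's code: it assumes the 100KB cap is measured in UTF-8 bytes (100 × 1024), and `truncateAtLineBoundary` is a hypothetical name. It keeps whole lines until the next one would exceed the cap, so the output never ends mid-line.

```typescript
// Hypothetical helper illustrating "cap at 100KB, cut on a line boundary".
// Assumption: the limit is counted in UTF-8 bytes and 100KB = 100 * 1024.
const MAX_BYTES = 100 * 1024;

function truncateAtLineBoundary(markdown: string, maxBytes: number = MAX_BYTES): string {
  const encoder = new TextEncoder();
  if (encoder.encode(markdown).length <= maxBytes) {
    return markdown; // already under the cap, return unchanged
  }
  const kept: string[] = [];
  let used = 0;
  for (const line of markdown.split("\n")) {
    // +1 accounts for the "\n" that joins this line to the previous one
    const cost = encoder.encode(line).length + (kept.length > 0 ? 1 : 0);
    if (used + cost > maxBytes) break; // the next line would overflow: stop here
    kept.push(line);
    used += cost;
  }
  return kept.join("\n");
}
```

Counting bytes rather than JavaScript string length keeps the cap meaningful for non-ASCII pages, where one character can take several bytes.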

Requirements

- Node.js >= 20
- Chromium browser for Playwright:
```
npx playwright install chromium
```

Installation

npx (recommended, zero install)

No global install needed — configure directly in your MCP client (see MCP Configuration).

Global install

```
npm install -g spa-reader-mcp
npx playwright install chromium
```

From source

```
git clone https://github.com/XXO47OXX/spa-reader-mcp.git
cd spa-reader-mcp
pnpm install
pnpm build
npx playwright install chromium
```

MCP Configuration

Claude Desktop

Add to your `claude_desktop_config.json`:

```
{
  "mcpServers": {
    "spa-reader": {
      "command": "npx",
      "args": ["-y", "spa-reader-mcp"]
    }
  }
}
```

Claude Code

```
claude mcp add spa-reader -- npx -y spa-reader-mcp
```

Tools

`spa_read`

Render a JavaScript SPA page and extract its content as LLM-ready Markdown.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | string | Yes | none | The URL of the SPA page to read |
| `waitForSelector` | string | No | none | CSS selector to wait for before extraction |
| `waitTimeout` | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| `includeMetadata` | boolean | No | true | Include title/author/excerpt as YAML frontmatter |
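For reference, an MCP client invokes this tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The TypeScript sketch below builds such a request; `buildSpaReadCall` is a hypothetical client-side helper, and its range check simply mirrors the `waitTimeout` bounds in the table rather than the server's actual validation code.

```typescript
// Arguments accepted by spa_read, per the parameter table above.
interface SpaReadArgs {
  url: string;
  waitForSelector?: string;
  waitTimeout?: number; // ms, 1000–120000; server default 30000
  includeMetadata?: boolean; // server default true
}

// A JSON-RPC 2.0 "tools/call" request as used by MCP.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: SpaReadArgs };
}

// Hypothetical helper: fail fast on an out-of-range timeout before sending.
function buildSpaReadCall(id: number, args: SpaReadArgs): ToolsCallRequest {
  const t = args.waitTimeout;
  if (t !== undefined && (t < 1000 || t > 120000)) {
    throw new RangeError("waitTimeout must be between 1000 and 120000 ms");
  }
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: "spa_read", arguments: args } };
}

const req = buildSpaReadCall(1, {
  url: "https://example.com/blog/rsc",
  waitForSelector: "article",
  waitTimeout: 15000,
});
console.log(JSON.stringify(req));
```

Omitted optional fields are left out of `arguments`, so the server's defaults from the table apply.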

Example output:

```
---
title: "Understanding React Server Components"
author: "Dan Abramov"
excerpt: "A deep dive into how RSC works under the hood"
source: "https://example.com/blog/rsc"
---

## Introduction

React Server Components allow you to...
```
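Frontmatter like this could be produced along the following lines. In this hypothetical TypeScript sketch (`withFrontmatter` is not part of the project's API), missing metadata fields are skipped, which is an assumption, and values are serialized with `JSON.stringify`, relying on JSON string literals also being valid YAML 1.2 double-quoted scalars.

```typescript
// Metadata as it might come out of the Readability step (assumed shape).
interface ArticleMeta {
  title?: string | null;
  author?: string | null;
  excerpt?: string | null;
  source: string;
}

// Hypothetical helper: prepend a YAML frontmatter block to the Markdown body.
function withFrontmatter(meta: ArticleMeta, markdown: string): string {
  const lines = ["---"];
  for (const key of ["title", "author", "excerpt", "source"] as const) {
    const value = meta[key];
    if (value) {
      // JSON string syntax is valid YAML 1.2, so quotes, backslashes and
      // newlines inside titles or excerpts are escaped safely.
      lines.push(`${key}: ${JSON.stringify(value)}`);
    }
  }
  lines.push("---", ""); // closing delimiter, then a blank line before the body
  return lines.join("\n") + "\n" + markdown;
}

console.log(
  withFrontmatter(
    { title: "Understanding React Server Components", author: "Dan Abramov", source: "https://example.com/blog/rsc" },
    "## Introduction",
  ),
);
```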

`spa_screenshot`

Take a PNG screenshot of a JavaScript SPA page after rendering.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | string | Yes | none | The URL to screenshot |
| `waitForSelector` | string | No | none | CSS selector to wait for before capturing |
| `waitTimeout` | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| `width` | number | No | 1280 | Viewport width in pixels (320–3840) |
| `height` | number | No | 720 | Viewport height in pixels (240–2160) |
| `fullPage` | boolean | No | false | Capture the full scrollable page |

Returns the screenshot as a base64-encoded PNG image.
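The MCP specification represents images in tool results as `image` content items carrying base64-encoded `data` and a `mimeType`. A minimal TypeScript sketch of wrapping screenshot bytes that way, with `toImageResult` as a hypothetical helper name:

```typescript
import { Buffer } from "node:buffer";

// MCP "image" content item: base64 data plus a MIME type.
interface ImageContent {
  type: "image";
  data: string;
  mimeType: string;
}

// Hypothetical helper: wrap raw PNG bytes in an MCP tool result.
function toImageResult(png: Uint8Array): { content: ImageContent[] } {
  return {
    content: [{ type: "image", data: Buffer.from(png).toString("base64"), mimeType: "image/png" }],
  };
}

// The 8-byte signature that starts every PNG file.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(toImageResult(pngSignature).content[0].data); // iVBORw0KGgo=
```

In the server, the bytes would presumably come from Playwright's `page.screenshot()`, which resolves to a `Buffer`.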

Architecture

```
URL
 → Playwright Chromium (headless, singleton)
   → Per-request BrowserContext (isolated cookies/storage)
     → Page navigation + networkidle + optional selector wait
       → Raw HTML
         → Mozilla Readability (article extraction)
           → Turndown (HTML → Markdown)
             → YAML frontmatter + truncation
               → LLM-ready Markdown
```

Key design decisions:

- Singleton browser: A single Chromium instance is launched on the first request and reused, avoiding the ~2s cold-start penalty on subsequent calls.
- Per-request BrowserContext: Each request gets an isolated BrowserContext with its own cookies and storage, preventing cross-request data leakage.
- Readability fallback: If Mozilla Readability determines the page isn't article-like, the extractor falls back to converting the full `<body>` HTML.
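The singleton-browser decision is commonly implemented by caching the *promise* of the launch, so that several requests arriving before Chromium is up all wait on the same launch. The TypeScript sketch below assumes that approach; `lazySingleton` is a hypothetical helper, and the launcher is injected so the pattern runs without Playwright. In the server the launcher would be something like `() => chromium.launch()`, and each request would then call `browser.newContext()` and close that context when it finishes.

```typescript
// Hypothetical sketch of the "launch once, reuse" pattern.
type Launcher<T> = () => Promise<T>;

function lazySingleton<T>(launch: Launcher<T>): () => Promise<T> {
  let instance: Promise<T> | null = null;
  return () => {
    if (instance === null) {
      // Cache the promise, not the resolved value, so concurrent first
      // callers share one launch instead of starting several browsers.
      instance = launch().catch((err) => {
        instance = null; // let the next caller retry if the launch failed
        throw err;
      });
    }
    return instance;
  };
}

let calls = 0;
const getShared = lazySingleton(async () => {
  calls += 1;
  return { id: calls }; // stand-in for a Playwright Browser
});

Promise.all([getShared(), getShared(), getShared()]).then((all) => {
  console.log(calls, all.every((b) => b === all[0])); // 1 true
});
```

Resetting the cache on failure matters for a long-running server: without it, one failed launch would make every later request fail too.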

Security

| Protection | Details |
|------------|---------|
| SSRF prevention | Blocks `localhost`, `127.x.x.x`, `10.x.x.x`, `172.16-31.x.x`, `192.168.x.x`, `::1`, `fe80:`, `169.254.x.x`, `0.0.0.0` |
| Scheme whitelist | Only `http:` and `https:` URLs are allowed |
| Selector injection | Rejects Playwright engine syntax: `>>`, `nth=`, `text=`, `has-text`, `:has()` |
| Content truncation | Output capped at 100KB with a clean line-boundary cut |
| Test bypass | Set `SPA_READER_ALLOW_PRIVATE=1` to allow private IPs (for local development/testing only) |
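A rough TypeScript sketch of the URL and selector checks in this table follows. The function names are hypothetical and the logic is an assumption based only on the table. It inspects the literal hostname, so a public hostname whose DNS record points at a private address would not be caught by this sketch alone.

```typescript
import { isIP } from "node:net";

// Hypothetical validators mirroring the Security table. Only the literal
// hostname is inspected; DNS resolution and alternative encodings such as
// IPv4-mapped IPv6 addresses are out of scope for this sketch.

// Tokens the Security table lists as rejected in waitForSelector.
const REJECTED_SELECTOR_TOKENS = [">>", "nth=", "text=", "has-text", ":has("];

function isPrivateIPv4(ip: string): boolean {
  const [a, b] = ip.split(".").map(Number);
  return (
    ip === "0.0.0.0" ||
    a === 10 || // 10.x.x.x
    a === 127 || // 127.x.x.x (loopback)
    (a === 169 && b === 254) || // 169.254.x.x (link-local)
    (a === 172 && b >= 16 && b <= 31) || // 172.16-31.x.x
    (a === 192 && b === 168) // 192.168.x.x
  );
}

function isAllowedUrl(raw: string, env: Record<string, string | undefined> = process.env): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  // In this sketch the scheme check runs before the private-IP bypass.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (env.SPA_READER_ALLOW_PRIVATE === "1") return true; // local dev/testing only

  // URL.hostname keeps the brackets around IPv6 literals, e.g. "[::1]".
  const host = url.hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (host === "localhost") return false;
  switch (isIP(host)) {
    case 4:
      return !isPrivateIPv4(host);
    case 6:
      return !(host === "::1" || host.startsWith("fe80:"));
    default:
      return true; // an ordinary DNS name
  }
}

function isSafeSelector(selector: string): boolean {
  return !REJECTED_SELECTOR_TOKENS.some((token) => selector.includes(token));
}

console.log(isAllowedUrl("https://example.com/")); // true
console.log(isAllowedUrl("http://[::1]:3000/", {})); // false
console.log(isAllowedUrl("file:///etc/passwd")); // false
console.log(isSafeSelector("main article")); // true
console.log(isSafeSelector("text=Login >> nth=0")); // false
```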

Development

```
# Install dependencies
pnpm install

# Build
pnpm build

# Run tests
pnpm test

# Type check
pnpm lint
```
