MCP server by babybank25

Created 6/1/2026

web2mcp

URL -> installable CLI + MCP stdio adapter for Codex and Claude Code

Give it a website URL. It scans the site, detects its type, discovers APIs (OpenAPI, GraphQL, REST), and generates a ready-to-install adapter package with a CLI and a local MCP stdio server.

web2mcp create https://novelbin.me/ --ready

No AI key required. The default pipeline is 100% deterministic.


What you get

Every generated adapter includes:

Universal browser tools (9) - always present: browser_open, browser_get_state, browser_click, browser_fill, browser_press, browser_back, browser_wait, browser_extract, browser_screenshot

Management tools (4): adapter_info, adapter_doctor, list_available_actions, auth_status

Site-map tools (4): list_pages, find_page, open_page, refresh_site_map

API tools (when detected): api_search, api_list, gql_<query> (GraphQL)

Site-specific actions (heuristic): search, list_links, extract_cards, extract_table, extract_article, go_next_page, get_page, get_by_<param>, scroll_load_more, etc.


Quick start

git clone <repo>
cd web2mcp
npm install
npm run build

# Check environment
node dist/index.js doctor

# Create an adapter (no AI needed)
node dist/index.js create https://example.com --ready

The --ready flag runs the full cycle: scan -> generate -> install deps -> build -> doctor -> mcp:check -> print install commands.


Requirements

  • Node.js 20+
  • npm
  • Chromium (auto-installed via Playwright: npx playwright install chromium)
  • ThaiLLM API key (optional - AI is off by default)

Architecture

URL
 |
 +-> Scanner (Playwright)
 |     - SPA auto-detection (React/Vue/Angular/Next/Nuxt/Svelte)
 |     - Adaptive wait (skeleton detection, element count growth)
 |     - XHR/fetch capture with dedicated bucket (no cap)
 |     - apiWaitMs: extra wait for API calls to fire after page load
 |     - Shadow DOM + same-origin iframe extraction
 |     - Multi-page crawl (--max-pages)
 |     - Response body inspection (JSON schema extraction)
 |
 +-> API Discovery (multi-strategy)
 |     - OpenAPI/Swagger spec probe (12 common paths)
 |     - GraphQL introspection
 |     - networkHints XHR/fetch analysis
 |     - Direct path probing (/api/v1, /api/v2, /graphql, ...) when hints are empty
 |     - URL pattern extraction (/posts/{slug}, /users/{id})
 |     - Pagination query param detection (?page=N, ?cursor=xxx)
 |
 +-> Compact scan (token-efficient, importance-scored)
 |
 +-> Site type detection (scoring system, 83%+ accuracy)
 |     - search / docs / blog / ecommerce / directory / tool / dashboard / unknown
 |     - Wiki/encyclopedia detection
 |     - API site detection (URL + title signals)
 |
 +-> Action template library (17 templates, deterministic, no AI)
 |
 +-> Package generator (29 files)
 |     - src/runtime/ (browser session, state, tools, permission, trace,
 |                     overlay detector, site-map, auth profile, SPA detector)
 |     - src/cli.ts, src/mcp.ts, src/runner.ts
 |     - adapter.json, .mcp.json.example, codex.config.toml.example
 |
 +-> Smoke test (npm install, build, doctor, mcp:check, live get_page_text)
 +-> Quality scoring (auto-disable actions below 0.6 threshold)
 +-> Compatibility report (Excellent/Good/Partial/Poor + API endpoints)
 +-> Ready build (--ready: full install + build + doctor + mcp:check)
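The quality-scoring stage in the diagram above disables actions that score below 0.6. A minimal sketch of such a filter follows. The `ScoredAction` shape and its field names are assumptions made for illustration and are not taken from the generated adapter code.

```typescript
// Hypothetical action record; the real adapter.json schema may differ.
interface ScoredAction {
  name: string;
  score: number; // quality score in the range 0..1
  enabled: boolean;
}

// Threshold quoted in the architecture overview.
const QUALITY_THRESHOLD = 0.6;

// Return a copy of the actions with everything below the threshold disabled.
// Actions that were already disabled stay disabled.
function applyQualityThreshold(
  actions: ScoredAction[],
  threshold: number = QUALITY_THRESHOLD
): ScoredAction[] {
  return actions.map((action) => ({
    ...action,
    enabled: action.enabled && action.score >= threshold,
  }));
}

const scored = applyQualityThreshold([
  { name: "search", score: 0.82, enabled: true },
  { name: "extract_table", score: 0.41, enabled: true },
]);
```

Here `search` stays enabled and `extract_table` is switched off. Disabled actions would still show up under `actions --all`.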

Configuration

Copy .env.example to .env (optional - only needed for AI features):

AI_BASE_URL=http://thaillm.or.th/api/v1
AI_API_KEY=your_key_here
AI_MODEL=typhoon

Or create web2mcp.config.json in your project root:

{
  "outputDir": "outputs",
  "scanner": {
    "maxPages": 2,
    "scroll": true,
    "waitMs": 2000,
    "apiWaitMs": 3000
  },
  "ai": { "enabled": false }
}

Config priority: CLI flags > project config > user config (~/.web2mcp/config.json) > defaults.
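The priority order above can be read as "the first layer that defines a value wins". The sketch below illustrates that rule for the scanner settings only. It assumes that the config keys correspond to the create flags and reuses the flag defaults from the flags table; `resolveScannerConfig` and `firstDefined` are hypothetical helper names, not part of web2mcp.

```typescript
// Scanner settings as they appear in the web2mcp.config.json example.
interface ScannerConfig {
  maxPages: number;
  scroll: boolean;
  waitMs: number;
  apiWaitMs: number;
}

// Assumed defaults, copied from the create-flags table.
const DEFAULTS: ScannerConfig = {
  maxPages: 1,
  scroll: false,
  waitMs: 1500,
  apiWaitMs: 2000,
};

// Return the first value that is not undefined, or the fallback.
function firstDefined<T>(values: (T | undefined)[], fallback: T): T {
  for (const value of values) {
    if (value !== undefined) return value;
  }
  return fallback;
}

// Layers are listed from highest to lowest priority.
function resolveScannerConfig(
  cli: Partial<ScannerConfig>,
  project: Partial<ScannerConfig>,
  user: Partial<ScannerConfig>
): ScannerConfig {
  return {
    maxPages: firstDefined([cli.maxPages, project.maxPages, user.maxPages], DEFAULTS.maxPages),
    scroll: firstDefined([cli.scroll, project.scroll, user.scroll], DEFAULTS.scroll),
    waitMs: firstDefined([cli.waitMs, project.waitMs, user.waitMs], DEFAULTS.waitMs),
    apiWaitMs: firstDefined([cli.apiWaitMs, project.apiWaitMs, user.apiWaitMs], DEFAULTS.apiWaitMs),
  };
}

const resolved = resolveScannerConfig(
  { maxPages: 5 },                  // CLI: --max-pages 5
  { maxPages: 2, apiWaitMs: 3000 }, // project web2mcp.config.json
  { scroll: true, waitMs: 2500 }    // ~/.web2mcp/config.json
);
// resolved: { maxPages: 5, scroll: true, waitMs: 2500, apiWaitMs: 3000 }
```

Resolving each field separately means a project file that sets only `apiWaitMs` does not hide a `scroll` setting from the user config.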


Parent CLI reference

| Command | Description |
|---------|-------------|
| create <url> | Full pipeline: scan -> plan -> generate -> smoke test |
| create <url> --ready | + install/build/doctor/mcp:check + print install commands |
| scan <url> | Scan only -> stdout JSON |
| compact <scan> | Shrink scan-result.json for AI |
| model <scan> | scan-result -> site-model.json |
| inspect-site <url> | Detect site type + which templates apply |
| doctor | Check Node, Playwright, AI config |
| doctor-adapter <name> | Run adapter's npm run doctor |
| install-codex <name> | Print/run codex mcp add (absolute path) |
| install-claude <name> | Print/run claude mcp add + Desktop config |
| install-both <name> | Both above |
| repair <name> | Patch broken locators from latest trace |
| upgrade <name> | Regenerate src/ from current templates (keep site-model) |
| test-all | Regression test all adapters in outputs/ |
| test-all --live | + run live get_page_text + api_search on each adapter |
| benchmark | Run create pipeline on benchmarks/sites.json |
| benchmark-tasks | Run real browser flows and measure success rates |
| benchmark-detection | Measure site type detection accuracy (target: 80%+) |
| metrics | Aggregate stats from all adapters |
| pack <name> | Export adapter as zip (excludes secrets/node_modules) |
| list | List adapters in outputs/ |
| clean --stale | Remove adapters from old templates |

create flags

| Flag | Default | Description |
|------|---------|-------------|
| --ready | false | Full install/build/doctor cycle |
| --no-ai | (default) | Heuristic planner only |
| --ai-actions | false | Allow AI to design extra actions |
| --spa | false | Force SPA mode (networkidle + longer waits) |
| --api-wait-ms <n> | 2000 | Extra wait for API calls to fire after page load |
| --max-pages <n> | 1 | Crawl N in-domain pages (max 10) |
| --scroll | false | Scroll to trigger lazy content |
| --scroll-times <n> | 3 | Viewport scrolls (cap 10) |
| --wait-until <state> | domcontentloaded | Navigation wait state |
| --wait-ms <n> | 1500 | Extra wait for SPA rendering |
| --skip-test | false | Skip smoke test |
| --headful | false | Visible browser |
| --timeout <ms> | 30000 | Navigation timeout |
| --json | false | Output JSON |


Generated adapter usage

cd outputs/<name>-adapter

# Build
npm run build

# Self-test (includes locator quality check)
npm run doctor

# List actions (enabled/disabled)
npm run cli -- actions
npm run cli -- actions --all

# Run site-specific actions
npm run cli -- run get_page_text --json
npm run cli -- run search --set query="hello" --json
npm run cli -- run api_search --set query="hello" --json   # API-first (no browser)
npm run cli -- run get_by_post_slug --set post_slug="my-post" --json

# Dry-run a write action (preview without executing)
npm run cli -- run submit_form --set name="test" --dry-run

# Universal browser tools
npm run cli -- browser open https://example.com/ --json
npm run cli -- browser state --json
npm run cli -- browser state --json --max-elements 20   # compact
npm run cli -- browser click <elementId>
npm run cli -- browser fill <elementId> "value" --confirm
npm run cli -- browser extract text --json
npm run cli -- browser extract links --json
npm run cli -- browser extract tables --json
npm run cli -- browser screenshot --out shot.png

# Site map
npm run cli -- pages refresh --max-pages 5
npm run cli -- pages list

# Auth (manual login)
npm run cli -- auth setup
npm run cli -- auth status
npm run cli -- auth clear

# Trace inspection
npm run cli -- trace list
npm run cli -- trace show last
npm run cli -- trace open last

# MCP check (real client test - spawns server + calls tools)
npm run mcp:check
npm run mcp:check -- --quick   # skip live browser_open (the extra -- forwards the flag to the script)

# Install scripts
npm run install:codex
npm run install:claude

Install with Codex

codex mcp add <name> -- node "/absolute/path/to/dist/mcp.js"
codex mcp list

Or edit ~/.codex/config.toml (template in codex.config.toml.example):

[mcp_servers.<name>]
command = "node"
args = ["/absolute/path/to/dist/mcp.js"]
startup_timeout_sec = 10
tool_timeout_sec = 60
enabled = true

Install with Claude Code

claude mcp add <name> --transport stdio -- node "/absolute/path/to/dist/mcp.js"
claude mcp list

Claude Desktop config in .mcp.json.example:

{
  "mcpServers": {
    "<name>": {
      "type": "stdio",
      "command": "node",
      "args": ["/absolute/path/to/dist/mcp.js"],
      "env": {}
    }
  }
}
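Both install paths expect an absolute path to dist/mcp.js. The sketch below builds a config object with the same shape as the JSON above and rejects relative paths early. `buildMcpConfig` and `looksAbsolute` are hypothetical helpers written for this example, and the adapter path is made up.

```typescript
// Shape of one stdio server entry, mirroring the JSON example.
interface StdioServerEntry {
  type: "stdio";
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Rough absolute-path check for POSIX ("/...") and Windows ("C:\...") paths.
function looksAbsolute(path: string): boolean {
  return path.charAt(0) === "/" || /^[A-Za-z]:[\\/]/.test(path);
}

// Build the config object for a single adapter.
function buildMcpConfig(
  name: string,
  mcpJsPath: string
): { mcpServers: Record<string, StdioServerEntry> } {
  if (!looksAbsolute(mcpJsPath)) {
    throw new Error(`Expected an absolute path to dist/mcp.js, got: ${mcpJsPath}`);
  }
  const mcpServers: Record<string, StdioServerEntry> = {};
  mcpServers[name] = { type: "stdio", command: "node", args: [mcpJsPath], env: {} };
  return { mcpServers };
}

// Illustrative adapter name and path only.
const exampleConfig = JSON.stringify(
  buildMcpConfig("example", "/opt/adapters/example-adapter/dist/mcp.js"),
  null,
  2
);
```

A relative path tends to fail only later, when the client starts the server from a different working directory, so checking up front gives a clearer error.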

API Discovery

web2mcp uses multiple strategies to discover APIs:

1. OpenAPI/Swagger: probes 12 common paths (/openapi.json, /swagger.json, /api-docs, etc.)

  • If found: generates actions from spec with confidence 0.95
  • Supports OpenAPI 3.x and Swagger 2.x

2. GraphQL: sends introspection query to detected /graphql endpoints

  • Generates gql_<query> actions using graphql_query step type (direct HTTP, no browser)

3. REST/JSON endpoints: analyzes XHR/fetch networkHints

  • Classifies as search/list/detail/pagination/graphql
  • Generates api_search / api_list actions using fetch_json step type

4. Direct path probing: when networkHints yield no API endpoints

  • Probes /api/v1, /api/v2, /api, /rest, /graphql, etc. via HEAD/GET
  • Detects JSON responses and classifies endpoints

5. URL patterns: detects /posts/{slug}, /users/{id} from link analysis

  • Generates get_by_<param>(id) actions

All API actions use direct HTTP (fetch_json / graphql_query step types), so no browser is required and calls run about 10x faster than browser-driven actions.
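Strategy 5 (URL patterns) can be illustrated with a small heuristic: group link paths by their first segment and treat the second segment as a parameter once several distinct values appear. This is a sketch under assumptions, not web2mcp's actual detector. The two-segment restriction, the `minExamples` cutoff, and the numeric-means-id rule are all choices made for this example.

```typescript
// Hypothetical result shape for an inferred route.
interface RoutePattern {
  pattern: string; // e.g. "/posts/{slug}"
  param: "id" | "slug";
  examples: number; // distinct values seen for the parameter
}

function inferRoutePatterns(paths: string[], minExamples: number = 2): RoutePattern[] {
  // Map from collection segment ("posts") to the distinct items seen under it.
  const byPrefix: Record<string, string[]> = {};
  for (const path of paths) {
    const segments = path.split("/").filter((s) => s.length > 0);
    if (segments.length !== 2) continue; // only /collection/item shapes
    const prefix = segments[0];
    const item = segments[1];
    const seen = byPrefix[prefix] || (byPrefix[prefix] = []);
    if (seen.indexOf(item) === -1) seen.push(item);
  }

  return Object.keys(byPrefix)
    .filter((prefix) => byPrefix[prefix].length >= minExamples)
    .sort()
    .map((prefix): RoutePattern => {
      const items = byPrefix[prefix];
      // All-numeric values suggest an id; anything else is treated as a slug.
      const param: "id" | "slug" = items.every((item) => /^\d+$/.test(item)) ? "id" : "slug";
      return { pattern: `/${prefix}/{${param}}`, param, examples: items.length };
    });
}

const routes = inferRoutePatterns([
  "/posts/hello-world",
  "/posts/second-post",
  "/users/1",
  "/users/42",
  "/about",
]);
// routes: /posts/{slug} (2 examples), /users/{id} (2 examples)
```

Requiring at least two distinct values keeps one-off pages such as /about/team from being mistaken for a parameterized route.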


Step types

| Step | Transport | Description |
|------|-----------|-------------|
| goto | Browser | Navigate to URL |
| fill | Browser | Fill input element |
| click | Browser | Click element |
| press | Browser | Press key |
| wait | Browser | Wait N ms |
| select | Browser | Native <select> option |
| select_option | Browser | Native + custom div dropdowns |
| extract_text | Browser | Extract visible text |
| extract_links | Browser | Extract all links |
| extract_list | Browser | Extract list/card items |
| screenshot | Browser | Capture PNG |
| fetch_json | HTTP | Direct fetch() - no browser, 10x faster |
| graphql_query | HTTP | Direct GraphQL - no browser |
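The step table splits cleanly by transport, which makes it possible to tell whether an action needs a browser session at all. A minimal sketch of that lookup follows. The step names come from the table, while the type and function names are assumptions made for illustration.

```typescript
type BrowserStepKind =
  | "goto" | "fill" | "click" | "press" | "wait" | "select"
  | "select_option" | "extract_text" | "extract_links"
  | "extract_list" | "screenshot";
type HttpStepKind = "fetch_json" | "graphql_query";
type StepKind = BrowserStepKind | HttpStepKind;
type Transport = "browser" | "http";

// The two step types the table marks as HTTP.
const HTTP_STEPS: StepKind[] = ["fetch_json", "graphql_query"];

function transportOf(kind: StepKind): Transport {
  return HTTP_STEPS.indexOf(kind) !== -1 ? "http" : "browser";
}

// An action needs a browser session if at least one step uses the browser.
function needsBrowser(steps: StepKind[]): boolean {
  return steps.some((kind) => transportOf(kind) === "browser");
}

const apiOnly = needsBrowser(["fetch_json"]);                              // false
const formFlow = needsBrowser(["goto", "fill", "click", "extract_list"]); // true
```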


Permission levels

| Level | Default | Requires |
|-------|---------|---------|
| read | allowed | nothing |
| navigate | allowed | nothing |
| write | blocked | confirm: true |
| auth | blocked | confirm: true |
| download | blocked | confirm: true |
| dangerous | blocked | confirm: true + allowRisky: true |

  • CAPTCHA -> CAPTCHA_DETECTED (never bypassed)
  • Rate limit -> RATE_LIMITED
  • Access blocked -> ACCESS_BLOCKED
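The permission table can be expressed as a single gate function. The sketch below mirrors the table row by row. `isAllowed` and `CallOptions` are hypothetical names, and the real check in src/runtime/ may be structured differently.

```typescript
type PermissionLevel = "read" | "navigate" | "write" | "auth" | "download" | "dangerous";

// Options a caller can pass to unlock blocked levels.
interface CallOptions {
  confirm?: boolean;
  allowRisky?: boolean;
}

function isAllowed(level: PermissionLevel, options: CallOptions = {}): boolean {
  // read and navigate are allowed by default.
  if (level === "read" || level === "navigate") return true;
  // dangerous needs both flags.
  if (level === "dangerous") {
    return options.confirm === true && options.allowRisky === true;
  }
  // write, auth, and download need explicit confirmation.
  return options.confirm === true;
}

const plainWrite = isAllowed("write");                          // false
const confirmedWrite = isAllowed("write", { confirm: true });   // true
const halfDangerous = isAllowed("dangerous", { confirm: true }); // false
```

Comparing against `true` instead of relying on truthiness means a stray string such as "yes" arriving from an untyped caller does not unlock a write.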


Resilience features

| Feature | Description |
|---------|-------------|
| SPA auto-detection | Detects React/Vue/Angular/Next/Nuxt/Svelte, adjusts wait strategy |
| Adaptive wait | If element count < 5, waits 2s more for SPA rendering |
| API wait | apiWaitMs extra wait for API calls to fire after page load |
| Locator self-healing | Re-captures page state and matches by kind+text if locators fail |
| Session crash recovery | Auto-restarts browser session on crash |
| Network retry | Navigation retried 3x with exponential backoff (skips ERR_BLOCKED) |
| Cookie/modal dismiss | Playwright click -> JS click -> Escape key fallback |
| Context isolation | isolate: true option for fresh browser context per action |
| Anti-bot detection | RATE_LIMITED + ACCESS_BLOCKED error codes |
| Auto-repair hint | LOCATOR_FAILED errors include web2mcp repair command |
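The network retry row describes a policy that can be separated from the navigation itself: after each failure, decide whether to retry and how long to wait. The sketch below shows one way to write that decision. It reads "3x" as up to three retries after the first attempt and uses a 500 ms base delay; both are assumptions, as is the `decideRetry` name.

```typescript
interface RetryDecision {
  retry: boolean;
  delayMs: number; // wait before the next attempt; 0 when not retrying
}

// failedAttempts counts attempts that have failed so far (1 after the first failure).
function decideRetry(
  failedAttempts: number,
  errorMessage: string,
  maxRetries: number = 3,
  baseMs: number = 500
): RetryDecision {
  // Blocked requests are treated as permanent, so they are never retried.
  const blocked = errorMessage.indexOf("ERR_BLOCKED") !== -1;
  if (blocked || failedAttempts > maxRetries) {
    return { retry: false, delayMs: 0 };
  }
  // Exponential backoff: base, 2x base, 4x base.
  return { retry: true, delayMs: baseMs * Math.pow(2, failedAttempts - 1) };
}

const first = decideRetry(1, "net::ERR_TIMED_OUT");           // retry after 500 ms
const third = decideRetry(3, "net::ERR_TIMED_OUT");           // retry after 2000 ms
const exhausted = decideRetry(4, "net::ERR_TIMED_OUT");       // give up
const blocked = decideRetry(1, "net::ERR_BLOCKED_BY_CLIENT"); // never retried
```

A navigation loop would call this after each caught error, sleep for `delayMs`, and rethrow once `retry` is false.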


Site type detection

Accuracy: 83%+ on the benchmark set of 6 diverse sites, meaning at least 5 of the 6 are classified correctly.

| Type | Signals |
|------|---------|
| search | Search input + form |
| docs | Documentation keywords, wiki/encyclopedia, many headings |
| blog | Article/author/published signals, pagination |
| ecommerce | Price patterns, rating/review, product cards |
| directory | Many links, repeated cards, pagination |
| tool | API URL patterns, HTTP/REST/JSON in title, forms without search |
| dashboard | Login form |
| unknown | Insufficient signals |

Run web2mcp benchmark-detection to measure accuracy on your benchmark set.
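A scoring-based detector of the kind described above can be sketched as a weighted vote: each observed signal adds points to the types it supports, the highest total wins, and weak totals fall back to unknown. The signal names below loosely follow the table, but the weights and the `minScore` cutoff are invented for illustration and do not reflect web2mcp's real scoring.

```typescript
type KnownSiteType =
  | "search" | "docs" | "blog" | "ecommerce" | "directory" | "tool" | "dashboard";
type SiteType = KnownSiteType | "unknown";

// Hypothetical weights: how strongly each signal points at each type.
const SIGNAL_WEIGHTS: Record<KnownSiteType, Record<string, number>> = {
  search: { searchInput: 2, form: 1 },
  docs: { docKeywords: 2, manyHeadings: 1 },
  blog: { articleSignals: 2, pagination: 1 },
  ecommerce: { pricePatterns: 2, ratings: 1, productCards: 2 },
  directory: { manyLinks: 1, repeatedCards: 2, pagination: 1 },
  tool: { apiUrlPatterns: 2, formsWithoutSearch: 1 },
  dashboard: { loginForm: 3 },
};

const KNOWN_TYPES = Object.keys(SIGNAL_WEIGHTS) as KnownSiteType[];

function detectSiteType(signals: string[], minScore: number = 2): SiteType {
  let best: SiteType = "unknown";
  let bestScore = 0;
  for (const type of KNOWN_TYPES) {
    let score = 0;
    for (const signal of signals) {
      score += SIGNAL_WEIGHTS[type][signal] || 0; // unknown signals add nothing
    }
    if (score > bestScore) {
      best = type;
      bestScore = score;
    }
  }
  // Too little evidence maps to "unknown", matching the last table row.
  return bestScore >= minScore ? best : "unknown";
}

const shop = detectSiteType(["pricePatterns", "productCards", "pagination"]); // "ecommerce"
const weak = detectSiteType(["form"]);                                        // "unknown"
```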


Trace system

Every tool call writes to traces/<traceId>/:

  • trace.json - tool, input, steps, result, duration
  • screenshot.png - viewport on error
  • dom.html - outer HTML on error (200 KB cap)

npm run cli -- trace list
npm run cli -- trace show last
web2mcp repair <adapter>   # patch locators from latest trace

Compatibility report

Every create run generates compatibility-report.json + .md:

| Level | Score |
|-------|-------|
| Excellent | 85-100 |
| Good | 70-84 |
| Partial | 50-69 |
| Poor | 0-49 |

Includes: reachability, content detection, locator quality, action count, API endpoints found.
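The score bands above map to levels with a simple threshold ladder. A minimal sketch follows, using a hypothetical `compatibilityLevel` helper; how the score itself is computed from the listed factors is not covered here.

```typescript
type CompatibilityLevel = "Excellent" | "Good" | "Partial" | "Poor";

// Map a 0-100 compatibility score to the level bands from the table.
function compatibilityLevel(score: number): CompatibilityLevel {
  if (!(score >= 0 && score <= 100)) {
    // Also rejects NaN, since every comparison with NaN is false.
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  if (score >= 85) return "Excellent";
  if (score >= 70) return "Good";
  if (score >= 50) return "Partial";
  return "Poor";
}

const levels = [100, 85, 84, 70, 69, 50, 49, 0].map(compatibilityLevel);
// levels: Excellent, Excellent, Good, Good, Partial, Partial, Poor, Poor
```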


Limitations

  • No CAPTCHA bypass
  • No purchase/payment/delete automation
  • Authentication requires manual login via auth setup
  • Multi-page crawl capped at 10 pages
  • API actions only work for public (unauthenticated) endpoints
  • GraphQL mutations not generated (read-only queries only)
  • MCP streaming output not yet supported (requires SDK upgrade)
  • YAML OpenAPI specs not yet parsed (JSON only)

Development

npm test              # 205 unit tests (TAP reporter, ASCII-safe)
npm run test:e2e      # 12 e2e MCP client tests
npm run lint
npm run format

Integration tests (require built adapter):

node tests/integration/permission-gate.mjs
node tests/integration/sitemap-live.mjs
node tests/integration/repair-full.mjs
node tests/integration/self-healing.mjs

License

MIT

Quick Setup
Installation guide for this server

Install Package (if required)

npx @modelcontextprotocol/server-web2mcp

Cursor configuration (mcp.json)

{
  "mcpServers": {
    "babybank25-web2mcp": {
      "command": "npx",
      "args": ["babybank25-web2mcp"]
    }
  }
}