A hybrid Python + JavaScript MCP runtime for scraping readable content from the web. Powered by Flask, BeautifulSoup, and the unshakable belief that you can automate everything. Built for VS Code’s Model Context Protocol. Gremlin-approved.
GremlinScraper is a lightweight HTTP MCP module designed to scrape visible text from any publicly accessible webpage. It runs locally, integrates directly with VS Code’s MCP system, and speaks plain JSON.
This is Part 1 of the GremlinOS Runtime Suite from StatikFinTech LLC.
🧠 Features
- MCP-Compatible: Shows up in VS Code’s MCP list with metadata.
- Simple API: POST a URL, receive clean text in return.
- CORS-Ready: Built-in CORS support for cross-origin requests.
- Logging: Uses `loguru` to log all activity to rotating files.
- Timeouts + Error Handling: Gracefully deals with slow or weird sites.
- Human UA Header: Doesn’t look like a bot (unless you read the name).
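For a rough idea of how those pieces fit together, here is a minimal sketch (not the actual `server.py`; the log path, UA string, and `flask_cors` usage are assumptions) of a Flask handler combining CORS, `loguru` rotation, a request timeout, and a human-looking UA header around BeautifulSoup:

```python
# Hypothetical sketch only -- the real server.py may differ. It shows how the
# features above (CORS, loguru rotation, timeouts, human UA header) typically
# combine in a small Flask app.
from flask import Flask, request, jsonify
from flask_cors import CORS
from bs4 import BeautifulSoup
from loguru import logger
import requests

app = Flask(__name__)
CORS(app)  # CORS-Ready: allow cross-origin requests
logger.add("gremlin_scraper.log", rotation="10 MB")  # rotating log files

# Human UA header so requests don't look like a default python-requests bot
HEADERS = {"User-Agent": "Mozilla/5.0 (GremlinScraper)"}

@app.route("/scrape", methods=["POST"])
def scrape():
    data = request.get_json(force=True) or {}
    url = data.get("url")
    if not url:
        return jsonify({"error": "missing 'url'"}), 400
    try:
        resp = requests.get(url, headers=HEADERS, timeout=10)  # timeout guard
        resp.raise_for_status()
    except requests.RequestException as exc:
        logger.warning("Failed to fetch {}: {}", url, exc)
        return jsonify({"error": str(exc)}), 502
    # Strip tags and return only the visible text
    text = BeautifulSoup(resp.text, "html.parser").get_text(separator="\n", strip=True)
    return jsonify({"text": text})

if __name__ == "__main__":
    app.run(port=8742)
```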
🔧 Usage
- Clone or drop this repo into your `.vscode/mcps/` folder, or wherever your MCPs live.
- Add `"gremlinScraper"` to `.mcp.json`.
- Click “Start Server” in the VS Code MCP tab.
- Or run it manually:
```bash
pip install -r requirements.txt
python server.py
```
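Once the server is up, a quick reachability check from Python (a sketch; the `/ping` route and default port 8742 are documented in the Endpoints section below):

```python
# Quick smoke test -- assumes the default port 8742 used in the examples below.
import requests

resp = requests.get("http://localhost:8742/ping", timeout=5)
print(resp.text)  # expected: pong
```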
📦 Endpoints & Examples
1. POST /scrape
- Fetch a single page’s visible text:
```bash
curl -X POST http://localhost:8742/scrape \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com"}'
```

- Response:

```json
{
  "text": "Example Domain\n\nThis domain is for use in illustrative examples in documents.\n..."
}
```
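The same call from Python, for anyone wiring the scraper into a script (a sketch using `requests`; only the port and JSON shape come from the example above):

```python
# Minimal /scrape client sketch -- assumes the server is running locally on 8742.
import requests

def scrape(url: str, endpoint: str = "http://localhost:8742/scrape") -> str:
    """POST a URL to GremlinScraper and return the extracted visible text."""
    resp = requests.post(endpoint, json={"url": url}, timeout=30)
    resp.raise_for_status()
    return resp.json()["text"]

if __name__ == "__main__":
    print(scrape("https://example.com")[:200])
```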
2. POST /crawl
- Recursively crawl same-domain links:
```bash
curl -X POST http://localhost:8742/crawl \
  -H 'Content-Type: application/json' \
  -d '{
    "url":"https://example.com",
    "max_pages":10,
    "max_depth":2,
    "concurrency":5
  }'
```

- Response:

```json
{
  "https://example.com": "Example Domain\n\nThis domain is for use…",
  "https://example.com/about": "About Us\n\n…",
  "...": "…"
}
```
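And the equivalent crawl from Python, iterating the URL-to-text mapping in the response (again a sketch; the payload fields mirror the curl example above):

```python
# /crawl client sketch -- parameters taken from the curl example above.
import requests

payload = {
    "url": "https://example.com",
    "max_pages": 10,
    "max_depth": 2,
    "concurrency": 5,
}
resp = requests.post("http://localhost:8742/crawl", json=payload, timeout=120)
resp.raise_for_status()

# The response maps each crawled URL to its extracted text.
for page_url, text in resp.json().items():
    print(f"{page_url}: {len(text)} chars")
```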
3. POST /crawl-stream
- Stream each page as soon as it’s fetched:
```bash
curl -N -X POST http://localhost:8742/crawl-stream \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com","max_pages":5}'
```

- Response (NDJSON):

```
{"url":"https://example.com","text":"Example Domain\n…"}
{"url":"https://example.com/link1","text":"Link One\n…"}
…
```
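Because the stream is NDJSON, a client can process each page as soon as it arrives instead of waiting for the whole crawl; a sketch using `requests` in streaming mode:

```python
# NDJSON stream consumer sketch -- handles each page as it is fetched,
# rather than waiting for the whole crawl to finish.
import json
import requests

with requests.post(
    "http://localhost:8742/crawl-stream",
    json={"url": "https://example.com", "max_pages": 5},
    stream=True,          # don't buffer the whole response
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:      # skip keep-alive blank lines
            continue
        page = json.loads(line)
        print(page["url"], "->", len(page["text"]), "chars")
```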
4. GET /ping
- Health check endpoint:
```bash
curl http://localhost:8742/ping
```

- Response:

```
pong
```
5. GET /mcp/metadata
- MCP discovery metadata:
```bash
curl http://localhost:8742/mcp/metadata
```

- Response:

```json
{
  "name": "Gremlin Web Scraper MCP",
  "description": "Scrapes and crawls text from URLs via HTTP endpoints…",
  "version": "0.0.1",
  "author": "StatikFinTech LLC",
  "tags": ["scraping", "crawl", "MCP", "runtime"],
  "endpoints": […]
}
```
🗂 Metadata
Name: Gremlin Web Scraper MCP
Author: StatikFinTech LLC
License: MIT
Tags: #scraping, #crawl, #runtime, #gremlin
🐾 Future Add-ons
- PDF / EPUB / Markdown parsing
- Selective DOM element filtering
- Scheduling/recurring crawl and scrape jobs
- Direct Memory injection to GremlinGPT core
“Split. Streamlined. Sovereign.” StatikFinTech Systems • 2025
> [!CAUTION]
> “Your qualifications are impressive...”
> - Coder Hiring Team (2025 Rejection Letter)
🔱 "This isn't rejection. It's proof they don't know how to build what comes next.
Still building what they can’t classify." 🔱 -StatikFinTech, LLC