A hybrid Python + JavaScript MCP runtime for scraping readable content from the web. Powered by Flask, BeautifulSoup, and the unshakable belief that you can automate everything. Built for VS Code’s Model Context Protocol. Gremlin-approved.
GremlinScraper is a lightweight HTTP MCP module designed to scrape visible text from any publicly accessible webpage. It runs locally, integrates directly with VS Code’s MCP system, and speaks plain JSON.
This is Part 1 of the GremlinOS Runtime Suite from StatikFinTech LLC.
🧠 Features
- MCP-Compatible: Shows up in VS Code’s MCP list with metadata.
- Simple API: POST a URL, receive clean text in return.
- CORS-Ready: Built-in CORS support for cross-origin requests.
- Logging: Uses `loguru` to log all activity to rotating files.
- Timeouts + Error Handling: Gracefully deals with slow or weird sites.
- Human UA Header: Doesn’t look like a bot (unless you read the name).
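For a rough idea of how those pieces fit together, here is a minimal sketch (not the actual `server.py`; the log path, UA string, and `flask_cors` usage are assumptions) of a Flask handler combining CORS, `loguru` rotation, a request timeout, and a human-looking UA header around BeautifulSoup:

```python
# Hypothetical sketch only -- the real server.py may differ. It shows how the
# features above (CORS, loguru rotation, timeouts, human UA header) typically
# combine in a small Flask app.
from flask import Flask, request, jsonify
from flask_cors import CORS
from bs4 import BeautifulSoup
from loguru import logger
import requests

app = Flask(__name__)
CORS(app)  # CORS-Ready: allow cross-origin requests
logger.add("gremlin_scraper.log", rotation="10 MB")  # rotating log files

# Human UA header so requests don't look like a default python-requests bot
HEADERS = {"User-Agent": "Mozilla/5.0 (GremlinScraper)"}

@app.route("/scrape", methods=["POST"])
def scrape():
    data = request.get_json(force=True) or {}
    url = data.get("url")
    if not url:
        return jsonify({"error": "missing 'url'"}), 400
    try:
        resp = requests.get(url, headers=HEADERS, timeout=10)  # timeout guard
        resp.raise_for_status()
    except requests.RequestException as exc:
        logger.warning("Failed to fetch {}: {}", url, exc)
        return jsonify({"error": str(exc)}), 502
    # Strip tags and return only the visible text
    text = BeautifulSoup(resp.text, "html.parser").get_text(separator="\n", strip=True)
    return jsonify({"text": text})

if __name__ == "__main__":
    app.run(port=8742)
```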
🔧 Usage
- Clone or drop this repo into your `.vscode/mcps/` folder, or wherever your MCPs live.
- Add `"gremlinScraper"` to `.mcp.json`.
- Click “Start Server” in the VS Code MCP tab.
- Or run it manually:
```bash
pip install -r requirements.txt
python server.py
```
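Once the server is up, a quick reachability check from Python (a sketch; the `/ping` route and default port 8742 are documented in the Endpoints section below):

```python
# Quick smoke test -- assumes the default port 8742 used in the examples below.
import requests

resp = requests.get("http://localhost:8742/ping", timeout=5)
print(resp.text)  # expected: pong
```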
📦 Endpoints & Examples
1. POST /scrape
- Fetch a single page’s visible text:
```bash
curl -X POST http://localhost:8742/scrape \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com"}'
```

- Response:

```json
{
  "text": "Example Domain\n\nThis domain is for use in illustrative examples in documents.\n..."
}
```
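The same call from Python, for anyone wiring the scraper into a script (a sketch using `requests`; only the port and JSON shape come from the example above):

```python
# Minimal /scrape client sketch -- assumes the server is running locally on 8742.
import requests

def scrape(url: str, endpoint: str = "http://localhost:8742/scrape") -> str:
    """POST a URL to GremlinScraper and return the extracted visible text."""
    resp = requests.post(endpoint, json={"url": url}, timeout=30)
    resp.raise_for_status()
    return resp.json()["text"]

if __name__ == "__main__":
    print(scrape("https://example.com")[:200])
```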
2. POST /crawl
- Recursively crawl same-domain links:
```bash
curl -X POST http://localhost:8742/crawl \
  -H 'Content-Type: application/json' \
  -d '{
    "url":"https://example.com",
    "max_pages":10,
    "max_depth":2,
    "concurrency":5
  }'
```

- Response:

```json
{
  "https://example.com": "Example Domain\n\nThis domain is for use…",
  "https://example.com/about": "About Us\n\n…",
  "...": "…"
}
```
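And the equivalent crawl from Python, iterating the URL-to-text mapping in the response (again a sketch; the payload fields mirror the curl example above):

```python
# /crawl client sketch -- parameters taken from the curl example above.
import requests

payload = {
    "url": "https://example.com",
    "max_pages": 10,
    "max_depth": 2,
    "concurrency": 5,
}
resp = requests.post("http://localhost:8742/crawl", json=payload, timeout=120)
resp.raise_for_status()

# The response maps each crawled URL to its extracted text.
for page_url, text in resp.json().items():
    print(f"{page_url}: {len(text)} chars")
```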
3. POST /crawl-stream
- Stream each page as soon as it’s fetched:
```bash
curl -N -X POST http://localhost:8742/crawl-stream \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com","max_pages":5}'
```

- Response (NDJSON):

```
{"url":"https://example.com","text":"Example Domain\n…"}
{"url":"https://example.com/link1","text":"Link One\n…"}
…
```
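Because the stream is NDJSON, a client can process each page as soon as it arrives instead of waiting for the whole crawl; a sketch using `requests` in streaming mode:

```python
# NDJSON stream consumer sketch -- handles each page as it is fetched,
# rather than waiting for the whole crawl to finish.
import json
import requests

with requests.post(
    "http://localhost:8742/crawl-stream",
    json={"url": "https://example.com", "max_pages": 5},
    stream=True,          # don't buffer the whole response
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:      # skip keep-alive blank lines
            continue
        page = json.loads(line)
        print(page["url"], "->", len(page["text"]), "chars")
```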
4. GET /ping
- Health check endpoint:
```bash
curl http://localhost:8742/ping
```

- Response:

```
pong
```
5. GET /mcp/metadata
- MCP discovery metadata:
```bash
curl http://localhost:8742/mcp/metadata
```

- Response:

```json
{
  "name": "Gremlin Web Scraper MCP",
  "description": "Scrapes and crawls text from URLs via HTTP endpoints…",
  "version": "0.0.1",
  "author": "StatikFinTech LLC",
  "tags": ["scraping", "crawl", "MCP", "runtime"],
  "endpoints": […]
}
```
🗂 Metadata
Name: Gremlin Web Scraper MCP
Author: StatikFinTech LLC
License: MIT
Tags: #scraping, #crawl, #runtime, #gremlin
🐾 Future Add-ons
- PDF / EPUB / Markdown parsing
- Selective DOM element filtering
- Scheduling/recurring crawl and scrape jobs
- Direct Memory injection to GremlinGPT core
“Split. Streamlined. Sovereign.” StatikFinTech Systems • 2025
> [!CAUTION]
> “Your qualifications are impressive...”
> - Coder Hiring Team (2025 Rejection Letter)
🔱 "This isn't rejection. It's proof they don't know how to build what comes next.
Still building what they can’t classify." 🔱 -StatikFinTech, LLC