Privacy-first, standalone MCP server for web scraping and data extraction using local browser automation and your own LLM API key
Firecrawl Lite MCP Server
A privacy-first, standalone MCP server that provides web scraping and data extraction tools using local browser automation and your own LLM API key. It needs no Firecrawl API key or account and is completely decoupled from Firecrawl's cloud service.
What Makes Firecrawl Lite Special
🔒 Privacy-First Architecture
- Local Processing - All web scraping and data extraction happens on your machine
- Your Data Stays Local - Pages are scraped and processed on your machine; content is sent only to the LLM provider you configure, and only by the extraction tools
- No External Service Lock-in - Doesn't require Firecrawl's cloud API
- Complete Control - You own your data and infrastructure
💰 Cost-Effective & Transparent
- Pay Only for LLM Usage - No additional subscription or API fees
- Your LLM Provider - Compatible with OpenAI, xAI, Anthropic, Ollama, etc.
- Predictable Costs - Transparent pricing based on your chosen LLM rates
⚡ Performance & Simplicity
- Lightning-Fast Startup - Lightweight design means quick initialization
- Single Container - Simple deployment with Docker support
- Minimal Resource Usage - Optimized for efficiency and low memory footprint
📊 Feature Comparison
| Feature | Firecrawl Lite ✅ | Original Firecrawl ❌ |
|---------|-------------------|----------------------|
| 🏠 Deployment | Standalone/Local | Cloud Service |
| 🔑 API Keys Required | Your LLM key only | Firecrawl API + LLM keys |
| 🔒 Data Privacy | 100% local processing | Cloud processing |
| 💰 Cost Model | LLM usage only | Subscription + LLM costs |
| ⚙️ Setup Complexity | Single container | Multi-service deployment |
| 📦 Bundle Size | ~50MB lightweight | Heavy multi-service |
| 🏠 Local LLM Support | ✅ Ollama/Local LLMs | Limited local options |
| 🎛️ Customization | Full control | Limited customization |
| 🚀 Startup Time | < 5 seconds | Variable (cloud dependent) |
| 🔧 Maintenance | Self-managed | Managed service |
Available Tools
This standalone version provides local web scraping and data extraction using Puppeteer and your own LLM:
✅ scrape_page
- Extract content from a single webpage
- Implementation: Local browser automation with Puppeteer
- Use case: Get webpage content for LLMs to read
- Parameters: url, onlyMainContent
- Privacy: All data processed locally
✅ batch_scrape
- Scrape multiple URLs in a single request
- Implementation: Sequential local scraping with rate limiting
- Use case: Process multiple pages efficiently
- Parameters: urls[], onlyMainContent
- Privacy: All data processed locally
✅ extract_data
- Extract structured data using LLM
- Implementation: Local scraping + your LLM for data extraction
- Use case: Pull specific data from pages using natural language prompts
- Parameters: urls[], prompt, enableWebSearch
- Privacy: Content scraped locally, sent to your LLM only
✅ extract_with_schema
- Extract data using JSON schema
- Implementation: Local scraping + schema-guided LLM extraction
- Use case: Extract structured data with predefined schema
- Parameters: urls[], schema, prompt, enableWebSearch
- Privacy: Content scraped locally, sent to your LLM only
🚀 Quick Start
1. No Installation Required!
This MCP server runs via npx, so no global installation is needed.
2. Configure your MCP client:
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"firecrawl-lite": {
"command": "npx",
"args": ["-y", "@ariangibson/firecrawl-lite-mcp-server"],
"env": {
"LLM_API_KEY": "your_llm_api_key_here",
"LLM_PROVIDER_BASE_URL": "https://api.x.ai/v1",
"LLM_MODEL": "grok-code-fast-1"
}
}
}
}
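If the config file already lists other MCP servers, pasting the snippet over it would remove them. The sketch below shows one way to merge the entry in instead. The helper name `add_firecrawl_lite` is hypothetical and not part of Firecrawl Lite; the server entry itself mirrors the JSON above.

```python
import json
from pathlib import Path


def add_firecrawl_lite(config_path: Path, api_key: str, base_url: str, model: str) -> dict:
    """Hypothetical helper: add the firecrawl-lite entry, keeping other servers intact."""
    # Start from the existing config if the file exists and is non-empty.
    text = config_path.read_text().strip() if config_path.exists() else ""
    config = json.loads(text) if text else {}

    # Create the "mcpServers" object if needed, then add or overwrite our entry only.
    servers = config.setdefault("mcpServers", {})
    servers["firecrawl-lite"] = {
        "command": "npx",
        "args": ["-y", "@ariangibson/firecrawl-lite-mcp-server"],
        "env": {
            "LLM_API_KEY": api_key,
            "LLM_PROVIDER_BASE_URL": base_url,
            "LLM_MODEL": model,
        },
    }

    # Write the merged config back to disk.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On macOS the path would be `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"`, matching the location given above.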
LM Studio
If npx doesn't work with LM Studio, install the package globally first:
npm install -g @ariangibson/firecrawl-lite-mcp-server
Then use "command": "firecrawl-lite-mcp-server" without npx.
Claude Code (CLI)
claude mcp add firecrawl-lite \
-e LLM_API_KEY=your_llm_api_key_here \
-e LLM_PROVIDER_BASE_URL=https://api.x.ai/v1 \
-e LLM_MODEL=grok-code-fast-1 \
-- npx -y @ariangibson/firecrawl-lite-mcp-server
3. Restart your MCP client and start scraping!
⚙️ Configuration Guide
Required Environment Variables
# Your LLM API key (xAI, OpenAI, Anthropic, etc.)
LLM_API_KEY=your_api_key_here
# LLM provider base URL
LLM_PROVIDER_BASE_URL=https://api.x.ai/v1
# LLM model name
LLM_MODEL=grok-code-fast-1
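Before launching the server it can help to confirm that all three variables are set. The following is a minimal sketch of such a pre-flight check; `missing_llm_settings` is a hypothetical helper, and the server itself may validate its configuration differently.

```python
import os
from typing import Mapping

# The three variables listed above as required.
REQUIRED_VARS = ("LLM_API_KEY", "LLM_PROVIDER_BASE_URL", "LLM_MODEL")


def missing_llm_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

An empty list means the environment is complete; otherwise the returned names can be shown to the user before the MCP client is started.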
LLM Provider Examples
# xAI (Grok)
LLM_PROVIDER_BASE_URL=https://api.x.ai/v1
LLM_API_KEY=xai-your-key-here
LLM_MODEL=grok-code-fast-1
# OpenAI
LLM_PROVIDER_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini
# Anthropic
LLM_PROVIDER_BASE_URL=https://api.anthropic.com
LLM_API_KEY=sk-ant-your-key-here
LLM_MODEL=claude-3-haiku-20240307
# Local LLM (Ollama)
LLM_PROVIDER_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=your-local-key
LLM_MODEL=llama2
Optional Configuration
# Proxy configuration (for web scraping and LLM API calls)
PROXY_SERVER_URL=http://your-proxy.com:8080
PROXY_SERVER_USERNAME=your_proxy_username
PROXY_SERVER_PASSWORD=your_proxy_password
# Scraping configuration (anti-detection and rate limiting)
SCRAPE_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
SCRAPE_VIEWPORT_WIDTH=1920
SCRAPE_VIEWPORT_HEIGHT=1080
SCRAPE_DELAY_MIN=1000
SCRAPE_DELAY_MAX=3000
🛡️ Anti-Detection Features
Firecrawl Lite includes sophisticated anti-detection measures to handle modern websites with bot protection:
✅ Built-in Anti-Detection
- Realistic Browser Fingerprinting: Spoofs navigator properties, plugins, and browser APIs
- Random Delays: Adds human-like delays between requests (configurable)
- Modern User Agent: Uses up-to-date Chrome user agent strings
- Viewport Simulation: Sets realistic desktop viewport sizes
- Headless Optimization: Configured for maximum stealth in headless mode
✅ Configurable Settings
# Control delays (in milliseconds)
SCRAPE_DELAY_MIN=1000 # Minimum delay before navigation
SCRAPE_DELAY_MAX=3000 # Maximum delay before navigation
SCRAPE_BATCH_DELAY_MIN=2000 # Minimum delay between batch requests
SCRAPE_BATCH_DELAY_MAX=5000 # Maximum delay between batch requests
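To illustrate how these settings could interact with sequential batch scraping, here is a rough sketch. It assumes the delay is drawn uniformly between the minimum and maximum and that the values shown above act as defaults; neither is confirmed by this README. `delay_range` and `scrape_sequentially` are hypothetical names, and `scrape` stands in for whatever function fetches a single page.

```python
import os
import random
import time
from typing import Callable, Iterable, Mapping


def delay_range(env: Mapping[str, str], prefix: str,
                default_min: int, default_max: int) -> tuple[int, int]:
    """Read <prefix>_MIN and <prefix>_MAX (milliseconds), falling back to defaults."""
    low = int(env.get(f"{prefix}_MIN", default_min))
    high = int(env.get(f"{prefix}_MAX", default_max))
    # Tolerate swapped values rather than failing.
    return (low, high) if low <= high else (high, low)


def scrape_sequentially(urls: Iterable[str], scrape: Callable[[str], object],
                        env: Mapping[str, str] = os.environ,
                        sleep: Callable[[float], None] = time.sleep) -> list:
    """Scrape URLs one at a time, pausing a random interval between requests."""
    # Assumed defaults: the example values from the settings above.
    low, high = delay_range(env, "SCRAPE_BATCH_DELAY", 2000, 5000)
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Assumed uniform distribution; convert milliseconds to seconds.
            sleep(random.uniform(low, high) / 1000)
        results.append(scrape(url))
    return results
```

Passing `sleep` and `env` as parameters keeps the pacing logic easy to exercise without real waiting or real environment variables.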
🐳 Docker Deployment
Option 1: Pre-built Image (Recommended)
Perfect for production, Docker Swarm, and Kubernetes deployments:
# Pull and run the latest image
docker-compose up -d
# Or run directly with Docker
docker run -d \
-p 3000:3000 \
-e LLM_API_KEY=your_key_here \
-e LLM_PROVIDER_BASE_URL=https://api.x.ai/v1 \
-e LLM_MODEL=grok-code-fast-1 \
ariangibson/firecrawl-lite-mcp-server:latest
Option 2: Build from Source
For development or customization:
# Edit docker-compose.yml to uncomment the build section
# Then build and run
docker-compose up --build -d
Docker Hub Repository
Pre-built images are automatically published to: ariangibson/firecrawl-lite-mcp-server
- Multi-architecture support: linux/amd64, linux/arm64
- Automatic updates: Built on every release
- Tagged versions: latest, v1.1.2, etc.
The server will be available at http://localhost:3000 with a health endpoint at http://localhost:3000/health.
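A deployment script might wait for the container to come up by probing the health endpoint. The sketch below assumes the endpoint answers with HTTP 200 when the server is ready; the response body is not documented here, so it is ignored. `is_healthy` is a hypothetical helper.

```python
import urllib.request


def is_healthy(base_url: str = "http://localhost:3000", timeout: float = 5.0,
               opener=urllib.request.urlopen) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200 (assumed healthy)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection failures, timeouts, and urllib's URLError/HTTPError.
        return False
```

The `opener` parameter exists only so the check can be exercised without a running server; in normal use the default `urllib.request.urlopen` is enough.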
📊 Usage Examples
Scrape a webpage
{
"name": "scrape_page",
"arguments": {
"url": "https://example.com"
}
}
Batch scrape multiple URLs
{
"name": "batch_scrape",
"arguments": {
"urls": ["https://example.com", "https://example.org"],
"onlyMainContent": true
}
}
Extract data with prompt
{
"name": "extract_data",
"arguments": {
"urls": ["https://example.com"],
"prompt": "Extract the main article title and summary"
}
}
Extract with schema
{
"name": "extract_with_schema",
"arguments": {
"urls": ["https://example.com"],
"schema": {
"type": "object",
"properties": {
"title": {"type": "string"},
"description": {"type": "string"}
}
}
}
}
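MCP clients normally build these calls for you. When talking to the server from a custom client, each payload above becomes the `params` of a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. A minimal sketch, with `tool_call_request` as a hypothetical helper name:

```python
import itertools
import json

# Simple incrementing request ids; any unique id per request would do.
_ids = itertools.count(1)


def tool_call_request(name: str, arguments: dict) -> str:
    """Wrap a tool name and its arguments in an MCP tools/call JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
```

For example, `tool_call_request("scrape_page", {"url": "https://example.com"})` produces the wire form of the first example above. How the request is delivered (stdio or HTTP) depends on how the server is run.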
❓ Important Notes
🌐 Internet Requirements
- Requires Internet Access - Still needs to access target websites
- LLM API Access - Requires connection to your chosen LLM provider
- No Offline Operation - Cannot work completely offline
Intentionally Excluded Features
By design, this lite version excludes advanced features to maintain simplicity:
- Web search functionality
- Website URL discovery/mapping
- Multi-page website crawling
- LLMs.txt file generation
- Advanced research capabilities
- Crawl job status checking
🙏 Credits & Acknowledgments
This project is inspired by and builds upon the excellent work of the original Firecrawl projects:
🔥 Firecrawl
The original Firecrawl project by Mendable.ai - a comprehensive web scraping and crawling platform with advanced features like website mapping, multi-page crawling, and deep research capabilities.
🔥 Firecrawl MCP Server
The official MCP server implementation by the Firecrawl team, providing MCP integration for their cloud-based scraping service.
We give huge thanks to the Firecrawl team for their pioneering work in web scraping and MCP integration! 🚀
💡 Looking for a very generous free tier and dead-simple cloud-hosted solution?
Visit firecrawl.com and sign up for a Firecrawl account! Their cloud service offers enterprise-grade web scraping with zero setup complexity.
📝 License
MIT License - see LICENSE for details.