MCP server by ferrebouwens
# Lucidient Research MCP v1.0
The Limitless Research Operating System - A universal backend where any agent, tool, or workflow can perform research at any scale without architectural changes.
## Quick Start

```bash
# Start all services
docker compose up -d

# Verify health
curl http://localhost:3001/health

# Scale workers
docker compose up -d --scale worker=5
```
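After `docker compose up -d`, a script may want to block until the router actually answers. Below is a minimal polling sketch (a hypothetical helper, not part of the repo; it only assumes the `/health` endpoint returns HTTP 200 when ready). The `fetch` callable is injectable so the loop can be exercised without a running stack:

```python
import time
import urllib.request

def wait_for_health(url="http://localhost:3001/health", timeout=60.0,
                    fetch=None):
    """Poll the health endpoint until it answers 200, or raise on timeout.

    `fetch(url) -> status_code` is injectable for testing; by default it
    performs a real HTTP GET against the MCP router.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=5) as resp:
                return resp.status
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # service not up yet; keep polling
        time.sleep(1.0)
    raise TimeoutError(f"{url} not healthy after {timeout}s")
```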
## Architecture

```
Agent → FastMCP (port 3001) → Redis Queue → Worker Pool
               ↓                                ↓
           Postgres                        MinIO (S3)
          (Metadata)                       (Storage)
               ↓                                ↓
           ChromaDB                          Ollama
           (Vectors)                      (Embeddings)
```
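The job flow above can be modelled in a few lines, with in-memory stand-ins for the real services (a toy illustration only: a `deque` for the Redis queue, dicts for Postgres metadata and MinIO storage; function names are hypothetical):

```python
from collections import deque

queue = deque()   # stand-in for the Redis job queue
metadata = {}     # stand-in for Postgres job metadata
storage = {}      # stand-in for MinIO object storage

def submit_job(job_id, seeds):
    """Agent side: record metadata, enqueue, return only the job_id."""
    metadata[job_id] = {"status": "queued", "seeds": seeds}
    queue.append(job_id)
    return job_id

def run_worker_once(crawl):
    """Worker side: pop one job, run it, persist results to storage."""
    job_id = queue.popleft()
    metadata[job_id]["status"] = "running"
    storage[job_id] = crawl(metadata[job_id]["seeds"])
    metadata[job_id]["status"] = "done"

submit_job("job-1", ["https://example.com"])
run_worker_once(crawl=lambda seeds: [f"page from {s}" for s in seeds])
```

The point of the shape is that the agent never touches page content: it holds a `job_id`, and the worker pool streams data into storage independently.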
## Services

| Service | Port | Purpose |
|---------|------|---------|
| MCP Router | 3001 | FastMCP HTTP/SSE endpoint |
| Postgres | 5433 | Job metadata, namespaces |
| Redis | 6379 | Job queue, pub/sub, progress |
| MinIO | 9000/9001 | Object storage (S3-compatible) |
| Chroma | 8000 | Vector database |
| Ollama | 11434 | Local embedding inference |
## Submit a Crawl Job

```bash
curl -X POST http://localhost:3001/messages \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "start_crawl_job",
    "params": {
      "namespace": "my_research",
      "seeds": ["https://example.com"],
      "config": {
        "max_depth": 2,
        "max_pages": 10
      }
    },
    "id": 1
  }'
```
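For programmatic use, the same envelope can be assembled in Python; `jsonrpc_request` is a hypothetical helper for illustration, not part of the server codebase:

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids

def jsonrpc_request(method, params):
    """Build a JSON-RPC 2.0 envelope like the curl example above."""
    return {"jsonrpc": "2.0", "method": method, "params": params,
            "id": next(_ids)}

payload = jsonrpc_request("start_crawl_job", {
    "namespace": "my_research",
    "seeds": ["https://example.com"],
    "config": {"max_depth": 2, "max_pages": 10},
})
body = json.dumps(payload)  # POST this to http://localhost:3001/messages
```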
## Query Results

```bash
# Check job status
curl -X POST http://localhost:3001/messages \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "job_status",
    "params": {"job_id": "YOUR_JOB_ID"},
    "id": 2
  }'

# Semantic search
curl -X POST http://localhost:3001/messages \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "query_index",
    "params": {
      "namespace": "my_research",
      "query": "What are the key findings?",
      "retrieval": {"top_k": 10}
    },
    "id": 3
  }'
```
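A typical client polls `job_status` until the job reaches a terminal state. The sketch below assumes the status result carries a `state` field with terminal values `done`/`failed`/`cancelled` (field and value names are illustrative, not the server's documented schema); the transport is injected so the loop can be stubbed:

```python
import time

def poll_job(job_id, call, interval=0.0, max_polls=100):
    """Poll `job_status` until the job reaches a terminal state.

    `call(method, params)` is any JSON-RPC transport (an HTTP POST to
    /messages in production, a stub in tests) returning the result dict.
    """
    for _ in range(max_polls):
        status = call("job_status", {"job_id": job_id})
        if status["state"] in ("done", "failed", "cancelled"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```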
## Scale Workers

```bash
# Scale to 10 workers
docker compose up -d --scale worker=10

# Or edit docker-compose.yml deploy.replicas
```
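As an alternative to the `--scale` flag, the replica count can live in `docker-compose.yml` under `deploy.replicas`. A minimal fragment (assuming the service is named `worker`, per the flag above) might look like:

```yaml
services:
  worker:
    deploy:
      replicas: 10
```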
## API Reference

All 22 tools are exposed via FastMCP with HTTP/SSE transport:
### Category A: Direct Fetch

- `fetch_page` - Single-page fetch
### Category B: Async Jobs

- `start_crawl_job` - Queue large-scale crawl
- `job_status` - Poll progress
- `job_results` - Paginated results
- `cancel_job` - Cancel running job
- `delete_job` - Delete job + storage
### Category C: Browser Automation

- `browser_session_create` - Persistent Playwright context
- `browser_navigate` - Navigate to URL
- `browser_interact` - Click/type/scroll
- `browser_extract` - Extract data
- `browser_screenshot` - Capture screenshot
- `browser_close` - Close session
### Category D: Extraction

- `extract_structured` - Structured extraction
- `extract_entities` - Entity extraction
- `convert_document` - PDF/DOCX ingestion
### Category E: RAG

- `embed_job` - Chunk + vectorize
- `query_index` - Semantic search
- `get_chunk` - Retrieve chunk
- `rag_query` - Context window assembly
### Category F: Management

- `list_jobs` - List all jobs
- `get_storage_stats` - Storage statistics
- `health_check` - Service health
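Since all 22 tools share the same JSON-RPC envelope, a thin generic client is enough to call any of them. `MCPClient` below is an illustrative sketch (not shipped with the server); the transport is injected, so it works equally over HTTP POST or a test stub:

```python
class MCPClient:
    """Minimal JSON-RPC client sketch for the tools listed above.

    `transport(payload) -> response_dict` abstracts the HTTP/SSE layer.
    """
    def __init__(self, transport):
        self.transport = transport
        self._id = 0

    def call(self, method, **params):
        self._id += 1
        resp = self.transport({"jsonrpc": "2.0", "method": method,
                               "params": params, "id": self._id})
        if "error" in resp:
            raise RuntimeError(resp["error"])
        return resp["result"]

# Stub transport that just echoes the method name back as the result.
client = MCPClient(lambda p: {"jsonrpc": "2.0", "id": p["id"],
                              "result": {"echo": p["method"]}})
```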
## Configuration

All settings via environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| POSTGRES_URL | postgresql://postgres:postgres@postgres:5432/mcp | Database connection |
| REDIS_URL | redis://redis:6379 | Redis connection |
| MINIO_ENDPOINT | minio:9000 | MinIO endpoint |
| CHROMA_HOST | chroma | ChromaDB host |
| OLLAMA_HOST | http://ollama:11434 | Ollama endpoint |
| DEFAULT_EMBED_MODEL | embed-gemma:300m | Embedding model |
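In Python, these defaults can be mirrored with a small lookup that prefers the environment; `setting` is an illustrative helper (the defaults are copied from the table above, the helper itself is not part of the codebase):

```python
import os

# Defaults mirror the configuration table above.
DEFAULTS = {
    "POSTGRES_URL": "postgresql://postgres:postgres@postgres:5432/mcp",
    "REDIS_URL": "redis://redis:6379",
    "MINIO_ENDPOINT": "minio:9000",
    "CHROMA_HOST": "chroma",
    "OLLAMA_HOST": "http://ollama:11434",
    "DEFAULT_EMBED_MODEL": "embed-gemma:300m",
}

def setting(name):
    """Resolve a setting from the environment, falling back to defaults."""
    return os.environ.get(name, DEFAULTS[name])
```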
## Development

```bash
# Run tests (individual test files to avoid event loop issues)
cd mcp
python -m pytest tests/test_manage.py::test_health_check -v
python -m pytest tests/test_crawl.py::test_start_crawl_job -v
python -m pytest tests/test_fetch.py::test_fetch_page -v

# Import verification
python -c "from server import app; print('OK')"
cd ../worker && python -c "from worker import main; print('OK')"
```
## Design Principles

- No Artificial Caps - `max_pages=0` means unlimited
- No Data Through MCP - Responses contain only `job_id` and pointers
- Async Everything - All operations are background jobs
- Namespace Isolation - Every agent/project has its own storage prefix
- Composable Primitives - Atomic operations: fetch, store, chunk, embed, query
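The namespace-isolation principle can be illustrated with a toy key scheme: every object lives under its namespace prefix, so agents and projects never collide. This is a hypothetical layout, not the server's actual storage format:

```python
def object_key(namespace, job_id, artifact):
    """Build a storage key under a namespace prefix (illustrative only).

    Rejects empty parts and embedded slashes so one namespace can never
    shadow or escape into another.
    """
    for part in (namespace, job_id, artifact):
        if not part or "/" in part:
            raise ValueError(f"invalid key part: {part!r}")
    return f"{namespace}/{job_id}/{artifact}"
```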
## License

Lucidient Systems - Proprietary