MCP Servers

A collection of Model Context Protocol servers, templates, tools and more.

H
Hybrid Rag MCP

a rag mcp supporting for both ways of local model and api . base on https://github.com/shinpr/mcp-local-rag.

Created 12/6/2025
Updated 7 days ago
Repository documentation and setup instructions

(based on https://github.com/LynncX/hybrid-rag-mcp.git, assisted by claude code)

MCP Hybrid RAG

A privacy-first document search server with hybrid embedding and reranking support, optimized for embedded development. Run entirely locally or combine with cloud APIs for enhanced performance.

Built for the Model Context Protocol (MCP), this lets you use Cursor, Codex, Claude Code, or any MCP client to search through your documents using semantic search. Choose between local-only processing for maximum privacy or hybrid mode for better accuracy with API services.

🚀 New in v0.3.0: T5L/C51 Embedded Development Optimization

✨ Major Features

  • 🔧 Code-Aware Chunking: Intelligent splitting that preserves SFR registers, struct definitions, and macro declarations
  • 📚 Embedded Context Enhancement: Automatic file source injection and core type detection (C51 vs DGUS)
  • 🔄 Hybrid Embedding Engine: Choose between Transformers.js (Local) and Alibaba Cloud Qwen (API) via a unified interface
  • 🎯 Two-Stage Retrieval: Optional reranking with T5L-specific instructions for better accuracy
  • 🪟 Cross-Platform Core: Full Windows filesystem compatibility with normalized path handling
  • ⚡ Enhanced Search: Expanded candidate retrieval and symbol-aware ranking

🎯 Perfect for Embedded Development

Solves Key Problems for T5L/C51 Developers:

  • Preserves Register Definitions: sfr P0 = 0x80; stays intact in single chunks
  • Maintains Struct Integrity: Complete struct Touch_Command_0x05_KeyReturn definitions
  • Context-Rich Results: Search results include file sources, sections, and dependency hints
  • Symbol-Aware Search: Enhanced understanding of hex values like 0x5A, 0xA5, register addresses
  • Core Type Detection: Automatically distinguishes between C51 (OS) and DGUS (GUI) content

🏗️ Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│              MCP Hybrid RAG Server (v0.3.0)                 │
├─────────────────────────────────────────────────────────────┤
│  📚 Enhanced Document Ingestion                             │
│  ├─ 📄 PDF, DOCX, TXT, MD parsing                           │
│  ├─ 🔧 Code-aware chunking (T5L/C51 optimized)             │
│  ├─ 📊 Context injection (file source, core type)          │
│  ├─ 🔄 Hybrid embedding (Local + Qwen API)                  │
│  └─ 💾 LanceDB with enhanced metadata                       │
├─────────────────────────────────────────────────────────────┤
│  🔍 Advanced Search & Retrieval                             │
│  ├─ 🎯 Expanded candidate retrieval (60-120 results)       │
│  ├─ ⚡ T5L-specific reranking instructions                  │
│  ├─ 🔍 Symbol-aware search (hex values, registers)        │
│  └─ 📊 Context-enhanced results                             │
├─────────────────────────────────────────────────────────────┤
│  🛠️ MCP Tools (5 total)                                     │
│  ├─ ingest_file    │ query_documents  │ list_files          │
│  ├─ delete_file    │ status           │                     │
└─────────────────────────────────────────────────────────────┘

🔧 Code-Aware Features:
  • Preserves SFR register definitions: sfr P0 = 0x80;
  • Maintains struct integrity: Complete definitions in single chunks
  • File type detection: .h/.c/.md specific processing
  • Core type classification: C51 (OS) vs DGUS (GUI)
  • Dependency hints: Header file inclusion reminders

🎯 Key Features

  • 🔒 Privacy-First: Local processing keeps your data on your machine
  • 🌐 Hybrid Modes: Mix local and cloud services as needed
  • 🪟 Windows Support: Full compatibility with Windows paths and filesystems
  • ⚡ Reranking: Optional second-stage retrieval for improved relevance
  • 🔢 Dimension Safety: Automatic validation to prevent vector dimension mismatches
  • 📈 Performance: Sub-3 second queries even with thousands of documents

🚀 Quick Start

Choose the setup method that works best for you:

Method 1: Local Build (Recommended for Control)

Step 1: Clone and Build the Project

# Clone the repository
git clone https://github.com/shinpr/mcp-local-rag.git
cd mcp-local-rag

# Install dependencies and build
npm install
npm run build

Step 2: Configure Your AI Tool

For Cursor - Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "embedded-rag": {
      "command": "node",
      "args": ["C:/Users/YourName/Projects/mcp-local-rag/dist/index.js"],
      "env": {
        "BASE_DIR": "C:/Users/YourName/T5L-Project",
        "EMBEDDING_PROVIDER": "qwen",
        "DASHSCOPE_API_KEY": "your-qwen-api-key",
        "RERANKING_ENABLED": "true",
        "RERANKING_PROVIDER": "qwen"
      }
    }
  }
}

For Claude Code:

claude mcp add embedded-rag --scope user \
  --env BASE_DIR=/Users/yourname/T5L-Project \
  --env EMBEDDING_PROVIDER=qwen \
  --env DASHSCOPE_API_KEY=your-qwen-api-key \
  --env RERANKING_ENABLED=true \
  --env RERANKING_PROVIDER=qwen \
  -- node /Users/yourname/Projects/mcp-local-rag/dist/index.js

For General Documentation (Local-Only):

{
  "mcpServers": {
    "doc-rag": {
      "command": "node",
      "args": ["C:/Users/YourName/Projects/mcp-local-rag/dist/index.js"],
      "env": {
        "BASE_DIR": "C:/Users/YourName/Documents"
      }
    }
  }
}

🔍 Path Examples:

Windows:

"args": ["C:/Users/JohnDoe/Projects/mcp-local-rag/dist/index.js"],
"env": {
  "BASE_DIR": "C:/Users/JohnDoe/T5L-Embedded-Project"
}

macOS/Linux:

"args": ["/home/alice/projects/mcp-local-rag/dist/index.js"],
"env": {
  "BASE_DIR": "/home/alice/t5l-embedded-project"
}

Method 2: Direct NPM Install (Quick & Simple)

🤔 How NPM Install Works (No Git Clone Needed!)

NPM Registry Lookup:

npx -y mcp-local-rag
        ↓
1. NPM queries registry.npmjs.org for "mcp-local-rag"
2. Downloads the package (if not cached locally)
3. Executes it directly without permanent installation

What Actually Happens:

  • No Git Required: NPM downloads from npm registry, not GitHub
  • No Full Paths: NPM handles the exact file location automatically
  • Auto-Download: Downloads package if not in local cache
  • Version Control: Uses the published version from npm registry

Cache Locations:

  • Windows: C:\Users\YourName\AppData\Local\npm-cache\_npx
  • macOS/Linux: ~/.npm/_npx/

For Cursor - Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "embedded-rag": {
      "command": "npx",
      "args": ["-y", "mcp-local-rag"],
      "env": {
        "BASE_DIR": "/path/to/your/embedded-project",
        "EMBEDDING_PROVIDER": "qwen",
        "DASHSCOPE_API_KEY": "your-qwen-api-key",
        "RERANKING_ENABLED": "true",
        "RERANKING_PROVIDER": "qwen"
      }
    }
  }
}

For Claude Code:

claude mcp add embedded-rag --scope user \
  --env BASE_DIR=/path/to/your/embedded-project \
  --env EMBEDDING_PROVIDER=qwen \
  --env DASHSCOPE_API_KEY=your-qwen-api-key \
  --env RERANKING_ENABLED=true \
  --env RERANKING_PROVIDER=qwen \
  -- npx -y mcp-local-rag

📋 Which Method Should You Choose?

| Feature | Local Build | NPM Install | |---------|-------------|-------------| | Setup Speed | Slower (needs build) | ⚡ Instant | | Control | 🔧 Full control over code | Limited to published version | | Customization | ✅ Can modify source code | ❌ No modifications | | Offline Use | ✅ Works completely offline | ⚠️ Needs initial download | | Version Pinning | ✅ Can use specific commits | 📦 Limited to published versions | | Recommended For | Developers, production use | Quick testing, beginners |

🎯 First Use

Restart your AI tool (Cursor/Claude Code), then start using:

For T5L/C51 Embedded Development:

"Ingest all .h and .c files in the T5L project"
"Find UART register definitions"
"Show me Touch Command structure definitions"

For General Documentation:

"Ingest api-spec.pdf"
"What does this document say about authentication?"

🔧 Configuration Tips

Concrete Path Examples You Should Use:

Windows Example:

"args": ["C:/Users/JohnDoe/Projects/mcp-local-rag/dist/index.js"],
"env": {
  "BASE_DIR": "C:/Users/JohnDoe/Desktop/T5L-C51-Project"
}

macOS Example:

"args": ["/Users/sarah/dev/mcp-local-rag/dist/index.js"],
"env": {
  "BASE_DIR": "/Users/sarah/embedded-projects/t5l-development"
}

Linux Example:

"args": ["/home/alex/projects/mcp-local-rag/dist/index.js"],
"env": {
  "BASE_DIR": "/home/alex/t5l-workspace"
}

How to Find Your Paths:

# Find where you cloned the repo (run from the project folder)
pwd
# Output: /Users/yourname/Projects/mcp-local-rag

# Find your project folder
ls ~/Desktop/  # Look for your T5L project
# Or
ls ~/Documents/  # Check Documents folder

Get Qwen API Key:

  1. Visit Alibaba Cloud Console
  2. Sign up and get your API key (free tier available)
  3. Replace your-qwen-api-key in the configuration
  4. The key looks like: sk-xxxxxxxxxxxxxxxxxxxxxxxxxx

Common Path Mistakes to Avoid:

  • "args": ["./dist/index.js"] → Use absolute paths
  • "BASE_DIR": "./t5l" → Use absolute paths
  • "args": ["C:/Users/John/Projects/mcp-local-rag/dist/index.js"]
  • "BASE_DIR": "C:/Users/John/T5L-Project"

⚙️ Advanced Configuration Guide

🎯 Performance Tuning Parameters

For Best T5L/C51 Performance:

{
  "mcpServers": {
    "embedded-rag": {
      "command": "node",
      "args": ["C:/Users/YourName/Projects/mcp-local-rag/dist/index.js"],
      "env": {
        "BASE_DIR": "C:/Users/YourName/T5L-Project",

        // 🔥 Recommended for T5L/C51
        "EMBEDDING_PROVIDER": "qwen",
        "DASHSCOPE_API_KEY": "your-api-key",
        "EMBEDDING_MODEL_NAME": "text-embedding-v4",
        "EMBEDDING_DIMENSION": "1024",
        "EMBEDDING_BATCH_SIZE": "10",

        // 🎯 Critical for Symbol Recognition
        "RERANKING_ENABLED": "true",
        "RERANKING_PROVIDER": "qwen",
        "RERANKING_MODEL_NAME": "qwen3-rerank",
        "RERANKING_MAX_DOCUMENTS": "500",

        // ⚡ Optimized Chunk Settings
        "CHUNK_SIZE": "512",
        "CHUNK_OVERLAP": "100",

        // 💾 Database Storage
        "DB_PATH": "./lancedb"
      }
    }
  }
}

🔧 Parameter Optimization Guide

Embedding Configuration

| Parameter | Recommended Value | Impact | When to Adjust | |-----------|-------------------|---------|---------------| | EMBEDDING_PROVIDER | "qwen" | 🔥 Better symbol understanding | Use local if privacy is critical | | EMBEDDING_BATCH_SIZE | 10 | Faster processing | Increase to 16 for large projects | | EMBEDDING_DIMENSION | 1024 | Higher accuracy | Must match model capabilities | | CHUNK_SIZE | 512 | Balanced context | Increase to 1024 for long files | | CHUNK_OVERLAP | 100 | Context continuity | Increase to 200 for complex topics |

Reranking Configuration (Critical for T5L/C51)

| Parameter | Recommended Value | Why It Matters | |-----------|-------------------|----------------| | RERANKING_ENABLED | "true" | ✅ Essential for embedded dev | | RERANKING_PROVIDER | "qwen" | 🎯 Symbol-aware reranking | | RERANKING_MAX_DOCUMENTS | 500 | Expanded candidate pool | | RERANKING_INSTRUCT | Custom (see below) | T5L-specific optimization |

Custom Reranking Instructions:

"RERANKING_INSTRUCT": "You are T5L/C51 embedded development expert. Prioritize:\n1. Complete SFR register definitions (sfr P0 = 0x80)\n2. Full struct definitions (Touch_Command_0x05_KeyReturn)\n3. Macro definitions with hex values\n4. UART/GPIO/Timer related code blocks\n5. Distinguish C51 OS core vs DGUS GUI core content"

🎯 Use-Case Specific Configurations

High Accuracy T5L Development

{
  "EMBEDDING_PROVIDER": "qwen",
  "RERANKING_ENABLED": "true",
  "CHUNK_SIZE": "512",
  "EMBEDDING_BATCH_SIZE": "10",
  "RERANKING_MAX_DOCUMENTS": "500"
}

Privacy-Critical Development

{
  "EMBEDDING_PROVIDER": "local",
  "RERANKING_ENABLED": "false",
  "CHUNK_SIZE": "1024",
  "CHUNK_OVERLAP": "200"
}

Large Documentation Projects

{
  "EMBEDDING_PROVIDER": "qwen",
  "RERANKING_ENABLED": "true",
  "CHUNK_SIZE": "1024",
  "CHUNK_OVERLAP": "300",
  "EMBEDDING_BATCH_SIZE": "16"
}

🔍 Search Performance Optimization

Candidate Retrieval Settings

The system automatically expands search based on your settings:

  • Without Reranking: Retrieves limit × 3 candidates
  • With Reranking: Retrieves limit × 6 candidates (up to 120)

Why This Matters for T5L:

  • Broader search catches register names that might score low initially
  • Symbol recognition improved by larger candidate pool
  • Context preservation ensures complete definitions

Example Search Results Improvement:

# Query: "UART baud rate configuration"

# Before (5 candidates):
# Might miss specific register definitions

# After (30 candidates with reranking):
# ✓ Finds SFR registers, struct definitions, configuration examples
# ✓ Preserves complete UART initialization code
# ✓ Includes dependency hints for header files

🚨 Common Configuration Pitfalls

❌ Don't Do This:

{
  "EMBEDDING_PROVIDER": "qwen",
  "EMBEDDING_MODEL_NAME": "Xenova/all-MiniLM-L6-v2", // Wrong model for Qwen
  "RERANKING_ENABLED": "true",
  "DASHSCOPE_API_KEY": "" // Empty API key
}

✅ Do This Instead:

{
  "EMBEDDING_PROVIDER": "qwen",
  "EMBEDDING_MODEL_NAME": "text-embedding-v4", // Correct Qwen model
  "RERANKING_ENABLED": "true",
  "DASHSCOPE_API_KEY": "sk-your-actual-api-key" // Valid API key
}

🔧 Environment Variable Reference

Essential Variables:

  • BASE_DIR: Required - Your project root
  • DASHSCOPE_API_KEY: Required for Qwen - Get from Alibaba Cloud Console

Performance Variables:

  • CHUNK_SIZE: Text chunk size (default: 512)
  • CHUNK_OVERLAP: Context overlap (default: 100)
  • EMBEDDING_BATCH_SIZE: API batch size (default: 10)

Advanced Variables:

  • DB_PATH: Custom database location
  • MAX_FILE_SIZE: Maximum file size limit (default: 100MB)
  • RERANKING_MAX_DOCUMENTS: Max docs for reranking (default: 500)

🧪 Testing Your Configuration

Verify Setup:

# Test with sample T5L code
"Ingest T5LOS8051.h"
"Find UART register definitions"
"Show Touch Command structure definitions"

Expected Results:

Register definitions intact: sfr SCON = 0x98; in single chunk ✅ Complete structures: Full struct Touch_Command_0x05_KeyReturnRich context: [Source: T5LOS8051.h] [Type: .h] [Core: C51]Dependency hints: "[Note: Include this header file]"

⚙️ Configuration

Environment Variables

Core Settings

| Variable | Default | Description | Valid Range | |----------|---------|-------------|-------------| | BASE_DIR | Current directory | Document root directory. Server only accesses files within this path | Any valid path | | DB_PATH | ./lancedb/ | Vector database storage location | Any valid path | | MAX_FILE_SIZE | 104857600 (100MB) | Maximum file size | 1MB - 500MB | | CHUNK_SIZE | 512 | Characters per chunk | 128 - 2048 | | CHUNK_OVERLAP | 100 | Overlap between chunks | 0 - (CHUNK_SIZE/2) |

🔄 Hybrid Embedding Settings

| Variable | Default | Description | Options | |----------|---------|-------------|---------| | EMBEDDING_PROVIDER | local | Embedding provider | local, qwen | | EMBEDDING_MODEL_NAME | Xenova/all-MiniLM-L6-v2 | Local model name | HF model ID | | DASHSCOPE_API_KEY | - | Qwen API key | Required for qwen | | EMBEDDING_DIMENSION | - | Vector dimension for API models | Auto-detected | | EMBEDDING_BATCH_SIZE | 8 | Batch size for processing | 1-32 |

⚡ Reranking Settings

| Variable | Default | Description | Options | |----------|---------|-------------|---------| | RERANKING_ENABLED | false | Enable reranking | true, false | | RERANKING_PROVIDER | none | Reranker provider | local, qwen, none | | RERANKING_MODEL_NAME | Xenova/bge-reranker-base | Local reranker model | HF model ID | | RERANKING_MAX_DOCUMENTS | 500 | Max docs per rerank request | 1-1000 |

Legacy Compatibility

| Variable | Description | Migration | |----------|-------------|-----------| | MODEL_NAME | Legacy local model name | Auto-migrates to EMBEDDING_PROVIDER=local | | CACHE_DIR | Model cache directory | Used for local models |

Configuration Examples

🔧 T5L/C51 Embedded Development (Recommended)

BASE_DIR="/path/to/embedded/project"
EMBEDDING_PROVIDER="qwen"
DASHSCOPE_API_KEY="your-api-key"
EMBEDDING_MODEL_NAME="text-embedding-v4"
RERANKING_ENABLED="true"
RERANKING_PROVIDER="qwen"
CHUNK_SIZE="512"
CHUNK_OVERLAP="100"

🏠 General Documentation (Local-Only, Privacy First)

BASE_DIR="/home/user/documents"
EMBEDDING_PROVIDER="local"
EMBEDDING_MODEL_NAME="Xenova/all-MiniLM-L6-v2"
RERANKING_ENABLED="false"

🌐 Hybrid with Qwen API (General Use)

BASE_DIR="/home/user/documents"
EMBEDDING_PROVIDER="qwen"
DASHSCOPE_API_KEY="your-api-key"
EMBEDDING_MODEL_NAME="text-embedding-v4"
RERANKING_ENABLED="true"
RERANKING_PROVIDER="qwen"

⚡ High Performance Local (No API Costs)

BASE_DIR="/home/user/documents"
EMBEDDING_PROVIDER="local"
EMBEDDING_MODEL_NAME="Xenova/all-MiniLM-L6-v2"
RERANKING_ENABLED="true"
RERANKING_PROVIDER="local"
CHUNK_SIZE="1024"

💻 Usage

🎯 Embedded Development Usage

For T5L/C51 Projects:

"Ingest T5LOS8051.h header file"
"Ingest all .h and .c files in the embedded project"
"Ingest DGUS configuration documentation"

Search for register definitions:

"Find UART SFR register definitions"
"Search for timer register addresses"
"What are the GPIO port addresses in T5L?"

Search for data structures:

"Find Touch Command structure definitions"
"Show me all struct definitions related to UART"
"Get display command structures"

Search for configuration examples:

"Find UART baud rate configuration examples"
"How to initialize Timer1 for 9600 baud"
"GPIO setup for input/output mode"

📄 General Document Usage

Ingest a single document:

"Ingest the document at /Users/me/docs/api-spec.pdf"

Batch ingest multiple documents:

"Ingest all PDF files in the /projects/docs directory"

The server:

  1. Validates file exists and is within size limits
  2. Extracts text (PDF/DOCX/TXT/MD formats)
  3. Applies code-aware chunking for embedded files
  4. Generates embeddings using configured provider
  5. Stores with enhanced metadata (file types, core detection)

💾 Database Storage: Where Your Data Goes

When you ingest files, the RAG server creates a local vector database using LanceDB. Here's what happens:

Default Storage Location:

./lancedb/
├── chunks.lance          # Main database file (contains vectors + metadata)
└── _versions/           # Database version history
    ├── 0/
    ├── 1/
    └── ...

What Gets Stored:

  • Document Chunks: Text fragments with enhanced metadata
  • Vector Embeddings: Numerical representations for semantic search
  • File Metadata: Sources, types, core classifications (C51/DGUS)
  • Search Index: Optimized for fast similarity searches

Example Structure After Ingesting T5L Files:

./lancedb/
├── chunks.lance          # ~50MB for typical embedded project
│   ├── Document chunks from T5LOS8051.h
│   ├── Register definitions (sfr P0 = 0x80;)
│   ├── Struct definitions (Touch_Command_0x05_KeyReturn)
│   └── Metadata: [Source: T5LOS8051.h] [Type: .h] [Core: C51]
└── _versions/
    └── 0/                # Initial database state

Custom Database Location:

# Set custom database path
DB_PATH="/path/to/my/vector/db"

# In your MCP configuration:
"env": {
  "BASE_DIR": "/path/to/documents",
  "DB_PATH": "/path/to/my/vector/db"
}

Database Properties:

  • File Format: Apache Arrow-based Lance format (columnar storage)
  • Size: Typically 10-100MB for embedded projects
  • Performance: Sub-second searches even with thousands of chunks
  • Portability: Database files can be copied between machines
  • Privacy: Everything stored locally (no cloud storage)

🔍 Enhanced Search Features

Basic semantic search:

"What does the API documentation say about authentication?"
"Find information about rate limiting"
"Search for error handling best practices"

Symbol-aware search (enhanced for embedded):

"Find all uses of 0x5A in the codebase"
"Search for register 0x98 definitions"
"Get all macro definitions with hex values"

Search with context results:

"Find UART initialization code"

Result format (enhanced for embedded):

[Source: T5LOS8051.h]
[Type: .h]
[Core: C51]

sfr SCON = 0x98;
sfr SBUF = 0x99;

[Note: Include this header file when using these definitions]

File Management

List all ingested files:

"List all ingested files"

Delete a file from the database:

"Delete /Users/me/docs/old-spec.pdf from the RAG system"

Check system status:

"Show the RAG server status"

🗃️ Database Management

View Database Contents:

# Check what's in your vector database
ls -la ./lancedb/

# Database size information
du -sh ./lancedb/

Reset Database (Fresh Start):

# Option 1: Delete database directory (complete reset)
rm -rf ./lancedb/

# Option 2: Use MCP tool to remove specific files
"Delete /path/to/file.h from the RAG system"

Backup and Restore:

# Backup your database
cp -r ./lancedb/ ./lancedb-backup-$(date +%Y%m%d)/

# Restore from backup
rm -rf ./lancedb/
cp -r ./lancedb-backup-20240101/ ./lancedb/

Database Migration (Portable Setup):

# Move database to shared location
mv ./lancedb/ /shared/embedded-rag-db/

# Update your MCP configuration to point to new location
"env": {
  "BASE_DIR": "/path/to/documents",
  "DB_PATH": "/shared/embedded-rag-db"
}

🏗️ Project Structure

src/
├── index.ts                    # Entry point, MCP server initialization
├── server/                     # MCP server and tool handlers
│   └── index.ts               # RAGServer class with 5 MCP tools
├── interfaces/                 # Type definitions for modularity
│   ├── IEmbedder.ts           # Embedding interface
│   ├── IReranker.ts           # Reranker interface
│   └── IVectorStore.ts        # Vector store interface
├── services/                   # Concrete implementations
│   ├── embedder/              # Embedding providers
│   │   ├── LocalEmbedder.ts   # Transformers.js local embedding
│   │   ├── QwenEmbedder.ts    # Alibaba Cloud Qwen API
│   │   └── EmbedderFactory.ts # Factory pattern
│   ├── reranker/              # Reranking providers
│   │   ├── LocalReranker.ts   # Local cross-encoder reranking
│   │   ├── QwenReranker.ts    # Qwen API reranking
│   │   └── RerankerFactory.ts # Factory pattern
│   └── path/                  # Cross-platform utilities
│       └── PathUtils.ts       # Windows compatibility
├── parser/                     # Document parsing
│   └── index.ts               # PDF/DOCX/TXT/MD support
├── chunker/                    # Text splitting
│   └── index.ts               # Intelligent chunking
├── vectordb/                   # Vector database operations
│   └── index.ts               # LanceDB integration
├── types/                      # Configuration types
│   └── config.ts              # Config types and migration
├── errors/                     # Unified error hierarchy
│   └── index.ts               # RAGError, ValidationError, etc.
└── __tests__/                  # Test suites

🔧 Development

Building from Source

git clone https://github.com/shinpr/mcp-local-rag.git
cd mcp-local-rag
npm install

Code Quality

# Type check
npm run type-check

# Lint and format
npm run check:fix

# Full quality check
npm run check:all

Testing

# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Watch mode for development
npm run test:watch

📊 Performance

Test Environment: MacBook Pro M1 (16GB RAM), Node.js 22 (January 2025)

Local Embedding (all-MiniLM-L6-v2)

  • Query Performance: 1.2 seconds for 10,000 chunks
  • Ingestion Speed: ~45 seconds for 10MB PDF
  • Memory Usage: 200MB baseline, 800MB peak
  • Dimensions: 384-dimensional vectors

Qwen API Embedding (text-embedding-v4)

  • Query Performance: 0.8 seconds + API latency
  • Ingestion Speed: ~25 seconds for 10MB PDF (faster processing)
  • Memory Usage: 150MB baseline (local processing only)
  • Dimensions: 1024-dimensional vectors (higher accuracy)

Hybrid with Reranking

  • Query Performance: 1.5-2.0 seconds total (search + rerank)
  • Accuracy Improvement: 15-25% better relevance scores
  • API Costs: Minimal for typical usage patterns

🛡️ Security

Path Restriction: Only accesses files within BASE_DIR. Path traversal attempts are blocked using PathUtils.isSafePath().

Local Processing: All local embedding processing happens on your machine. No network requests after initial model downloads.

API Security: Qwen API keys are passed via environment variables only. API connections use HTTPS with timeout protections.

Dimension Validation: Automatic validation prevents vector dimension mismatches between different embedding models.

🔍 Troubleshooting

Common Issues

"No results found" when searching

  • Documents must be ingested first: "Ingest /path/to/document.pdf"
  • Verify ingestion: "List all ingested files"

"Dimension mismatch" error

  • Occurs when switching between embedding providers with different vector dimensions
  • Solution: Delete existing database and re-ingest documents
  • Location: Delete your DB_PATH directory (default: ./lancedb)

"API key required for Qwen" error

"Model download failed"

  • Check internet connection for first use
  • Ensure sufficient disk space (~120MB needed)
  • Verify HuggingFace model URL is accessible

MCP Client Issues

For Cursor:

  1. Settings → Features → Model Context Protocol
  2. Verify server configuration is saved
  3. Restart Cursor completely (Cmd+Q)
  4. Check MCP connection status

For Claude Code:

claude mcp list  # Verify server appears
claude mcp remove local-rag  # Remove if needed
claude mcp add local-rag --scope user --env BASE_DIR=/path/to/docs -- npx -y mcp-local-rag

🤝 Contributing

Contributions are welcome! Before submitting a PR:

  1. Run tests: npm test
  2. Code quality: npm run check:all
  3. Add tests for new features
  4. Update documentation if changing behavior

📄 License

MIT License - see LICENSE file for details.

Free for personal and commercial use. No attribution required, but appreciated.

🙏 Acknowledgments

Built with:

Created as a practical tool for developers who want AI-powered document search without compromising privacy.

🔬 Embedded Development: What's Different?

Before v0.3.0 (Generic RAG)

❌ Register definition split across chunks:
   "sfr P0 =" in one chunk, "0x80;" in another

❌ Structure definitions broken:
   struct definitions cut in middle, losing complete context

❌ No file context:
   Results don't show which file the code came from

❌ Generic search:
   "0x5A" treated as regular text, not as instruction code

After v0.3.0 (Embedded-Aware RAG)

✅ Atomic register preservation:
   "sfr P0 = 0x80;" kept in single chunk

✅ Complete struct definitions:
   Full Touch_Command_0x05_KeyReturn structure preserved

✅ Rich context injection:
   [Source: T5LOS8051.h] [Type: .h] [Core: C51]

✅ Symbol-aware search:
   Special handling for hex values, register addresses

Real-World Impact

For T5L/C51 Developers:

  • Faster Development: Quickly find exact register definitions without guessing
  • Reduced Errors: Complete struct definitions prevent missing fields
  • Better Context: Know which file and core type each result comes from
  • Symbol Intelligence: Search for 0x98 finds UART SCON register definitions
  • Dependency Awareness: Automatic reminders to include necessary header files

📚 Further Reading

Quick Setup
Installation guide for this server

Install Package (if required)

npx @modelcontextprotocol/server-hybrid-rag-mcp

Cursor configuration (mcp.json)

{ "mcpServers": { "lynncx-hybrid-rag-mcp": { "command": "npx", "args": [ "lynncx-hybrid-rag-mcp" ] } } }