Agentic RAG — MCP Hybrid Intelligence System
A production-ready, zero-cost Agentic RAG system that answers queries by orchestrating between your private local files and live web data using the Model Context Protocol (MCP). All inference and orchestration run locally — no OpenAI, no API keys, no monthly bills.
What it does
- Routes your query to local files, the web, or both — automatically
- Reads your Markdown and PDF files via Filesystem MCP
- Queries a self-hosted SearXNG instance for live web results
- Re-ranks all chunks locally using FlashRank cross-encoder
- Synthesizes an answer using Ollama (Llama 3.1 8B)
- Runs a Chain-of-Verification critique to catch hallucinations
- Resolves conflicts between private and public data with arbitration rules (sketched below)
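The arbitration rules are application policy rather than something the stack fixes. A minimal illustrative sketch, assuming a private-over-public preference except for time-sensitive queries (the Chunk shape and the rule itself are assumptions, not the shipped logic):

from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_type: str   # "private_file" or "web"
    score: float       # re-ranker relevance score

def arbitrate(private: Chunk | None, web: Chunk | None, time_sensitive: bool) -> Chunk | None:
    # Hypothetical rule: fresh public data wins for time-sensitive queries;
    # otherwise the organization's own documents are treated as authoritative.
    if private is None or web is None:
        return private or web
    return web if time_sensitive else private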
Architecture
User Query
     │
     ▼
Intent Classifier (LangGraph)
     │
     ├──────────────────────┐
     ▼                      ▼
Local MCP Stack        Web MCP Stack
(Filesystem + SQLite)  (SearXNG)
     │                      │
     └──────────┬───────────┘
                ▼
      FlashRank Re-ranker
                │
                ▼
   Synthesis Node (Ollama LLM)
                │
                ▼
Critique Node (Chain-of-Verification)
                │
        ┌───────┴────────┐
        ▼                ▼
 CoVe Resolution    Final Output
 (if conflict)      (if verified)
        │
        └──► re-synthesize (cyclic)
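A rough sketch of how this topology maps onto LangGraph (every node body below is a stub standing in for the real implementation in agent.py; the names are illustrative):

import operator
from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    query: str
    route: str                                # "local", "web", or "both"
    chunks: Annotated[list, operator.add]     # fan-in merges both retrievers
    answer: str
    verdict: str                              # "verified" or "conflict"

def classify(state):       return {"route": "local"}       # intent classifier stub
def retrieve_local(state): return {"chunks": []}            # Filesystem + SQLite MCP stub
def retrieve_web(state):   return {"chunks": []}            # SearXNG MCP stub
def rerank(state):         return {}                        # FlashRank stub
def synthesize(state):     return {"answer": "(draft)"}     # Ollama LLM stub
def critique(state):       return {"verdict": "verified"}   # CoVe stub

def route(state):
    # "both" fans out to the two retrieval branches in parallel
    return ["local", "web"] if state["route"] == "both" else state["route"]

g = StateGraph(AgentState)
for name, fn in [("classify", classify), ("local", retrieve_local),
                 ("web", retrieve_web), ("rerank", rerank),
                 ("synthesize", synthesize), ("critique", critique)]:
    g.add_node(name, fn)

g.set_entry_point("classify")
g.add_conditional_edges("classify", route)
g.add_edge("local", "rerank")
g.add_edge("web", "rerank")
g.add_edge("rerank", "synthesize")
g.add_edge("synthesize", "critique")
# Cyclic CoVe edge: re-synthesize on conflict, stop when verified.
g.add_conditional_edges("critique", lambda s: s["verdict"],
                        {"conflict": "synthesize", "verified": END})
app = g.compile()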
Tech stack
| Layer | Tool | License |
|---|---|---|
| Orchestration | LangGraph | Apache 2.0 |
| LLM inference | Ollama + Llama 3.1 8B | MIT / Meta |
| Embeddings | nomic-embed-text | Apache 2.0 |
| Vector store | ChromaDB | Apache 2.0 |
| Re-ranking | FlashRank | Apache 2.0 |
| Local retrieval | Filesystem MCP + SQLite MCP | MIT |
| Web retrieval | SearXNG + MCP bridge | AGPL |
| Document parsing | PyMuPDF + Unstructured | AGPL / Apache |
Total recurring cost: $0
Prerequisites
- Python 3.11+
- Node.js 18+
- Docker 24+
- 8 GB RAM minimum (16 GB recommended)
- 10 GB free disk space for models
Setup
See docs/setup.md for the full step-by-step guide.
Quick start after setup:
# Activate environment
source .venv/bin/activate
# Start services
ollama serve &
docker start searxng
# Ingest your documents
python ingest.py
# Run the agent
python agent.py
Configuration
Copy .env.example to .env and edit the paths:
cp .env.example .env
Edit mcp.json to point to your knowledge base directory.
Project structure
agentic-rag-mcp/
├── README.md            # this file
├── agent.py             # main LangGraph agent
├── ingest.py            # document ingestion into ChromaDB
├── mcp.json             # MCP server configuration
├── requirements.txt     # Python dependencies
├── .env.example         # environment variable template
└── docs/
    └── setup.md         # full step-by-step setup guide
License
MIT — use freely, modify, and build on it.
The setup steps below are ordered; each builds on the previous one.
Step 1 — Check system prerequisites
Open a terminal and verify you have the required runtimes:
python3 --version # need 3.11+
node --version # need 18+
docker --version # need 24+
git --version
If anything is missing:
# Python 3.11 (Ubuntu/Debian)
sudo apt update && sudo apt install -y python3.11 python3.11-venv python3-pip
# Node 20 via nvm (any OS)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc
nvm install 20
# Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER # then log out and back in
Step 2 — Install Ollama and pull models
Ollama runs your LLMs 100% locally.
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start the Ollama daemon (runs on port 11434)
ollama serve &
# Pull the inference model (4.9 GB — grab a coffee)
ollama pull llama3.1:8b
# Pull the embedding model (274 MB)
ollama pull nomic-embed-text
# Verify both are loaded
ollama list
You should see llama3.1:8b and nomic-embed-text in the list. Test it quickly:
ollama run llama3.1:8b "Say hello in one sentence"
Step 3 — Set up SearXNG (self-hosted web search)
SearXNG is your zero-cost, privacy-first web search engine. You'll run it in Docker.
# Create a directory for SearXNG config
mkdir -p ~/searxng && cd ~/searxng
# Pull the official image
docker pull searxng/searxng
# Generate a secret key
openssl rand -hex 32 # copy this output
# Create the settings file
cat > settings.yml << 'EOF'
use_default_settings: true
server:
  secret_key: "PASTE_YOUR_KEY_HERE"
  limiter: false
  image_proxy: false
search:
  safe_search: 1
  default_lang: "en"
  formats:   # json must be listed here, or format=json requests return 403
    - html
    - json
engines:
  - name: google
    engine: google
    disabled: false
  - name: duckduckgo
    engine: duckduckgo
    disabled: false
  - name: wikipedia
    engine: wikipedia
    disabled: false
EOF
Now start the container:
docker run -d \
--name searxng \
-p 8888:8080 \
-v ~/searxng/settings.yml:/etc/searxng/settings.yml \
--restart unless-stopped \
searxng/searxng
Test it in your browser: http://localhost:8888 — you should see the SearXNG search UI. Also test via curl:
curl "http://localhost:8888/search?q=test&format=json" | python3 -m json.tool | head -30
Step 4 — Create the project and Python environment
# Create project directory
mkdir -p ~/agentic-rag && cd ~/agentic-rag
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Confirm you're inside the venv
which python # should show ~/agentic-rag/.venv/bin/python
Step 5 — Install Python dependencies
pip install --upgrade pip
pip install \
langgraph \
langchain \
langchain-ollama \
langchain-chroma \
langchain-community \
chromadb \
flashrank \
mcp \
pymupdf \
unstructured \
rank-bm25 \
python-dotenv \
requests \
httpx \
pydantic
This will take 2–4 minutes. Verify key packages:
python -c "import langgraph; import flashrank; import chromadb; print('All imports OK')"
Step 6 — Install the MCP servers
The MCP servers come from two ecosystems: the filesystem server is a Node.js package installed globally via npm, while the SQLite server and SearXNG bridge are Python packages.
# Filesystem MCP server
npm install -g @modelcontextprotocol/server-filesystem
# SQLite MCP server (via uvx — install uv first if needed)
pip install uv
uvx mcp-server-sqlite --help # confirms it's working
# SearXNG MCP bridge
pip install mcp-server-searxng
Test the filesystem MCP server manually:
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}' | \
npx @modelcontextprotocol/server-filesystem ~/Documents
If you see a JSON response with protocolVersion, the server is working.
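The same handshake can be driven from Python with the mcp package installed in Step 5. A sketch using the SDK's stdio client (pointing at ~/Documents, like the test above):

import asyncio
from pathlib import Path
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Spawn the filesystem MCP server as a child process over stdio
    params = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem",
              str(Path.home() / "Documents")],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Exposed tools:", [t.name for t in tools.tools])

asyncio.run(main())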
Step 7 — Set up your knowledge base directory
# Create the folder structure
mkdir -p ~/knowledge-base/{documents,reports,notes}
# Put a test file in it
cat > ~/knowledge-base/documents/test-policy.md << 'EOF'
# Q3 Procurement Policy
## Approved Vendors
- All vendors must be ISO 9001 certified.
- Payments above $10,000 require dual approval.
## Timeline
- Q3 runs July 1 – September 30, 2025.
- Purchase orders must be submitted by September 15.
EOF
Create the SQLite database for structured data:
python3 - << 'EOF'
import sqlite3, os

db_path = os.path.expanduser("~/knowledge-base/knowledge.db")
conn = sqlite3.connect(db_path)
conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id INTEGER PRIMARY KEY,
        title TEXT,
        content TEXT,
        category TEXT,
        created_at TEXT
    )
""")
conn.execute("""
    INSERT INTO documents (title, content, category, created_at) VALUES
        ('Q3 Budget Cap', 'Total procurement budget for Q3 is capped at $500,000 USD.', 'finance', '2025-07-01')
""")
conn.commit()
conn.close()
print("SQLite DB created at", db_path)
EOF
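A quick check that the table is queryable (the same kind of SELECT the SQLite MCP server will run on the agent's behalf):

import sqlite3, os

db = sqlite3.connect(os.path.expanduser("~/knowledge-base/knowledge.db"))
for title, content in db.execute("SELECT title, content FROM documents"):
    print(title, "->", content)
db.close()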
Step 8 — Create the MCP configuration file
cd ~/agentic-rag
cat > mcp.json << 'EOF'
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/home/USER/knowledge-base"
      ],
      "description": "Local Markdown and PDF retrieval"
    },
    "sqlite": {
      "command": "uvx",
      "args": [
        "mcp-server-sqlite",
        "--db-path",
        "/home/USER/knowledge-base/knowledge.db"
      ],
      "description": "Structured data retrieval"
    },
    "searxng": {
      "command": "python",
      "args": ["-m", "mcp_server_searxng"],
      "env": {
        "SEARXNG_BASE_URL": "http://localhost:8888",
        "SEARXNG_MAX_RESULTS": "10"
      },
      "description": "Live web search via SearXNG"
    }
  }
}
EOF
# Replace USER with your actual username
sed -i "s/USER/$(whoami)/g" mcp.json
cat mcp.json # verify paths look correct
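Optionally confirm from Python that the file parses and the placeholder was substituted (a quick illustrative check):

import json, pathlib

cfg = json.loads(pathlib.Path("mcp.json").read_text())
for name, server in cfg["mcpServers"].items():
    print(f"{name}: {server['command']} {' '.join(server['args'])}")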
Step 9 — Create the .env file
cat > .env << 'EOF'
OLLAMA_BASE_URL=http://localhost:11434
CHROMA_DB_PATH=./chroma_db
KNOWLEDGE_BASE_PATH=/home/USER/knowledge-base
SEARXNG_URL=http://localhost:8888
LOG_LEVEL=INFO
EOF
sed -i "s/USER/$(whoami)/g" .env
Step 10 — Ingest your documents into ChromaDB
Create and run the ingestion script:
cat > ingest.py << 'EOF'
import os, glob
from dotenv import load_dotenv
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import (
    UnstructuredMarkdownLoader, PyMuPDFLoader
)

load_dotenv()
KNOWLEDGE_BASE = os.getenv("KNOWLEDGE_BASE_PATH")
CHROMA_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")

embedder = OllamaEmbeddings(model="nomic-embed-text")
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)

docs = []

# Load Markdown files
for path in glob.glob(f"{KNOWLEDGE_BASE}/**/*.md", recursive=True):
    loader = UnstructuredMarkdownLoader(path)
    raw = loader.load()
    chunks = splitter.split_documents(raw)
    for c in chunks:
        c.metadata["source_type"] = "private_file"
        c.metadata["file_path"] = path
    docs.extend(chunks)
    print(f" Loaded {len(chunks)} chunks from {path}")

# Load PDF files
for path in glob.glob(f"{KNOWLEDGE_BASE}/**/*.pdf", recursive=True):
    loader = PyMuPDFLoader(path)
    raw = loader.load()
    chunks = splitter.split_documents(raw)
    for c in chunks:
        c.metadata["source_type"] = "private_file"
        c.metadata["file_path"] = path
    docs.extend(chunks)
    print(f" Loaded {len(chunks)} chunks from {path}")

print(f"\nTotal chunks to embed: {len(docs)}")
print("Embedding via nomic-embed-text (local)... this may take a moment")

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embedder,
    persist_directory=CHROMA_PATH,
    collection_name="private_kb",
)
# _collection is Chroma's underlying collection handle; fine for a sanity count
print(f"\nDone. {vectorstore._collection.count()} vectors stored in ChromaDB.")
EOF
python ingest.py
You should see output like 3 vectors stored in ChromaDB (more as you add real documents).
Step 11 — Save the agent script
# Save the full agent code from the repository into agent.py
# Then run a quick sanity check on its imports:
python -c "
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_chroma import Chroma
from flashrank import Ranker, RerankRequest
from langgraph.graph import StateGraph, END
print('All agent imports OK')
"
Step 12 — Run a test query
python - << 'EOF'
from dotenv import load_dotenv
load_dotenv()
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1:8b")
response = llm.invoke("In one sentence, confirm you are running locally.")
print("LLM response:", response.content)
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
embedder = OllamaEmbeddings(model="nomic-embed-text")
vs = Chroma(
    collection_name="private_kb",
    embedding_function=embedder,
    persist_directory="./chroma_db",
)
results = vs.similarity_search("procurement policy", k=2)
print(f"\nChroma retrieval test: found {len(results)} chunks")
for r in results:
    print(" →", r.page_content[:80])
import requests
r = requests.get("http://localhost:8888/search?q=EU+supply+chain+2025&format=json")
print(f"\nSearXNG status: {r.status_code}, results: {len(r.json().get('results', []))}")
EOF
All three checks should pass — LLM responds, ChromaDB returns chunks, SearXNG returns web results.
Step 13 — Run the full agent
python agent.py
You should see the agent walk through classification → retrieval → re-ranking → synthesis → critique → output and print the final verified answer to your terminal.
Quick reference — what runs where
| Service | Port | How to start | How to check |
|---|---|---|---|
| Ollama | 11434 | ollama serve | curl localhost:11434 |
| SearXNG | 8888 | docker start searxng | localhost:8888 in browser |
| ChromaDB | embedded | auto (no daemon) | check ./chroma_db folder exists |
| MCP servers | stdio | spawned by agent | runs automatically |
After a reboot, just run ollama serve & and docker start searxng — everything else starts on demand when you run python agent.py.