RAG-Powered MCP Server for Document Q&A

A production-ready Retrieval-Augmented Generation (RAG) pipeline exposed as a Model Context Protocol (MCP) server — enabling Claude Desktop to answer questions directly from your private document collection.

Built with FAISS vector search, Sentence Transformers, and Groq LLM.

Demo

Features

Retrieval-Augmented Generation (RAG) pipeline
FAISS vector similarity search
Sentence Transformer embeddings
Groq LLaMA inference for fast responses
Streamlit chat interface
Claude Desktop MCP integration
Source-grounded answers (no hallucination)
Adjustable retrieval depth

What This Does

You feed it PDFs → it indexes them semantically → Claude Desktop can query them in real-time via MCP tool calls.

Ask Claude "What does the Transformer paper say about multi-head attention?" and it retrieves the exact relevant passages from your documents and generates a grounded answer — no hallucination, no guessing.

Architecture

PDFs
  ↓  PyMuPDF extracts text
Text Chunks (500 words each)
  ↓  Sentence Transformers embeds
FAISS Index (local vector store)

--- At query time ---

User Question → Claude Desktop
  ↓  MCP tool call
server.py receives question
  ↓  embeds question
FAISS finds Top 5 similar chunks
  ↓  chunks stuffed into prompt
Groq LLaMA generates grounded answer
  ↓
Answer returned to Claude Desktop

Tech Stack

| Component | Technology | Purpose | |-----------|-----------|---------| | PDF parsing | PyMuPDF | Extract raw text from PDFs | | Embeddings | Sentence Transformers (all-MiniLM-L6-v2) | Convert text to semantic vectors | | Vector store | FAISS (Meta) | Fast similarity search | | LLM | Groq (LLaMA 3.3 70B) | Generate answers from retrieved chunks | | MCP server | FastMCP (Python SDK) | Expose RAG as Claude Desktop tool |

Streamlit Web Interface

The project also includes an interactive Streamlit chat interface that allows users to query their documents directly from a browser.

Features:

• Chat-style interface
• Source-grounded answers
• Adjustable retrieval depth
• Retrieved document chunks inspection

Screenshot

Streamlit Interface

Project Structure

rag-mcp-server/
│
├── documents/              ← Put your PDFs here
│   ├── 1706.03762v7.pdf    (Attention Is All You Need)
│   ├── 1810.04805v2.pdf    (BERT)
│   └── 2005.11401v4.pdf    (RAG paper by Meta)
│
├── assets/                 
│   ├── demo.png            (Claude Desktop answering from papers)
│   ├── mcp_running.png     (MCP server running in Claude settings)
│   └── architecture.png    
│
├── ingest.pynb               ← Run once: chunks + embeds PDFs → FAISS index
├── rag.py                  ← Core RAG logic: retrieve + generate
├── server.py               ← MCP server: exposes RAG as Claude tool
│
├── chunks.json             ← Auto-generated by ingest.py
├── faiss_index.bin         ← Auto-generated by ingest.py
│
├── requirements.txt
└── README.md
├── streamlit_app.py       ← Streamlit web interface

Setup

1. Clone the repo

git clone https://github.com/TanmayRawal/rag-mcp-server.git
cd rag-mcp-server

2. Create a Python 3.11 environment

conda create -n ragmcp python=3.11
conda activate ragmcp

3. Install dependencies

pip install -r requirements.txt

4. Get a free Groq API key

Go to console.groq.com
Sign up and create an API key
Add it to both rag.py and server.py:

os.environ["GROQ_API_KEY"] = "your_key_here"

5. Add PDFs to `documents/`

Place any PDFs in the documents/ folder. Suggested papers:

6. Run ingestion

python ingest.py

Reads all PDFs, splits into 500-word chunks, embeds with Sentence Transformers, saves FAISS index. Run once (or when you add new documents).

7. Test RAG directly

python rag.py

Connecting to Claude Desktop

MCP Running

1. Edit the MCP config

Windows: C:\Users\<you>\AppData\Roaming\Claude\claude_desktop_config.json
Mac: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "rag-knowledge-base": {
      "command": "/path/to/conda/envs/ragmcp/python",
      "args": ["/path/to/rag-mcp-server/server.py"]
    }
  }
}


### 2. Restart Claude Desktop

Fully quit and reopen. Go to **Settings → Developer** — you should see `rag-knowledge-base` with status **running**.

### 3. Ask Claude anything about your documents

What does the Transformer paper say about multi-head attention? What datasets were used to evaluate RAG? Explain the BERT pre-training objectives.


Claude automatically calls your `query_knowledge_base` tool and answers from your actual documents.

---

## How RAG Works

1. **Retrieve** — embed the question, search FAISS for top 5 semantically similar chunks
2. **Augment** — inject those chunks into the LLM prompt as context
3. **Generate** — LLM reads chunks and generates a grounded answer

No fine-tuning. No training. Just smart search + generation.

---
## Example Questions

Try asking questions like:

- What does the Transformer paper say about multi-head attention?
- What are the pre-training objectives used in BERT?
- How does Retrieval Augmented Generation work?
- What datasets were used to evaluate the RAG model?

---


## Requirements

pymupdf sentence-transformers faiss-cpu groq mcp


Python 3.11+ required.

---

## Future Improvements

- Support more formats (`.txt`, `.docx`, `.md`)
- Hybrid search (BM25 + semantic)
- Conversation memory across queries
- Streamlit web UI
- Multiple independent knowledge bases

---

## License

MIT

MCP Servers

RAG-Powered MCP Server for Document Q&A

Features

What This Does

Architecture

Tech Stack

Streamlit Web Interface

Screenshot

Project Structure

Setup

1. Clone the repo

2. Create a Python 3.11 environment

3. Install dependencies

4. Get a free Groq API key

5. Add PDFs to `documents/`

6. Run ingestion

7. Test RAG directly

Connecting to Claude Desktop

1. Edit the MCP config

Installation Command (package not published)

Cursor configuration (mcp.json)

RAG-Powered MCP Server for Document Q&A

Features

What This Does

Architecture

Tech Stack

Streamlit Web Interface

Screenshot

Project Structure

Setup

1. Clone the repo

2. Create a Python 3.11 environment

3. Install dependencies

4. Get a free Groq API key

5. Add PDFs to documents/

6. Run ingestion

7. Test RAG directly

Connecting to Claude Desktop

1. Edit the MCP config

Installation Command (package not published)

Cursor configuration (mcp.json)

5. Add PDFs to `documents/`