MCP-based orchestration system with tri-modal hybrid search, a RAG pipeline, and MySQL, ChromaDB, and Neo4j graph database integration.
MCP Orchestration System
AI Agent System with Multi-Modal RAG and MCP Servers
Company: Sepia Innovations Japan
Developer: Shreyash Shankarrao Jadhav
1. What this project is
This repository contains a complete AI knowledge assistant built with a Model Context Protocol (MCP) architecture.
It lets you:
- Chat with your own data (notes, PDFs, CSVs, Google Drive files).
- Combine Database search + PDF RAG + Web search + Google Drive in a single agent.
- Use a Smart Router that automatically decides which MCP server to call for each question.
- Build a Neo4j knowledge graph of entities and relationships from your data.
It is designed to be:
- Beginner friendly to run locally.
- Detailed enough for reviewers to understand the full architecture and pipelines.
If you are new, start with:
- Section 2 – Quick start
- Section 3 – Project map (what lives where)
If you are reviewing the implementation, focus on:
- Section 4 – MCP servers and clients
- Section 5 – Data & search pipelines
- Section 6 – Knowledge graph & Neo4j
2. Quick start (beginner friendly)
2.1. Prerequisites
- Python: 3.9+
- MySQL: 8.x (or compatible)
- Neo4j (optional but recommended for graph features)
- Node / npm: not required
- Browsers: Chrome/Edge for UI and Neo4j browser
Cloud/API keys:
- OpenAI API key (for GPT-4o-mini & entity extraction)
- Google API key + CSE ID (for web search MCP)
- Google OAuth credentials (for Google Drive MCP)
2.2. Clone the repo
git clone <your-github-repo-url>
cd "MCP Orch System"
On Windows, paths in this README assume something like:
s:\Shreyash\Sepia ML intern\MCP Orch System
2.3. Create virtual environments and install deps
You can use one global venv for everything or separate ones per service. For simplicity, one venv:
cd "s:\Shreyash\Sepia ML intern\MCP Orch System"
python -m venv .venv
.venv\Scripts\activate
pip install -r mcp_database/requirements.txt
pip install -r mcp_rag_pdf/requirements.txt
pip install -r mcp_google_drive/requirements.txt
pip install -r mcp_google_search/requirements.txt
pip install -r ai_agent_system/requirements.txt
python -m spacy download en_core_web_sm
2.4. Environment variables
Create these .env files (values are examples; change as needed).
mcp_database/.env
MYSQL_HOST=localhost
MYSQL_USER=root
MYSQL_PASSWORD=your_mysql_password
MYSQL_DATABASE=mcp_database
CHROMA_DIR=./chroma_db
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_neo4j_password
OPENAI_API_KEY=your_openai_key
SERVER_PORT=8003
mcp_rag_pdf/.env
MYSQL_HOST=localhost
MYSQL_USER=root
MYSQL_PASSWORD=your_mysql_password
MYSQL_DATABASE=mcp_rag_pdf
CHROMA_DIR=./chroma_db_pdf
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_neo4j_password
OPENAI_API_KEY=your_openai_key
UPLOAD_DIR=./uploads
SERVER_PORT=8004
mcp_google_search/.env
GOOGLE_API_KEY=your_google_api_key
GOOGLE_CSE_ID=your_google_custom_search_engine_id
SERVER_PORT=8001
mcp_google_drive/.env
SERVER_PORT=8002
GOOGLE_DRIVE_TOKEN_PATH=token.json
GOOGLE_DRIVE_CREDENTIALS_PATH=credentials.json
Place your Google OAuth credentials.json in mcp_google_drive/ (this file is typically gitignored).
ai_agent_system/.env
OPENAI_API_KEY=your_openai_key
USE_SMART_ROUTER=true
SMART_ROUTER_OPENAI=true
2.5. Initialize databases (basic)
In MySQL:
CREATE DATABASE mcp_database;
CREATE DATABASE mcp_rag_pdf;
Some helper scripts exist (see mcp_database/reset_database.py) but the system will create many tables on first use.
2.6. Start all services
Use multiple terminals (or PowerShell windows).
Terminal 1 – Database MCP
cd "s:\Shreyash\Sepia ML intern\MCP Orch System\mcp_database"
python server.py
Terminal 2 – RAG PDF MCP
cd "s:\Shreyash\Sepia ML intern\MCP Orch System\mcp_rag_pdf"
python server.py
Terminal 3 – Google Drive MCP
cd "s:\Shreyash\Sepia ML intern\MCP Orch System\mcp_google_drive"
python server.py
Terminal 4 – Google Search MCP
cd "s:\Shreyash\Sepia ML intern\MCP Orch System\mcp_google_search"
python server.py
Terminal 5 – Main AI Agent app (Streamlit UI)
cd "s:\Shreyash\Sepia ML intern\MCP Orch System\ai_agent_system"
streamlit run app.py
Then open: http://localhost:8501
2.7. First test (simple)
Once the app is running:
- Go to Create Note tab.
- Create a small note (title + content).
- Go to Chat tab.
- Ask: “Find my note about <your topic>”.
If you see a relevant answer and your note in results, your Database MCP + agent are working.
For deeper testing, see TESTING_GUIDE.md.
3. Project map (for orientation)
High-level structure:
MCP Orch System/
├── ai_agent_system/ # Streamlit UI + central agent
├── mcp_database/ # Database MCP server (MySQL, ChromaDB, Neo4j, CSV)
├── mcp_rag_pdf/ # RAG PDF MCP server
├── mcp_google_drive/ # Google Drive MCP server
├── mcp_google_search/ # Google Search MCP server
├── mcp_clients/ # Python clients for all MCP servers
├── TESTING_GUIDE.md # Detailed end-to-end testing steps
├── SMART_ROUTER_GUIDE.md # Smart routing design and usage
└── (other helper scripts/docs)
3.1. ai_agent_system/ – main app
- app.py – Streamlit app, tabs, charts, and UI around the agent.
- agent.py – Core agent orchestration; integrates all MCP clients, search strategies, and the smart router.
- router.py – High-level query routing logic and statistics.
- config_manager.py – Configuration utilities.
3.2. mcp_database/ – Database MCP
server.py – FastAPI server that exposes:
- Document CRUD
- Hybrid search (BM25 + vector + graph + rerank)
- CSV upload and natural-language query (NL → SQL)
- Neo4j graph sync
- MCP endpoints: /tools, /tools/call, /resources, /resources/read, /prompts, /prompts/get
Supporting modules:
- csv_data_manager.py – Uploads DataFrames/CSVs into MySQL tables.
- csv_query_engine.py – Natural-language queries over CSV-backed tables.
- csv_graph_mapper.py – Maps CSV tables into Neo4j nodes and relationships.
- graph_handler.py – Core Neo4j operations with label sanitization.
- entity_extractor.py – NER + LLM-based entity and relationship extraction.
- bm25_handler.py, chroma_handler.py, reranker.py – Search components.
- reset_database.py, reset_chromadb.py, test_database.py, test_neo4j.py, how_to_check.md – Utilities and docs.
3.3. mcp_rag_pdf/ – RAG PDF MCP
- server.py – FastAPI server for PDF upload, indexing, RAG search, and MCP endpoints.
- models.py – Pydantic models for requests/responses.
- Other files (e.g. PDF processing, chunking, hybrid RAG) implement the full PDF pipeline.
- HYBRID_RAG_INTEGRATION.md – Documentation of the PDF RAG integration.
- test_rag.py – RAG PDF testing script.
3.4. Google MCP servers
- mcp_google_drive/server.py – Google Drive MCP:
  - Basic REST endpoints for file operations.
  - MCP endpoints with Tools, Resources, and Prompts for Drive files.
- mcp_google_search/server.py – Google Search MCP:
  - Wraps the Google Custom Search API.
  - MCP-compatible search tool plus resources/prompts.
3.5. mcp_clients/
Python client wrappers used by the agent:
- database_mcp_client.py
- rag_pdf_mcp_client.py
- drive_mcp_client.py
- search_mcp_client.py
Each client mirrors its server’s REST/MCP endpoints and hides HTTP details from the main agent.
4. MCP servers and protocol support
All four MCP servers follow the same pattern:
- Traditional REST endpoints (for direct HTTP use and debugging).
- MCP endpoints to be used by clients/agents:
- GET /tools
- POST /tools/call
- GET /resources
- GET /resources/read
- GET /prompts
- GET /prompts/get
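To make the shared pattern concrete, here is a minimal, framework-agnostic sketch of the tool-registry idea behind /tools and /tools/call. The tool name, payload shape, and handler are illustrative assumptions, not the servers' actual API:

```python
# Minimal sketch of the registry pattern behind /tools and /tools/call.
# Tool names and payload shapes here are illustrative, not the real API.
from typing import Any, Callable, Dict

TOOLS: Dict[str, Dict[str, Any]] = {}

def tool(name: str, description: str) -> Callable:
    """Register a function as an MCP tool."""
    def decorator(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description, "handler": fn}
        return fn
    return decorator

@tool("search_documents", "Keyword search over stored documents")
def search_documents(query: str, limit: int = 5) -> dict:
    # A real server would query BM25/Chroma/Neo4j here.
    return {"query": query, "results": [], "limit": limit}

def list_tools() -> list:
    """Roughly what GET /tools would return."""
    return [{"name": n, "description": t["description"]} for n, t in TOOLS.items()]

def call_tool(name: str, arguments: dict) -> dict:
    """Roughly what POST /tools/call would do with a ToolCallRequest body."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return {"result": TOOLS[name]["handler"](**arguments)}
```

In the actual servers, FastAPI routes wrap these two functions and validate the request bodies with Pydantic models.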
4.1. Database MCP (port 8003)
Main responsibilities:
- Document CRUD (/documents)
- Document search (/documents/search)
- CSV upload and NL query:
  - /csv/upload – multipart CSV upload → MySQL table + graph sync
  - /csv/query – accepts a CSVQueryRequest (query, table_name, limit)
- Hybrid search pipeline:
  - BM25 keyword search
  - Chroma vector search
  - Neo4j graph scores
  - Cross-encoder reranking
- Graph integration:
  - Entity extraction from text
  - Node + relationship creation in Neo4j
  - CSV-to-graph mapping via csv_graph_mapper.sync_table_to_graph
MCP tools include (names may be slightly simplified here):
- create_document, update_document, delete_document
- search_documents
- upload_csv, query_csv
- execute_sql (safe wrapper)
The Database MCP client (database_mcp_client.py) exposes:
- upload_csv(file_path, table_name)
- query_csv(query, table_name, limit)
- list_tables(), get_table_info(table_name), delete_table(table_name)
- execute_sql(...)
4.2. RAG PDF MCP (port 8004)
Responsibilities:
- Upload and store PDFs
- Extract and chunk PDF text
- Index chunks in Chroma + BM25
- Provide RAG-style answers with:
- Relevant chunks
- Scores
- Optional summarization
- (Optionally) integrate with Neo4j for entity/relationship graphing.
MCP side:
- /tools advertises PDF-level tools (upload, search, summarize, etc.).
- /tools/call executes those tools given a ToolCallRequest.
- /resources and /resources/read allow fetching PDF-related resources.
- /prompts and /prompts/get provide RAG and summarization prompt templates.
The RAG PDF client (rag_pdf_mcp_client.py) wraps these endpoints and returns Python dicts, including a fixed get_pdf() that unwraps the response correctly.
4.3. Google Drive MCP (port 8002)
Responsibilities:
- List, upload, download, and delete files in Google Drive.
- Abstract OAuth token flow (first run).
MCP tools (high level):
- list_files
- get_file_info
- delete_file
- (Optionally) download_file / upload_file, modeled as tools and resources.
Resources map to Drive files (URIs typically encode file IDs). Prompts may hold standard “search my Drive” patterns.
4.4. Google Search MCP (port 8001)
Responsibilities:
- Wrap the Google Custom Search JSON API.
- Provide web search results as an MCP resource.
MCP tools:
- search (the main tool – returns web search results)
Resources and prompts:
- Each search can be wrapped as a resource URI.
- Prompts offer standard ways to phrase web queries or combine web and local data.
The search_mcp_client.py mirrors these tools for consumption by the main agent.
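As an illustration of what such a client wrapper can look like, here is a sketch in the spirit of search_mcp_client.py. The class name, method names, and payload fields are assumptions for illustration only:

```python
# Hypothetical sketch of a client wrapping the search MCP's /tools/call
# endpoint; class, method, and payload names are assumptions.
import json
import urllib.request

class SearchMCPClient:
    def __init__(self, base_url: str = "http://localhost:8001"):
        self.base_url = base_url.rstrip("/")

    def _tool_call_payload(self, query: str, num_results: int = 5) -> dict:
        # Shape of a hypothetical ToolCallRequest for the 'search' tool.
        return {"name": "search",
                "arguments": {"query": query, "num_results": num_results}}

    def search(self, query: str, num_results: int = 5) -> dict:
        """POST the tool call and return the parsed JSON response."""
        req = urllib.request.Request(
            f"{self.base_url}/tools/call",
            data=json.dumps(self._tool_call_payload(query, num_results)).encode(),
            headers={"Content-Type": "application/json"},
        )
        # Requires the Google Search MCP server to be running on base_url.
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
```

The real clients use the same idea: each public method builds a request for one server endpoint so the agent never touches HTTP directly.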
5. Data and search pipelines (end‑to‑end)
This section is for reviewers who want to understand “what happens when the user asks a question?”
5.1. User journey: high-level
1. The user types a query in the Streamlit UI (ai_agent_system/app.py).
2. The request goes to the agent (agent.py).
3. The Smart Router analyzes the query:
   - Keyword patterns (cheap, local)
   - Confidence scores
   - Optional OpenAI classification (for ambiguous queries)
4. The router decides which MCP(s) to call:
   - Database MCP (notes, documents, CSV)
   - RAG PDF MCP (PDF content)
   - Google Drive MCP (Drive files)
   - Google Search MCP (web search)
5. The agent calls the corresponding MCP client(s) in mcp_clients/.
6. The clients call the FastAPI MCP server endpoints.
- Results are merged, reranked, and formatted into a response for the user.
For more detail on routing, see SMART_ROUTER_GUIDE.md.
5.2. Hybrid search pipeline (Database MCP)
When the Database MCP performs a hybrid search:
- Input: query + optional filters.
- Steps:
  1. Keyword search via BM25.
  2. Semantic search via Chroma embeddings.
  3. Graph search via Neo4j (if enabled):
     - Resolve entities mentioned in the query.
     - Find related documents via graph relationships.
  4. Combine the BM25, semantic, and graph scores.
  5. Apply the cross-encoder reranker (reranker.py) to produce the final ranking.
- Output: a list of documents with:
  - similarity_score
  - bm25_score, semantic_score, graph_score (when available)
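The score-combination step can be sketched as a simple weighted fusion. The weights below are illustrative assumptions; the server's actual normalization and weighting may differ:

```python
# Illustrative weighted fusion of BM25, semantic, and graph scores.
# The weights are assumptions; the server's real values may differ.
def fuse_scores(bm25: float, semantic: float, graph: float,
                weights=(0.4, 0.4, 0.2)) -> float:
    wb, ws, wg = weights
    return wb * bm25 + ws * semantic + wg * graph

docs = [
    {"id": 1, "bm25_score": 0.9, "semantic_score": 0.2, "graph_score": 0.0},
    {"id": 2, "bm25_score": 0.3, "semantic_score": 0.8, "graph_score": 0.5},
]
for d in docs:
    d["similarity_score"] = fuse_scores(
        d["bm25_score"], d["semantic_score"], d["graph_score"])

# Rank documents by the fused score before cross-encoder reranking.
ranked = sorted(docs, key=lambda d: d["similarity_score"], reverse=True)
```

The cross-encoder reranker then rescores only the top candidates from this fused list, since reranking every document would be too slow.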
Testing examples for this pipeline can be found in TESTING_GUIDE.md and mcp_database/test_database.py.
5.3. PDF RAG pipeline (RAG PDF MCP)
1. Upload the PDF → stored in the DB or filesystem.
2. Extract text pages with a PDF library.
3. Chunk the text into overlapping segments (chunk engine).
4. Store embeddings in Chroma plus tokens for BM25.
5. At query time:
   - Retrieve candidates via BM25 + vector search.
   - Optionally use graph/metadata signals.
   - Rerank and return the top chunks with scores and text snippets.
   - Optionally call an LLM to synthesize an answer from those chunks.
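The chunking step above can be sketched as a sliding character window. The chunk size and overlap values are illustrative; the server's actual chunk engine may split on tokens or sentences instead:

```python
# Illustrative sliding-window chunker. The server's real chunk engine may
# use token- or sentence-based splitting with different sizes.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list:
    """Split text into overlapping character windows."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # advance by the non-overlapping part
    return chunks

# 1200 characters with a 400-character stride yields 3 overlapping chunks.
chunks = chunk_text("a" * 1200, chunk_size=500, overlap=100)
```

Overlap matters because an answer that straddles a chunk boundary would otherwise be split across two retrieval units and rank poorly in both.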
See mcp_rag_pdf/HYBRID_RAG_INTEGRATION.md for deep details.
5.4. CSV → SQL → Answer (Database MCP)
When a CSV is uploaded via the app:
1. The app writes your CSV DataFrame to a temp .csv file.
2. DatabaseMCPClient.upload_csv() sends it to /csv/upload.
3. The server uses CSVDataManager to:
   - Validate the CSV.
   - Create or replace a MySQL table.
4. After upload, the server calls csv_graph_mapper.sync_table_to_graph(...):
   - Inspects columns (employees, managers, projects, departments).
   - Creates Employee, Project, and Department nodes.
   - Creates relationships:
     - WORKS_ON (Employee → Project)
     - REPORTS_TO (Employee → Manager)
     - IN_DEPARTMENT (Employee → Department)
5. When you ask a natural-language question about that CSV:
   - The app calls db_client.query_csv(...).
   - /csv/query uses CSVQueryEngine to:
     - Convert NL → SQL.
     - Execute it on the MySQL table.
     - Return rows plus an optional SQL preview.
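The NL → SQL → rows loop can be sketched end to end. This uses sqlite3 in place of MySQL and stubs out the LLM translation step, so it only illustrates the shape of the flow, not the actual CSVQueryEngine logic:

```python
# Conceptual sketch of the /csv/query flow, using sqlite3 in place of
# MySQL. The NL->SQL step is a stub; the real engine uses an LLM.
import sqlite3

def nl_to_sql_stub(question: str, table: str) -> str:
    """Stand-in for the LLM step: map one known question shape to SQL."""
    if "count" in question.lower() or "how many" in question.lower():
        return f"SELECT COUNT(*) AS n FROM {table}"
    return f"SELECT * FROM {table} LIMIT 10"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Aiko", "ML"), ("Ravi", "Data")])

# "NL question" -> SQL -> rows, mirroring query/rows/SQL-preview fields.
sql = nl_to_sql_stub("How many employees are there?", "employees")
rows = conn.execute(sql).fetchall()
```

The server additionally guards the generated SQL (table allow-listing, limits) before executing it, which the stub above does not attempt.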
6. Knowledge graph and Neo4j
The graph features are optional but powerful. They live mainly in mcp_database/graph_handler.py and mcp_database/csv_graph_mapper.py.
6.1. Entity extraction
- Uses spaCy for base entities (Person, Org, Location, etc.).
- Uses OpenAI GPT-4o-mini to:
  - Refine entity types.
  - Extract relationships between entities (e.g. WORKS_ON, REPORTS_TO).
Entities and relationships are then persisted to Neo4j via GraphHandler.
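Conceptually, the extractor's output needs to be normalized into node and edge records before the graph upserts. The field names below are assumptions, not the actual schema used by entity_extractor.py:

```python
# Illustrative shape of extraction output before it is persisted via
# GraphHandler. Field names are assumptions, not the actual schema.
def to_graph_payload(entities: list, relations: list) -> dict:
    """Normalize extraction output into node/edge dicts for Neo4j upserts."""
    nodes = [{"label": e["type"], "name": e["text"]} for e in entities]
    edges = [{"type": r["relation"], "from": r["source"], "to": r["target"]}
             for r in relations]
    return {"nodes": nodes, "edges": edges}

payload = to_graph_payload(
    entities=[{"text": "Aiko", "type": "Person"},
              {"text": "Search Revamp", "type": "Project"}],
    relations=[{"source": "Aiko", "relation": "WORKS_ON",
                "target": "Search Revamp"}],
)
```

Keeping a flat nodes/edges payload like this makes the persistence step a simple loop of MERGE statements, independent of how the entities were extracted.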
6.2. GraphHandler safety
graph_handler.py:
- Sanitizes all labels and relationship types (_sanitize_label()).
- Escapes property keys correctly.
- Uses backticked labels in Cypher to avoid syntax errors.
This ensures that unusual names in CSV/document data do not break Cypher queries.
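A sanitizer in that spirit might look like the following. This is a sketch under assumptions; the actual rules in graph_handler.py may differ:

```python
# Illustrative label sanitizer in the spirit of _sanitize_label();
# the actual rules in graph_handler.py may differ.
import re

def sanitize_label(raw: str) -> str:
    """Keep only characters that are safe inside a backticked Cypher label."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", raw.strip())
    if not cleaned or cleaned[0].isdigit():
        cleaned = "N_" + cleaned  # Cypher labels should not start with a digit
    return cleaned

def merge_node_cypher(label: str) -> str:
    # Backticked label plus a parameterized property avoids Cypher injection.
    return f"MERGE (n:`{sanitize_label(label)}` {{name: $name}})"
```

Note that node properties still go through query parameters (`$name`); only identifiers like labels and relationship types, which cannot be parameterized in Cypher, need sanitizing.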
6.3. CSV graph mapping
csv_graph_mapper.py:
- _classify_columns(columns):
  - Heuristically identifies:
    - Employee-related columns
    - Manager columns
    - Project columns
    - Department/team columns
- sync_table_to_graph(mysql_conn, graph_handler, table_name, row_limit):
  - Reads distinct values from MySQL.
  - Creates person/project/department nodes.
  - Iterates through rows to create:
    - WORKS_ON edges
    - REPORTS_TO edges
    - IN_DEPARTMENT edges
Example Cypher you can run in Neo4j browser:
// Who works on which project?
MATCH (e:Employee)-[:WORKS_ON]->(p:Project)
RETURN e.name, p.name
LIMIT 25;
// Reporting relationships
MATCH (e:Employee)-[:REPORTS_TO]->(m:Employee)
RETURN e.name AS employee, m.name AS manager
LIMIT 25;
For a testing walkthrough, see mcp_database/test_neo4j.py and the Neo4j section in TESTING_GUIDE.md.
7. Smart Router (cost‑optimized agent brain)
The Smart Router is documented in detail in SMART_ROUTER_GUIDE.md. In short:
- Roughly 90% of queries are routed with keyword rules (free, local).
- The remaining ~10% fall back to OpenAI classification when:
  - Confidence is low.
  - The query is ambiguous or complex.
- This keeps routing accuracy high while keeping per-user LLM cost low.
Configuration:
- Controlled via environment variables in ai_agent_system/.env:
  - USE_SMART_ROUTER=true
  - SMART_ROUTER_OPENAI=true
- Confidence thresholds and other settings are adjustable in code (router.py / agent.py).
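The keyword-first idea can be sketched as follows. The keyword table and threshold are made-up illustrations; the real router.py patterns and fallback logic are more elaborate:

```python
# Simplified sketch of keyword-first routing with an LLM fallback.
# The keyword table and threshold are illustrative, not router.py's values.
KEYWORD_ROUTES = {
    "database":   ("note", "document", "csv", "table"),
    "rag_pdf":    ("pdf", "page", "chapter"),
    "drive":      ("drive", "folder"),
    "web_search": ("latest", "news", "today"),
}

def route(query: str, threshold: float = 0.25):
    """Pick an MCP server by keyword overlap; defer to the LLM if unsure."""
    words = query.lower().split()
    scores = {server: sum(w in kws for w in words) / max(len(words), 1)
              for server, kws in KEYWORD_ROUTES.items()}
    server, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return server, confidence      # cheap, local decision
    return "llm_fallback", confidence  # would call OpenAI classification
```

The cost saving comes from the threshold: only queries whose keyword signal is weak ever reach the paid OpenAI classification step.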
8. Testing and verification
For a step‑by‑step testing guide (user level + API level + advanced), see:
- TESTING_GUIDE.md
- mcp_database/test_database.py
- mcp_database/test_neo4j.py
- mcp_rag_pdf/test_rag.py
Typical quick checks:
- http://localhost:8003/health – Database MCP healthy.
- http://localhost:8004/health – RAG PDF MCP healthy.
- http://localhost:8001/health – Google Search MCP healthy (if implemented).
- http://localhost:8002/health – Google Drive MCP healthy.
- http://localhost:7474 – Neo4j browser.
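Those checks can be scripted with the standard library alone. This is a convenience sketch (not part of the repo); it assumes the default ports listed above:

```python
# Convenience script (not part of the repo) to ping each service's /health
# endpoint; assumes the default ports documented above.
import urllib.error
import urllib.request

SERVICES = {
    "Database MCP":      "http://localhost:8003/health",
    "RAG PDF MCP":       "http://localhost:8004/health",
    "Google Search MCP": "http://localhost:8001/health",
    "Google Drive MCP":  "http://localhost:8002/health",
}

def check(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:18s} {'UP' if check(url) else 'DOWN'} ({url})")
```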
9. Running only parts of the system
- Only notes + hybrid search:
  - Start the Database MCP and the Streamlit app only.
- Only PDF RAG:
  - Start the RAG PDF MCP and call it via mcp_rag_pdf/test_rag.py.
- Only web search or Drive:
  - Start the respective MCP server and hit /docs or /tools.
This is useful for demos where you do not need the full stack.
10. Contributing / extending
To add a new MCP server:
1. Create a new directory mcp_your_service/.
2. Implement a FastAPI server.py with:
   - Domain REST endpoints.
   - MCP endpoints: /tools, /tools/call, /resources, /resources/read, /prompts, /prompts/get.
3. Add a matching client in mcp_clients/your_service_mcp_client.py.
4. Integrate it with ai_agent_system/agent.py and router.py.

To extend the graph schema:
- Edit mcp_database/csv_graph_mapper.py and graph_handler.py to introduce new node/edge types and classification rules.
- Edit
Pull requests and code review notes can focus on:
- MCP protocol consistency between servers.
- Error handling and logging.
- Neo4j schema evolution.
- Search quality (BM25 + vector + reranker).
11. License and contact
- License: Proprietary – internal project at Sepia ML.
- Author: Shreyash Shankarrao Jadhav
For questions or review feedback, please contact the author or Sepia ML.