MCP server by dchatpar
Local File Search MCP Agent
LangChain CLI agent that combines an in-process Local File Search MCP (FastMCP) with the remote Microsoft Learn MCP. The LLM is MiniMax, accessed through its OpenAI-compatible API.
Features
- search_files: metadata filters (name, folder, extension, dates, size)
- search_pdf_content: PDF full-text keyword search via pypdf
- Microsoft Learn MCP at https://learn.microsoft.com/api/mcp (streamable_http)
- SKILL-based routing with JSON-only local results and 2000-char MS answers
- Async REPL CLI
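To make the metadata filtering concrete, here is a minimal sketch of how a search_files-style lookup could work. The FileFilter fields and the returned dictionary keys are hypothetical names chosen for illustration; the real tool's parameters and output schema may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class FileFilter:
    """Hypothetical filter model; every field is optional."""

    name_contains: str | None = None
    extension: str | None = None  # e.g. ".pdf"
    folder: str | None = None  # sub-folder relative to the root
    min_size: int | None = None  # bytes
    max_size: int | None = None  # bytes
    modified_after: datetime | None = None
    modified_before: datetime | None = None


def search_files(root: Path, flt: FileFilter) -> list[dict]:
    """Return metadata for files under root that satisfy every filter that is set."""
    base = root / flt.folder if flt.folder else root
    results: list[dict] = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime)
        if flt.name_contains and flt.name_contains.lower() not in path.name.lower():
            continue
        if flt.extension and path.suffix.lower() != flt.extension.lower():
            continue
        if flt.min_size is not None and stat.st_size < flt.min_size:
            continue
        if flt.max_size is not None and stat.st_size > flt.max_size:
            continue
        if flt.modified_after and modified < flt.modified_after:
            continue
        if flt.modified_before and modified > flt.modified_before:
            continue
        results.append(
            {
                "name": path.name,
                "folder": str(path.parent.relative_to(root)),
                "extension": path.suffix.lower(),
                "size": stat.st_size,
                "modified": modified.isoformat(),
            }
        )
    return results
```

This sketch does not validate the folder filter against the sandbox root; a real tool would need to reject paths that escape it before walking the tree.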
Documentation
| Guide | Description |
|-------|-------------|
| docs/README.md | Documentation index |
| docs/PROJECT_OVERVIEW.md | Architecture and what was built |
| docs/LLM_PROVIDER_GUIDE.md | MiniMax ↔ OpenAI migration |
| docs/DEPLOYMENT.md | Deploy: local, GitHub, Docker, systemd |
| docs/OPERATIONS.md | Operations and CI |
| docs/TROUBLESHOOTING.md | Common issues |
| docs/COMPLIANCE_REPORT.md | Assignment audit |
| docs/INSTRUCTIONS_FOR_ABIN.md | Reviewer guide for install.py |
Setup
One command (recommended):
git clone https://github.com/dchatpar/mcp-file-agent.git
cd mcp-file-agent
chmod +x install.py
./install.py --non-interactive --skip-e2e # no API key; full gate without E2E
# Or interactive: ./install.py
Manual setup:
cd mcp-file-agent
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Set OPENAI_API_KEY in .env (never commit .env)
python scripts/generate_samples.py
MiniMax (OpenAI-compatible)
Configure .env:
OPENAI_API_KEY=<your MiniMax API key>
OPENAI_BASE_URL=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
LangChain uses ChatOpenAI with base_url pointing at MiniMax. MiniMax-only extra_body (thinking disabled) is applied automatically when the base URL contains minimax. OPENAI_API_BASE_URL is accepted as an alias for OPENAI_BASE_URL.
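The provider selection described above can be sketched as a small function that turns environment values into keyword arguments for ChatOpenAI. This is an illustration under assumptions, not the project's agent_factory.py: the function name build_chat_kwargs is hypothetical, the precedence of OPENAI_BASE_URL over its alias is assumed, and the extra_body payload is a placeholder because this README does not state the exact fields MiniMax expects.

```python
from __future__ import annotations

import os
from typing import Any, Mapping

DEFAULT_BASE_URL = "https://api.minimax.io/v1"
DEFAULT_MODEL = "MiniMax-M2.7"

# ASSUMPTION: placeholder payload. The README only says that a MiniMax-only
# extra_body disables thinking; the real field names may differ.
ASSUMED_MINIMAX_EXTRA_BODY = {"thinking": {"type": "disabled"}}


def build_chat_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments that could be passed to ChatOpenAI(**kwargs)."""
    # ASSUMPTION: OPENAI_BASE_URL takes precedence over the OPENAI_API_BASE_URL alias.
    base_url = (
        env.get("OPENAI_BASE_URL")
        or env.get("OPENAI_API_BASE_URL")
        or DEFAULT_BASE_URL
    )
    kwargs: dict[str, Any] = {
        "model": env.get("OPENAI_MODEL", DEFAULT_MODEL),
        "api_key": env.get("OPENAI_API_KEY", ""),
        "base_url": base_url,
    }
    # Attach provider-specific options only when the endpoint is MiniMax.
    if "minimax" in base_url.lower():
        kwargs["extra_body"] = dict(ASSUMED_MINIMAX_EXTRA_BODY)
    return kwargs
```

Keeping the function free of LangChain imports makes the branching easy to unit test; the caller would pass the result to ChatOpenAI.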
Assignment / OpenAI GPT-5.x: The project defaults to MiniMax-M2.7. To use OpenAI instead, either copy .env.openai.example to .env, or set OPENAI_BASE_URL=https://api.openai.com/v1, OPENAI_MODEL to your GPT model id, and OPENAI_API_KEY to an OpenAI API key. Full steps: docs/LLM_PROVIDER_GUIDE.md.
Environment
| Variable | Default | Description |
|----------|---------|-------------|
| OPENAI_API_KEY | — | Required for agent E2E (MiniMax key) |
| OPENAI_BASE_URL | https://api.minimax.io/v1 | MiniMax OpenAI-compatible endpoint |
| OPENAI_MODEL | MiniMax-M2.7 | Model name on MiniMax |
| SEARCH_ROOT | data/samples/zoology | Sandboxed search directory |
| FILE_SEARCH_ROOT | (same as SEARCH_ROOT) | Alias for SEARCH_ROOT |
| MICROSOFT_LEARN_MCP_URL | https://learn.microsoft.com/api/mcp | Learn MCP endpoint |
| MS_ANSWER_MAX_CHARS | 2000 | Max length for Microsoft Learn answers |
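A rough sketch of how the non-LLM variables in the table could be loaded with their defaults follows. The Settings class, load_settings, and learn_connection are hypothetical names, and the precedence of SEARCH_ROOT over FILE_SEARCH_ROOT is an assumption. The connection mapping uses the shape accepted by MultiServerMCPClient from langchain-mcp-adapters; whether the project uses that client is also an assumption.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    search_root: Path
    learn_mcp_url: str
    ms_answer_max_chars: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    # ASSUMPTION: SEARCH_ROOT wins when both it and the alias are set.
    root = env.get("SEARCH_ROOT") or env.get("FILE_SEARCH_ROOT") or "data/samples/zoology"
    return Settings(
        search_root=Path(root),
        learn_mcp_url=env.get(
            "MICROSOFT_LEARN_MCP_URL", "https://learn.microsoft.com/api/mcp"
        ),
        ms_answer_max_chars=int(env.get("MS_ANSWER_MAX_CHARS", "2000")),
    )


def learn_connection(settings: Settings) -> dict[str, dict[str, str]]:
    """Connection mapping for the remote server; the key 'microsoft_learn' is illustrative."""
    return {
        "microsoft_learn": {
            "url": settings.learn_mcp_url,
            "transport": "streamable_http",
        }
    }
```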
Run CLI
file-search-agent
# or
python -m file_search_agent.main
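For readers curious what an async REPL loop of this kind might look like, here is a minimal sketch. It is not the project's main.py: the repl and stdin_reader names, the "exit"/"quit" commands, and the injected handle coroutine (standing in for the agent call) are all assumptions made for illustration.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional


async def stdin_reader() -> Optional[str]:
    """Read one line without blocking the event loop; return None on EOF."""
    try:
        return await asyncio.to_thread(input, "> ")
    except EOFError:
        return None


async def repl(
    handle: Callable[[str], Awaitable[str]],
    read_line: Callable[[], Awaitable[Optional[str]]] = stdin_reader,
    write: Callable[[str], object] = print,
) -> None:
    """Loop: read a query, await the handler, write the answer."""
    while True:
        line = await read_line()
        if line is None:  # EOF (for example Ctrl-D)
            break
        query = line.strip()
        if not query:
            continue
        if query.lower() in {"exit", "quit"}:  # hypothetical exit commands
            break
        write(await handle(query))
```

Injecting read_line and write keeps the loop testable without a terminal; an interactive run would be started with asyncio.run(repl(handle)) using a real agent coroutine as handle.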
Sample queries
- Local (JSON only): What PDF files are available in our system?
- Learn (≤2000 chars): What is Azure Blob Storage?
- PDF content search: Find mentions of migration in the PDFs
- Out-of-scope: What is the capital of France? → refusal JSON
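The PDF content query above relies on keyword search over extracted text. A minimal sketch of that idea follows, assuming pypdf's PdfReader for extraction; the function names, the snippet width, and the result keys are hypothetical and may not match search_pdf_content.

```python
from __future__ import annotations

from pathlib import Path


def extract_pages(pdf_path: Path) -> list[str]:
    """Extract text per page with pypdf (imported lazily so the matcher works without it)."""
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    return [page.extract_text() or "" for page in reader.pages]


def find_keyword(pages: list[str], keyword: str, context: int = 40) -> list[dict]:
    """Case-insensitive search returning 1-based page numbers and short snippets."""
    needle = keyword.lower()
    if not needle:
        return []  # an empty keyword would otherwise match everywhere
    hits: list[dict] = []
    for number, text in enumerate(pages, start=1):
        lowered = text.lower()
        start = lowered.find(needle)
        while start != -1:
            lo = max(0, start - context)
            hi = min(len(text), start + len(needle) + context)
            hits.append({"page": number, "snippet": text[lo:hi].strip()})
            start = lowered.find(needle, start + len(needle))
    return hits
```

Separating extraction from matching means the matching logic can be tested on plain strings, while extract_pages is only needed when real PDFs are involved.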
Test data
The data/samples/zoology/ directory holds 8 non-technical zoology files used for all local-search tests:
| File | Extension | Description |
|------|-----------|-------------|
| african_elephant_study.pdf | .pdf | Elephant population dynamics |
| marine_mammals_report.pdf | .pdf | Orca/dolphin hydrophone survey |
| bird_migration_analysis.pdf | .pdf | Arctic tern geolocator study |
| amphibian_survey_2023.pdf | .pdf | Chytrid fungus impact assessment |
| coral_reef_observations.docx | .docx | Great Barrier Reef transect notes |
| species_count_2024.xls | .xls | Endangered species population counts |
| field_notes_borneo.txt | .txt | Borneo rainforest expedition diary |
| jaguar_photo_rainforest.jpg | .jpg | Camera-trap image placeholder |
Regenerate with: python scripts/generate_samples.py
Verification
QA matrix
| Check | Command | API key | Expected |
|-------|---------|---------|----------|
| Lint | ruff check src tests scripts install.py | No | All checks passed |
| Unit tests | pytest -v | No | 40 passed |
| E2E agent | python -u scripts/e2e_verify.py | Yes | 5/5 PASSED (~1–2 min) |
| Production gate | python -u scripts/production_gate.py | Yes | All 6 steps PASS (~90s) |
| Sample data | python scripts/generate_samples.py | No | 8 files in data/samples/zoology/ |
Run lint and unit tests in parallel:
source .venv/bin/activate
pip install -e ".[dev]"
python scripts/generate_samples.py
ruff check src tests scripts install.py & pytest -v & wait
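One caveat with the shell one-liner: in bash, a bare wait returns 0 regardless of how the background jobs exited, so a failing lint or test run may not fail the overall command. A small sketch of an alternative that runs the checks in parallel and keeps each exit code is shown below; run_parallel and all_passed are hypothetical helpers, not scripts shipped with this repository.

```python
from __future__ import annotations

import subprocess


def run_parallel(commands: dict[str, list[str]]) -> dict[str, int]:
    """Start every command at once, wait for all of them, and return exit codes by label."""
    procs = {label: subprocess.Popen(cmd) for label, cmd in commands.items()}
    return {label: proc.wait() for label, proc in procs.items()}


def all_passed(codes: dict[str, int]) -> bool:
    """True only when every command exited with status 0."""
    return all(code == 0 for code in codes.values())


# Example usage (run inside the activated virtual environment):
#
#   codes = run_parallel({
#       "lint": ["ruff", "check", "src", "tests", "scripts"],
#       "tests": ["pytest", "-v"],
#   })
#   raise SystemExit(0 if all_passed(codes) else 1)
```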
E2E (requires OPENAI_API_KEY in .env):
Takes about 1–2 minutes. Use unbuffered output so progress prints appear immediately ([1/5] … [5/5]):
python -u scripts/e2e_verify.py
Checks:
- PDF files query → local tools, JSON with PDF entries
- List all files → local tools, 8 files total
- Elephant search → local tools, elephant match in JSON
- Azure Blob Storage → Learn MCP only, answer ≤ 2000 chars
- Out-of-scope (capital of France) → assignment error JSON, no tools
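Two of these checks, JSON-only local output and the 2000-character cap on Learn answers, lend themselves to small guard functions. The sketch below illustrates the idea only; the function names are hypothetical and the word-boundary truncation is one possible choice, not necessarily what output_guard.py does.

```python
from __future__ import annotations

import json


def is_valid_json(text: str) -> bool:
    """Return True when text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def truncate_answer(text: str, max_chars: int = 2000) -> str:
    """Cap text at max_chars, preferring to cut at the last space before the limit."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    head, _, _ = cut.rpartition(" ")
    # Fall back to a hard cut when there is no space to break on.
    return (head or cut).rstrip()
```

The default of 2000 mirrors MS_ANSWER_MAX_CHARS; a caller would normally pass the configured value instead of relying on the default.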
Interactive CLI smoke test:
file-search-agent
Assignment compliance
| Requirement | Implementation | Verified by |
|-------------|----------------|-------------|
| Local File Search MCP (in-process) | mcp/local_file_search.py via FastMCP | test_local_mcp.py, E2E [1–3] |
| search_files metadata filters | name, folder, extension, dates, size | test_search_files_* |
| search_pdf_content full-text | pypdf keyword search | test_search_pdf_content_keyword |
| list_all_files | lists all sandboxed files | test_list_all_files_returns_eight, E2E [2] |
| read_pdf_content | read single PDF by path | test_read_pdf_content_* |
| Microsoft Learn MCP (remote) | streamable_http at learn.microsoft.com | test_learn_mcp.py, E2E [4] |
| SKILL routing (local JSON / MS prose / out-of-scope) | SKILL.md, routing.py, output_guard.py | test_agent_routing.py, E2E [5] |
| MiniMax via OpenAI-compatible API | ChatOpenAI + conditional extra_body | agent_factory.py, test_agent_factory.py, E2E all |
| Sandboxed SEARCH_ROOT | path traversal rejected | test_security.py |
| 8 sample zoology files | data/samples/zoology/ | generate_samples.py, E2E [2] |
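The sandboxing row in the table above can be illustrated with a short sketch of path-traversal rejection. It assumes Python 3.9 or newer for Path.is_relative_to; resolve_in_sandbox and SandboxError are hypothetical names, and the project's actual check may be implemented differently.

```python
from __future__ import annotations

from pathlib import Path


class SandboxError(ValueError):
    """Raised when a requested path resolves outside the search root."""


def resolve_in_sandbox(root: Path, user_path: str) -> Path:
    """Resolve user_path against root and reject anything that escapes it."""
    root = root.resolve()
    # Joining with an absolute user_path discards root, so the check below
    # also rejects absolute paths that point elsewhere.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise SandboxError(f"path escapes search root: {user_path!r}")
    return candidate
```

Resolving before comparing matters: a plain string prefix check would accept inputs such as sub/../../x, which only reveal their real target after normalization.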
Dependencies
Pinned full environment (after pip install -e ".[dev]"):
pip install -r requirements.txt
pip install -e .
Or install from project metadata only: pip install -e ".[dev]".
GitHub
Published repository: https://github.com/dchatpar/mcp-file-agent
Reviewer abin-aot has been invited as a collaborator. Submission email draft for the AOT assessment: docs/SUBMISSION_EMAIL_TO_ABIN.md.
Project layout
src/file_search_agent/
main.py # Async REPL
config.py # Env config
models.py # Pydantic tool models
agent_factory.py # create_agent + MCP clients
output_guard.py # JSON / truncation guards
mcp/local_file_search.py
data/samples/zoology/ # Non-tech zoology sample files
docs/ # Full deployment and LLM guides
deploy/ # systemd unit example
Dockerfile # Container image
docker-compose.yml
tests/
License
MIT