MCP server by melisasvr
🔍 Deep Research Agent MCP Server
An AI-powered deep research agent built with Python, FastMCP, and Streamlit.
Search → Fetch → Cluster → Report. Fully automated. Fully open source.
📖 Overview
Deep Research Agent is a Python MCP server that automates the full research pipeline from web search to a structured executive report in under 10 seconds. It uses a 4-tool pipeline, powered by the Tavily Search API for retrieval and a pure-Python TF-IDF + K-Means engine for semantic clustering. The frontend is a sleek Streamlit chat interface that shows every step live.
💡 Built as a portfolio project demonstrating: FastMCP tool design, async web scraping, NLP clustering without heavy ML dependencies, and full-stack Python app architecture.
✨ Features
- 🔎 Multi-angle web search: 3 query variations per topic for broader coverage
- 🌐 Async URL fetching: parallel page retrieval with smart HTML cleaning
- 🧹 Text denoising: strips HTML entities, SVG labels, nav boilerplate, repeated patterns
- 🧠 Semantic clustering: pure-Python TF-IDF + K-Means, zero ML framework required
- 🏷️ Auto cluster labeling: 13 topic categories (Quantum, Biotech, Climate, AI, Policy...)
- 📄 Structured reports: markdown or JSON output with sources, keywords, evidence
- ⬇️ One-click download: export
.mdreport directly from the UI - ⚡ Fast: full pipeline typically completes in 6–12 seconds
🏗️ Architecture
┌─────────────────────────────────────────────────┐
│ Streamlit Frontend │
│ app.py — Chat UI │
│ Live step cards · Download button │
└───────────────────┬─────────────────────────────┘
│ FastMCP Client (protocol-aware)
┌───────────────────▼─────────────────────────────┐
│ FastMCP Server (server.py) │
│ │
│ 🔎 search_web 🌐 fetch_and_chunk │
│ Tavily API httpx + HTML parser │
│ │
│ 🧠 cluster_findings 📄 generate_report │
│ TF-IDF + K-Means Markdown / JSON │
└─────────────────────────────────────────────────┘
🛠️ Tech Stack
| Layer | Technology | |-------|-----------| | 🐍 Language | Python 3.12+ | | 🔌 MCP Framework | FastMCP 3.x | | 🖥️ Frontend | Streamlit | | 🔎 Search | Tavily Search API | | 🌐 HTTP Client | httpx (async) | | 🧠 Clustering | Pure-Python TF-IDF + K-Means | | ⚙️ Config | python-dotenv |
🚀 Quick Start
1️⃣ Clone the repository
git clone https://github.com/yourusername/deep-research-agent.git
cd deep-research-agent
2️⃣ Install dependencies
pip install -r requirements.txt
3️⃣ Configure environment
cp .env.example .env
Edit .env and add your Tavily API key:
TAVILY_API_KEY=tvly-your-key-here
MCP_HOST=localhost
MCP_PORT=8000
TAVILY_SEARCH_DEPTH=advanced
🔑 Get a free Tavily API key at app.tavily.com
4️⃣ Start the MCP server
# Terminal 1
python server.py
You should see:
🔍 Deep Research Agent MCP Server
Host : localhost
Port : 8000
Tools: search_web, fetch_and_chunk, cluster_findings, generate_report
Tavily key: ✓ set
5️⃣ Launch the Streamlit UI
# Terminal 2
streamlit run app.py
Open http://localhost:8501 and start researching! 🎉
🔧 MCP Tools Reference
🔎 search_web(query, max_results)
- Searches the web via Tavily API and returns ranked results with scores.
search_web(
query="AI energy crisis 2026",
max_results=8, # 1–15
include_domains=None, # e.g. [".edu", ".gov"]
exclude_domains=None
)
🌐 fetch_and_chunk(urls, chunk_size)
- Fetches pages asynchronously and splits content into overlapping text chunks.
fetch_and_chunk(
urls=["https://example.com/article"],
chunk_size=400, # words per chunk
chunk_overlap=50, # overlap between chunks
max_chunks_per_url=6
)
🧠 cluster_findings(chunks, n_clusters)
- Groups chunks into semantic themes using TF-IDF vectorization + K-Means clustering.
cluster_findings(
chunks=[...], # from fetch_and_chunk
n_clusters=4, # 2–6 themes
top_terms_per_cluster=8
)
📄 generate_report(topic, clusters)
- Synthesizes all clusters into a structured research report.
generate_report(
topic="AI energy crisis 2026",
clusters=[...], # from cluster_findings
format="markdown", # or "json"
include_sources=True
)
🔄 Research Pipeline
📝 User enters topic
│
▼
🔎 search_web() × 3 queries ──────────► 24 ranked sources
│
▼
🌐 fetch_and_chunk() on top 5 URLs ───► 20–30 text chunks
│
▼
🧠 cluster_findings() ────────────────► 4 semantic themes
│
▼
📄 generate_report() ─────────────────► Structured .md report
│
▼
⬇️ Download / Display in UI
📁 Project Structure
deep-research-agent/
│
├── 📄 server.py # FastMCP server — all 4 tools
├── 🖥️ app.py # Streamlit frontend
├── 📋 requirements.txt # Python dependencies
├── 🔒 .env.example # Environment variable template
└── 📖 README.md # This file
⚙️ Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| TAVILY_API_KEY | (required) | Your Tavily search API key |
| MCP_HOST | localhost | MCP server host |
| MCP_PORT | 8000 | MCP server port |
| TAVILY_SEARCH_DEPTH | advanced | basic (faster) or advanced (thorough) |
🤝 Contributing
- Contributions are welcome and appreciated! Here's how to get involved:
🐛 Reporting Bugs
- Check the Issues page to see if it's already reported
- Open a new issue with:
- A clear title and description
- Steps to reproduce
- Expected vs actual behaviour
- Your Python version and OS
💡 Suggesting Features
Open an issue with the enhancement label and describe:
- The problem you're trying to solve
- Your proposed solution
- Why would it benefit other users
🔧 Submitting Pull Requests
- Fork the repository
- Create a feature branch
git checkout -b feature/your-feature-name - Make your changes with clear, descriptive commits
git commit -m "feat: add BM25 ranking to cluster_findings" - Test your changes thoroughly
- Push to your fork
git push origin feature/your-feature-name - Open a Pull Request with a clear description of what you changed and why
📐 Code Style
- Follow PEP 8 for Python code
- Use type hints wherever possible
- Add docstrings to all new functions
- Keep functions focused — one responsibility per function
🌱 Good First Issues
- Looking for a place to start? Check issues tagged
good first issue: - Adding more cluster label categories to
_infer_cluster_label() - Improving the HTML cleaning regex patterns
- Adding a progress bar to the Streamlit UI
- Supporting additional output formats (PDF, DOCX)
- Writing unit tests for the clustering functions
📜 License
MIT License
Copyright (c) 2026 Deep Research Agent Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including, without limitation, the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
🙏 Acknowledgements
- FastMCP — Python MCP server framework
- Tavily — AI-optimised search API
- Streamlit — Python web app framework
- httpx — Async HTTP client
- Made with ❤️ and Python · Star ⭐ this repo if you found it useful!