Research MCP Server
Intelligent multi-source research orchestration for AI assistants
Overview
Research MCP Server is a Model Context Protocol (MCP) server that provides consensus-driven, multi-source research capabilities to AI assistants like Claude, ChatGPT, and other MCP-compatible clients. It uses 2-5 LLMs to vote on research strategy, then dynamically orchestrates research across web search, academic papers, library documentation, and AI reasoning, delivering comprehensive, validated insights with built-in fact-checking.
Why Research MCP?
- Consensus Planning: 2-5 LLMs vote on research strategy + independent planning for each sub-question
- Production-Ready Reports: Enforced numeric specificity, no placeholder code, explicit success criteria
- Phased Synthesis: Token-efficient approach with key findings extraction (~40% fewer tokens)
- Code Validation: Post-synthesis validation against Context7 docs catches hallucinated code
- Inline Citations: Every claim sourced ([perplexity:url], [context7:lib], [arxiv:id])
- Multi-Model Validation: Critical challenge + consensus validation by multiple LLMs
- Actionability Checklist: Synthesis evaluated for specificity, completeness, and contradiction-free output
- Context-Efficient Reports: Sectioned architecture with on-demand reading (prevents AI context bloat)
- Dynamic Execution: Custom research plans with parallel processing
- Multi-Source Synthesis: Combines Perplexity, arXiv, Context7, and direct LLM reasoning
Features
Core Capabilities
- Adaptive Research Planning: Root consensus + independent sub-question planning
- Multi-Source Search:
  - Web search via Perplexity API
  - Academic papers via arXiv with AI-generated summaries
  - Library documentation via Context7 (with shared + specific doc fetching)
  - Deep reasoning via direct LLM analysis
- Parallel Processing: Main query + sub-questions execute simultaneously
- Phased Synthesis: Main synthesis → key findings extraction → sub-Q synthesis (token-efficient)
- Code Validation: Post-synthesis validation against Context7 documentation
- Validation Pipeline: Critical challenge + multi-model consensus + sufficiency voting
🚀 Installation
Prerequisites
- Node.js 18+
- API keys for:
- Perplexity API
- Google AI (Gemini)
- OpenAI API
- Context7 (for library documentation)
Quick Start
# Clone the repository
git clone https://github.com/yourusername/research-mcp.git
cd research-mcp
# Install dependencies
npm install
# or
bun install
# Build TypeScript
npm run build
Integration with MCP Clients
Claude Desktop / Cursor
Add to your MCP configuration file:
Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json
Cursor: ~/.cursor/mcp.json
{
  "mcpServers": {
    "research": {
      "command": "node",
      "args": ["/path/to/research-mcp/dist/index.js"],
      "env": {
        "PERPLEXITY_API_KEY": "your-key",
        "GEMINI_API_KEY": "your-key",
        "OPENAI_API_KEY": "your-key",
        "ARXIV_STORAGE_PATH": "/path/to/storage/",
        "CONTEXT7_API_KEY": "your-key"
      }
    }
  }
}
Restart your client after adding the configuration.
💡 Usage
Basic Example (Async Pattern)
Ask your AI assistant to use the research tool:
I need to research transformer architectures.
Can you use the start_research tool to give me a comprehensive overview?
The AI will call:
{
  "query": "How do transformer architectures work?",
  "depth_level": 2
}
Then poll with check_research_status using the returned job_id.
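A minimal sketch of this async pattern from the client side, assuming a generic callTool(name, args) helper; the 5-second interval and the status/job_id field names on the response are illustrative assumptions, only the tool names come from this README.

```typescript
// Hypothetical client-side polling loop. callTool(name, args) stands in for
// whatever your MCP client exposes; the response fields shown are assumed.
async function researchWithPolling(
  callTool: (name: string, args: object) => Promise<any>
) {
  const started = await callTool("start_research", {
    query: "How do transformer architectures work?",
    depth_level: 2,
  });

  // Poll until the job is no longer running (field names are illustrative).
  let result = await callTool("check_research_status", { job_id: started.job_id });
  while (result.status === "running") {
    await new Promise((resolve) => setTimeout(resolve, 5000)); // wait 5s between polls
    result = await callTool("check_research_status", { job_id: started.job_id });
  }
  return result;
}
```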
Reading Research Report Citations
When personas cite research using format [R-135216:5-19], you can verify the content:
Use the read_report tool to verify what the persona cited:
{
  "citation": "[R-135216:5-19]"
}
This returns lines 5-19 from report R-135216.
Advanced Example with Rich Context
I'm building an AI memory companion that extracts entities and deduplicates memories.
I need to create 600+ test examples for evaluation, but my current template-based
approach creates unrealistic data. Research the best approaches for creating
high-quality evaluation datasets.
Context:
- Solo developer with 20 hour budget
- Already reviewed papers on Excel formula repair and DAHL biomedical benchmark
- Found that synthetic data is 40% simpler than real data
- Random template filling doesn't work
Specific questions:
1. What makes evaluation data representative?
2. How to generate hard negatives?
Tech stack: Python, Neo4j, LangSmith
This triggers a sophisticated research session with:
{
  "query": "How to create high-quality evaluation datasets for LLM testing?",
  "project_description": "AI memory companion with semantic extraction/dedup",
  "current_state": "85 test examples, need 600+",
  "problem_statement": "Template-based generation creates unrealistic data",
  "constraints": ["Solo developer", "20 hours"],
  "domain": "LLM evaluation datasets",
  "depth_level": 4,
  "papers_read": ["Excel formula repair", "DAHL biomedical benchmark"],
  "key_findings": ["Synthetic data 40% simpler than real data"],
  "rejected_approaches": ["Random template filling"],
  "sub_questions": [
    "What makes evaluation data representative?",
    "How to generate hard negatives?"
  ],
  "tech_stack": ["Python", "Neo4j", "LangSmith"],
  "output_format": "actionable_steps",
  "include_code_examples": true
}
Saving Research Reports
You can save research outputs as local markdown files by setting report: true:
{
  "query": "How do transformer architectures work?",
  "depth_level": 3,
  "report": true,
  "report_path": "/Users/name/Documents/research/" // optional
}
Default behavior:
- Reports saved to ~/research-reports/
- Filename format: research-YYYY-MM-DD-sanitized-query.md
- Example: research-2025-12-10-how-do-transformer-architectures-work.md
- File path included in response
Custom directory: Use report_path parameter to specify a different location.
Available Tools
The MCP server exposes five tools:
- start_research - Async research orchestrator with rich parameters (returns job_id immediately)
- check_research_status - Poll async job status and retrieve results when complete
- read_report - Read specific lines from research reports using citation format (e.g., [R-135216:5-19])
- read_paper - Passthrough to arXiv MCP for reading full papers
- download_paper - Passthrough to arXiv MCP for downloading PDFs
Parameters Reference
| Parameter | Type | Description |
|-----------|------|-------------|
| query | string | Required. Your research question |
| depth_level | 1-5 | Research depth (auto-detected if omitted) |
| project_description | string | What you're building |
| current_state | string | Where you are now |
| problem_statement | string | The specific problem to solve |
| constraints | string[] | Time/budget/technical limits |
| domain | string | Research domain/area |
| papers_read | string[] | Papers already reviewed (prevents redundancy) |
| key_findings | string[] | What you already know |
| rejected_approaches | string[] | Approaches already ruled out |
| sub_questions | string[] | Specific questions to answer in parallel |
| tech_stack | string[] | Technologies in use (triggers Context7 docs) |
| output_format | enum | summary, detailed, or actionable_steps |
| include_code_examples | boolean | Whether to fetch code examples |
| date_range | string | Preferred date range (e.g., "2024-2025") |
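For readers who prefer types, here is a rough TypeScript sketch of the parameter shape implied by the table above (plus report/report_path from the report-saving section). Field names follow the docs; which fields are optional and the exact enum values are assumptions.

```typescript
// Assumed shape of the start_research parameters, derived from the table above.
interface StartResearchParams {
  query: string;                          // required research question
  depth_level?: 1 | 2 | 3 | 4 | 5;        // auto-detected if omitted
  project_description?: string;           // what you're building
  current_state?: string;                 // where you are now
  problem_statement?: string;             // the specific problem to solve
  constraints?: string[];                 // time/budget/technical limits
  domain?: string;                        // research domain/area
  papers_read?: string[];                 // papers already reviewed
  key_findings?: string[];                // what you already know
  rejected_approaches?: string[];         // approaches already ruled out
  sub_questions?: string[];               // questions answered in parallel
  tech_stack?: string[];                  // triggers Context7 docs
  output_format?: "summary" | "detailed" | "actionable_steps";
  include_code_examples?: boolean;        // whether to fetch code examples
  date_range?: string;                    // e.g. "2024-2025"
  report?: boolean;                       // save a local markdown report
  report_path?: string;                   // custom report directory
}
```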
🔍 How It Works
Intelligent Research Architecture (v2)
The research system uses a sophisticated phased approach designed for token efficiency and code accuracy:
Phase 1: Root Planning
- Consensus voting by 2-5 LLMs determines research complexity (1-5)
- Root planner creates strategy for main query only
- Identifies shared documentation needs (base API/syntax docs)
- Each sub-question gets independent planning (lightweight, fast)
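A minimal sketch of what a consensus complexity vote like this could look like; the voter list, prompt, and median rule are assumptions, not the server's actual implementation.

```typescript
// Sketch: ask several LLMs for a complexity score (1-5) and take the median.
// askModel is a placeholder for whichever completion call each provider uses.
type AskModel = (model: string, prompt: string) => Promise<string>;

async function voteOnComplexity(query: string, askModel: AskModel): Promise<number> {
  const voters = ["gemini-2.5-flash", "gpt-5-mini", "claude-3.5-haiku"]; // 2-5 models
  const prompt = `Rate the research complexity of this query from 1 to 5. Reply with a single digit.\n\n${query}`;

  const votes = await Promise.all(
    voters.map(async (model) => {
      const reply = await askModel(model, prompt);
      const vote = parseInt(reply.trim(), 10);
      return Number.isNaN(vote) ? 3 : Math.min(5, Math.max(1, vote)); // clamp, default to 3
    })
  );

  votes.sort((a, b) => a - b);
  return votes[Math.floor(votes.length / 2)]; // median vote decides depth
}
```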
Phase 2: Parallel Data Gathering
- Shared Context7 docs fetched once for all queries (e.g., "React basics")
- Main query executed with full tool access
- Sub-questions planned independently via fast LLM calls
- Each sub-Q chooses its own tools (context7, perplexity, arxiv)
- Can request specific Context7 topics beyond shared docs
- All gathering happens in parallel for speed
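A rough sketch of the parallel gathering pattern described in Phase 2, assuming hypothetical fetchContext7Docs, runMainQuery, and runSubQuestion helpers.

```typescript
// Sketch: fetch shared docs once, then run the main query and every
// sub-question concurrently. The helper functions are placeholders.
async function gatherPhase(
  query: string,
  subQuestions: string[],
  sharedLibraries: string[],
  helpers: {
    fetchContext7Docs: (libs: string[]) => Promise<string>;
    runMainQuery: (query: string, sharedDocs: string) => Promise<string>;
    runSubQuestion: (question: string, sharedDocs: string) => Promise<string>;
  }
) {
  // Shared Context7 docs are fetched exactly once and reused everywhere.
  const sharedDocs = await helpers.fetchContext7Docs(sharedLibraries);

  // Main query and all sub-questions execute in parallel.
  const [mainFindings, subFindings] = await Promise.all([
    helpers.runMainQuery(query, sharedDocs),
    Promise.all(subQuestions.map((q) => helpers.runSubQuestion(q, sharedDocs))),
  ]);

  return { sharedDocs, mainFindings, subFindings };
}
```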
Phase 3: Phased Synthesis (Token-Efficient)
- Main query synthesis - comprehensive answer to primary question
- Key findings extraction - ~500 token summary of main conclusions
- Sub-question synthesis - parallel, with key findings injected for coherence
- Prevents contradictions between main and sub-answers
- Each sub-Q synthesis uses only relevant data (not all research)
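A sketch of the three synthesis steps above; the prompts and the handling of the ~500-token budget are assumptions.

```typescript
// Sketch: main synthesis -> key findings extraction -> parallel sub-question
// synthesis with the key findings injected for coherence.
async function phasedSynthesis(
  mainFindings: string,
  subFindings: { question: string; data: string }[],
  llm: (prompt: string) => Promise<string>
) {
  // 1. Comprehensive answer to the primary question.
  const mainAnswer = await llm(`Synthesize a comprehensive answer from:\n${mainFindings}`);

  // 2. Compress the main conclusions (~500 tokens) so sub-answers stay consistent.
  const keyFindings = await llm(`Summarize the key findings below in about 500 tokens:\n${mainAnswer}`);

  // 3. Each sub-question sees only its own data plus the key findings summary,
  //    not the full main-query data dump.
  const subAnswers = await Promise.all(
    subFindings.map(({ question, data }) =>
      llm(`Key findings so far:\n${keyFindings}\n\nAnswer "${question}" using only:\n${data}`)
    )
  );

  return { mainAnswer, keyFindings, subAnswers };
}
```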
Phase 4: Code Validation Pass
- Extracts all code blocks from synthesized report
- Validates against authoritative Context7 documentation
- Fixes hallucinated APIs, outdated syntax, incorrect method names
- Context7 becomes source of truth for code accuracy
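A minimal sketch of a validation pass like Phase 4: pull fenced code blocks out of the report and ask an LLM to correct them against the cached Context7 docs. The regex and prompt are assumptions.

```typescript
// Sketch: extract fenced code blocks from the report and check each one
// against the cached Context7 documentation.
async function validateCodeBlocks(
  report: string,
  context7Docs: string,
  llm: (prompt: string) => Promise<string>
): Promise<string> {
  const fencedBlock = /`{3}[\w-]*\n([\s\S]*?)`{3}/g; // matches fenced code blocks
  let validated = report;

  for (const match of report.matchAll(fencedBlock)) {
    const original = match[1];
    const corrected = await llm(
      `Using ONLY this documentation as the source of truth:\n${context7Docs}\n\n` +
        `Fix hallucinated APIs, outdated syntax, or wrong method names in:\n${original}\n` +
        `Return only the corrected code.`
    );
    validated = validated.replace(original, corrected);
  }
  return validated;
}
```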
Phase 5: Multi-Model Validation
- Critical Challenge: LLM attacks synthesis to find gaps
- Consensus (depth ≥4): 3 LLMs validate findings
- Sufficiency Vote: Synthesis vs. critique
- Re-synthesis if significant gaps found
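A sketch of a sufficiency vote along the lines of Phase 5; the yes/no prompt and the simple majority rule are assumptions.

```typescript
// Sketch: several models vote on whether the synthesis answers the query;
// a majority of "insufficient" votes would trigger re-synthesis.
async function sufficiencyVote(
  query: string,
  synthesis: string,
  critique: string,
  askModel: (model: string, prompt: string) => Promise<string>
): Promise<{ sufficient: boolean; votes: string[] }> {
  const judges = ["gemini-2.5-flash", "gpt-5-mini", "claude-3.5-haiku"];
  const prompt =
    `Query: ${query}\n\nSynthesis:\n${synthesis}\n\nCritique:\n${critique}\n\n` +
    `Does the synthesis sufficiently answer the query despite the critique? ` +
    `Reply "sufficient" or "insufficient" with one sentence of feedback.`;

  const votes = await Promise.all(judges.map((m) => askModel(m, prompt)));
  const insufficientCount = votes.filter((v) => v.toLowerCase().includes("insufficient")).length;

  return { sufficient: insufficientCount <= judges.length / 2, votes };
}
```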
Why This Architecture?
Token Efficiency:
- Phased synthesis uses ~40% fewer tokens vs. monolithic approach
- Sub-questions don't see full main query data dump
- Key findings summary prevents redundant context
Code Accuracy:
- Context7 validation catches hallucinated code before delivery
- Inline citations trace every claim to source
- Docs fetched once and cached for validation pass
Research Quality:
- Independent sub-Q planning prevents bias from root plan
- Each sub-Q gets optimal tool selection
- Key findings injection ensures coherent, non-contradictory answers
- Synthesis LLMs run at temperature=0.2 for consistent, specific outputs
- Production engineer persona prompt for deployable solutions
- Explicit numeric specificity mandates (no "high", "fast", "good")
- Few-shot examples enforce production-ready code (no TODO/FIXME)
- Checklist-based validation audits actionability before delivery
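As a rough illustration of the checklist-based audit mentioned above, one possible shape for such a checklist; the individual checks are assumptions, not the server's exact criteria.

```typescript
// Sketch: an actionability checklist the validation step could enforce.
interface ActionabilityChecklist {
  hasNumericSpecifics: boolean;  // no vague "high"/"fast"/"good" claims
  hasSuccessCriteria: boolean;   // explicit, measurable success criteria
  codeIsComplete: boolean;       // no TODO/FIXME or placeholder code
  claimsAreCited: boolean;       // every claim carries an inline citation
  noContradictions: boolean;     // main and sub-answers agree
}

function passesAudit(checklist: ActionabilityChecklist): boolean {
  // Every check must pass before the report is delivered.
  return Object.values(checklist).every(Boolean);
}
```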
Research Flow Diagram
graph TD
A[Query + Context] --> B[Root Planning: 2-5 LLM Consensus]
B --> C[Main Query Strategy]
B --> D[Identify Shared Docs]
D --> E[Fetch Base Context7 Docs]
A --> F[Plan Each Sub-Q Independently]
C --> G[Execute Main Query]
E --> G
F --> H[Execute Sub-Qs in Parallel]
E --> H
G --> I[Phase 1: Main Synthesis]
I --> J[Extract Key Findings ~500 tokens]
J --> K[Phase 2: Sub-Q Syntheses Parallel]
H --> K
K --> L[Code Validation vs Context7]
L --> M[Critical Challenge]
M --> N{Depth ≥4?}
N -->|Yes| O[Multi-LLM Consensus]
N -->|No| P[Sufficiency Vote]
O --> P
P --> Q{Sufficient?}
Q -->|Yes| R[Report with Inline Citations]
Q -->|No| S[Re-synthesize]
S --> I
Inline Citations
Reports now include inline source citations for traceability:
- [perplexity:url] - Web search finding
- [context7:library-name] - Library documentation/code
- [arxiv:paper-id] - Academic paper
- [deep_analysis] - LLM reasoning
Example:
LangSmith provides dataset management [context7:langsmith] which supports
version control [perplexity:langsmith-docs] as validated in recent research
[arxiv:2024.12345].
Context-Efficient Report Structure
Reports use sectioned architecture for AI consumption:
- Executive Summary - Overview + section index with IDs and line ranges
- On-demand Section Reading - AI can load specific sections only
- Quick Reference - Citation examples (R-ID:section, R-ID:section:20-50)
Example usage:
read_report(citation="R-182602:q1") # Read sub-question 1
read_report(citation="R-182602:q1:20-50") # Lines 20-50 of sub-Q 1
read_report(citation="R-182602", full=true) # Full report (last resort)
This prevents context bloat - AI assistants load only what they need.
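A small sketch of parsing this citation format (report ID, optional section, optional line range); the parsed field names are assumptions.

```typescript
// Sketch: parse citations like "R-182602", "R-182602:q1", "R-182602:q1:20-50",
// or the line-only form "[R-135216:5-19]". Result field names are assumed.
interface ParsedCitation {
  reportId: string;
  section?: string;
  startLine?: number;
  endLine?: number;
}

function parseCitation(citation: string): ParsedCitation {
  const cleaned = citation.replace(/^\[|\]$/g, ""); // strip optional brackets
  const [reportId, ...rest] = cleaned.split(":");

  const parsed: ParsedCitation = { reportId };
  for (const part of rest) {
    const range = part.match(/^(\d+)-(\d+)$/);
    if (range) {
      parsed.startLine = Number(range[1]);
      parsed.endLine = Number(range[2]);
    } else {
      parsed.section = part; // e.g. "q1" or "overview"
    }
  }
  return parsed;
}
```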
Common Issues
Perplexity API Errors
- 401 Unauthorized: Check that PERPLEXITY_API_KEY is set correctly
- 429 Rate Limited: You've exceeded API quota. Check the Perplexity dashboard
- Connection timeout: Verify network connectivity
Context7 or arXiv Connection Issues
These are spawned as subprocesses. Check:
# Verify Context7 MCP is accessible (if installed separately)
# Verify arXiv MCP server is installed:
uv tool run arxiv-mcp-server --help
MCP Client Not Detecting Server
- Verify the path in your MCP config is correct (absolute path)
- Restart your MCP client (Claude Desktop, Cursor, etc.)
- Check client logs for connection errors:
  - Claude Desktop: ~/Library/Logs/Claude/
  - Cursor: Developer Tools → Console
Environment Variables Not Loading
If you see "Not connected" errors despite having API keys in your MCP config, try these solutions:
Option 1: Use built JavaScript file (Recommended)
{
  "mcpServers": {
    "research": {
      "command": "node",
      "args": ["/absolute/path/to/research-mcp/dist/index.js"],
      "env": {
        "PERPLEXITY_API_KEY": "your-key",
        "GEMINI_API_KEY": "your-key",
        "OPENAI_API_KEY": "your-key"
      }
    }
  }
}
Option 2: Fix path with spaces
If your path contains spaces (e.g., /Users/name/Desktop/Personal and learning/...):
{
  "command": "npx",
  "args": [
    "tsx",
    "/Users/name/Desktop/Personal and learning/quick-mcp/research/src/index.ts"
  ]
}
Note: Paths with spaces are properly handled in JSON arrays. The issue is usually using source files instead of built files.
Example Output Structure
# Research Results: [Your Query]
## Complexity Assessment
**Level**: 4/5
**Reasoning**: Complex research requiring academic papers and library documentation
## Research Action Plan
**Estimated Time**: ~45s
**Steps Executed**:
1. **perplexity**: Search for recent approaches and best practices _(parallel)_
2. **deep_analysis**: Analyze web findings for technical insights
3. **context7**: Fetch React and TypeScript documentation _(parallel, shared + specific)_
4. **arxiv**: Search academic papers on evaluation datasets
5. **consensus**: Validate findings across multiple models
## Synthesis with Inline Citations
### Overview
LangSmith provides comprehensive dataset management [context7:langsmith] which enables
evaluation workflow automation [perplexity:langsmith-docs]. Recent research shows that
synthetic data generation requires careful attention to distribution matching [arxiv:2024.12345].
```typescript
// Code validated against Context7
import { Dataset } from "langsmith";
const dataset = new Dataset("my-eval-set");
```

### Sub-Question 1: What makes evaluation data representative?
Representative data must match real-world distributions [deep_analysis] and include edge cases from production logs [context7:langsmith]. Studies indicate that 600+ examples provide sufficient statistical power for small effect detection [arxiv:2024.67890].
## Code Validation Summary
- ✅ 3 code blocks validated
- ✅ 1 syntax correction applied (outdated API method)
## Multi-Model Consensus
[3 LLMs validated findings—shows agreement/disagreement]
## Critical Challenge
[Critical validation—alternative perspectives and gaps identified]
## Quality Validation
**Vote Result**: 2 sufficient, 1 insufficient
**Status**: ✅ Response is sufficient
Model Feedback:
- ✅ gemini-2.5-flash: Response comprehensively addresses the query with actionable steps
- ✅ gpt-5-mini-2025-08-07: Good coverage of edge cases and validation methods
- ❌ claude-3.5-haiku: Could benefit from more code examples
Report ID: R-182602
Usage Examples:
read_report(citation="R-182602:overview") # Read overview section
read_report(citation="R-182602:q1") # Read sub-question 1
read_report(citation="R-182602:q1:20-50") # Lines 20-50 of sub-Q 1
read_report(citation="R-182602", full=true) # Full report (last resort)
## Acknowledgments
This MCP server is built on top of other MCP servers and tools:
- [Perplexity AI](https://www.perplexity.ai/) - Web search capabilities
- [arXiv](https://arxiv.org/) - Academic paper repository
- [Context7](https://context7.com/) - Library documentation search
- All contributors and users of this project