A simple MCP server and client for processing documents. Currently supports English, Japanese, and Bangla.
Simple Document MCP Server
A minimal MCP (Model Context Protocol) server for document processing and search with multi-language support.
🎯 Features
- Multi-format Support: PDF, DOCX, XLSX, and TXT files
- Multi-language Support: English, Japanese, Bangla (Bengali), and more
- Full-text Search: Search across all indexed documents with context
- Document Metadata: Extract file type, language, size, and modification date
- Interactive Client: Easy-to-use command-line interface
- Flexible Directory: Custom documents directory via command-line arguments
- Configurable Logging: Adjustable log levels (DEBUG, INFO, WARNING, ERROR)
- Error Handling: Handles unreadable files, encoding issues, and missing dependencies gracefully, and logs the errors
- Auto-create Directories: Automatically creates missing document directories
🚀 Quick Start
Option 1: Automated Setup
# Run the setup script
./setup.sh
# Activate the virtual environment
source venv/bin/activate
Option 2: Manual Setup
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create documents directory
mkdir -p documents
📖 Usage
Starting the Server
# Terminal 1: Start the MCP server
python simple_mcp_server.py # Use default ./documents directory
python simple_mcp_server.py --dir /path/docs # Use custom directory
python simple_mcp_server.py -d ~/Documents # Use home Documents folder
python simple_mcp_server.py --log-level DEBUG # Enable debug logging
Using the Client
# Terminal 2: Start the interactive client
python simple_client.py # Use default settings
python simple_client.py --dir /path/docs # Use custom documents directory
python simple_client.py demo # Run demo mode
python simple_client.py demo --dir ~/docs # Run demo with custom directory
python simple_client.py --log-level DEBUG # Enable debug logging
Available Commands
| Command | Description | Example |
|---------|-------------|---------|
| scan | Scan and index all documents | scan |
| search <query> | Search for text in documents | search machine learning |
| list | List all processed documents | list |
| stats | Show document collection statistics | stats |
| content <filename> | Get full content of a document | content sample.txt |
| tools | Show available MCP tools | tools |
| quit | Exit the client | quit |
📁 Directory Structure
docmcp/
├── simple_mcp_server.py    # Main MCP server
├── simple_client.py        # Interactive client
├── requirements.txt        # Python dependencies
├── setup.sh                # Automated setup script
├── README.md               # This file
└── documents/              # Document storage
    ├── english/            # English documents
    ├── japanese/           # Japanese documents
    └── bangla/             # Bangla documents
🔧 MCP Tools
The server provides 5 MCP tools:
- scan_documents: Index all documents in the documents directory
- search_documents: Search for text with configurable result limits
- list_documents: List all processed documents with metadata
- get_document_stats: Get collection statistics (size, languages, types)
- get_document_content: Retrieve full content of a specific document
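MCP clients invoke tools by sending JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for search_documents. The helper `build_tool_call` is hypothetical, and the argument names `query` and `max_results` are assumptions taken from the search_documents method signature; the input schema the server actually advertises may differ.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # ensure_ascii=False keeps Japanese and Bangla queries readable in the payload.
    return json.dumps(request, ensure_ascii=False)


# Argument names are assumptions; confirm them with the client's `tools` command.
message = build_tool_call("search_documents", {"query": "machine learning", "max_results": 10})
print(message)
```

In practice the bundled client handles this exchange, so a hand-built request like this is mainly useful for debugging.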
🌍 Language Support
The server automatically detects document language using langdetect. Supported languages include:
- English (en)
- Japanese (ja)
- Bangla/Bengali (bn)
- And many more (any language supported by langdetect)
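As a rough illustration of language detection with langdetect, the sketch below wraps `langdetect.detect` in a helper that returns "unknown" for very short or undetectable text. The function name `detect_language`, the `min_chars` threshold, and the injectable `detector` parameter are assumptions made for this example, not the server's actual code.

```python
from typing import Callable, Optional


def detect_language(text: str, detector: Optional[Callable[[str], str]] = None,
                    min_chars: int = 20) -> str:
    """Return a language code such as 'en', 'ja', or 'bn', or 'unknown'."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        # Very short text gives unreliable results, so skip detection (assumed threshold).
        return "unknown"
    if detector is None:
        # Imported lazily so this module still loads when langdetect is not installed.
        from langdetect import DetectorFactory, detect
        DetectorFactory.seed = 0  # make detection deterministic across runs
        detector = detect
    try:
        return detector(stripped)
    except Exception:
        # langdetect raises LangDetectException when the text has no usable features.
        return "unknown"
```

Passing a custom `detector` makes the helper easy to test without the library; with langdetect installed, calling `detect_language(some_text)` uses the real detector.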
📄 Supported File Types
| Extension | Type | Library Used |
|-----------|------|--------------|
| .pdf | PDF Documents | PyPDF2 |
| .docx | Word Documents | python-docx |
| .xlsx | Excel Spreadsheets | openpyxl |
| .txt | Text Files | Built-in (multi-encoding) |
🔍 Search Features
- Full-text search across all document content
- Context highlighting around matches
- Multiple matches per document with position tracking
- Result limiting to prevent overwhelming output
- Case-insensitive search
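The features above can be sketched in a few lines. This is a minimal illustration, not the server's implementation: the function `search_texts`, its `context_chars` parameter, and the result keys are hypothetical names chosen for the example.

```python
import re
from typing import Any, Dict, List


def search_texts(texts: Dict[str, str], query: str, max_results: int = 50,
                 context_chars: int = 30) -> List[Dict[str, Any]]:
    """Case-insensitive literal search returning position and context per match."""
    results: List[Dict[str, Any]] = []
    if not query:
        return results
    # re.escape treats the query as literal text; IGNORECASE ignores letter case.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    for filename, content in texts.items():
        for match in pattern.finditer(content):
            start, end = match.span()
            before = content[max(0, start - context_chars):start]
            after = content[end:end + context_chars]
            results.append({
                "filename": filename,
                "position": start,
                "context": f"...{before}**{match.group(0)}**{after}...",
            })
            if len(results) >= max_results:
                return results  # stop early once the limit is reached
    return results


docs = {"sample.txt": "The document Processing system supports processing PDF files."}
hits = search_texts(docs, "processing", max_results=5, context_chars=10)
for hit in hits:
    print(hit["filename"], hit["position"], hit["context"])
```

Each match keeps its character offset in the original text, so several matches in one document are reported separately.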
🛠️ Development
Adding New File Types
To add support for new file types, extend the SimpleDocumentProcessor class:
def extract_text_from_newtype(self, file_path: Path) -> str:
    # Your extraction logic here
    pass

# Add to the extractors dictionary in process_document()
extractors = {
    '.newext': (self.extract_text_from_newtype, "New Type"),
    # ... existing extractors
}
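As a concrete instance of the template above, here is a hedged sketch of an extractor for CSV files using only the standard library. It is written as a standalone function for brevity; inside SimpleDocumentProcessor it would take `self` as its first parameter. The `.csv` entry and the "CSV File" label are illustrative assumptions.

```python
import csv
from pathlib import Path


def extract_text_from_csv(file_path: Path) -> str:
    """Flatten a CSV file into plain text, one row per line."""
    with file_path.open(newline="", encoding="utf-8") as handle:
        rows = csv.reader(handle)
        # Join cells with spaces and rows with newlines so the text is searchable.
        return "\n".join(" ".join(cell.strip() for cell in row) for row in rows)


# Hypothetical registry entry mirroring the (function, type label) tuples above.
extractors = {
    ".csv": (extract_text_from_csv, "CSV File"),
}
```

This version assumes UTF-8 input; files in other encodings would need the same fallback handling the server applies to text files.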
Customizing Search
The search functionality can be enhanced by modifying the search_documents method:
def search_documents(self, query: str, max_results: int = 50) -> List[Dict[str, Any]]:
    # Add regex support, fuzzy matching, etc.
    pass
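One possible direction for fuzzy matching is the standard library's `difflib`. The sketch below finds words in a document that closely resemble the query, which tolerates small typos. The helper `fuzzy_find_words` and its `cutoff` default are assumptions for illustration only.

```python
import difflib
import re
from typing import List


def fuzzy_find_words(content: str, query: str, cutoff: float = 0.8) -> List[str]:
    """Return distinct words in content that closely resemble the query."""
    # Split on word characters; this suits space-delimited languages only.
    words = sorted(set(re.findall(r"\w+", content.lower())))
    return difflib.get_close_matches(query.lower(), words, n=10, cutoff=cutoff)


print(fuzzy_find_words("The server handles document procesing quickly", "processing"))
```

Word splitting with `\w+` does not segment Japanese text, so a fuzzy search over Japanese documents would need a proper tokenizer.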
🔒 Error Handling
The server includes comprehensive error handling:
- File reading errors: Gracefully handles corrupted or unreadable files
- Encoding issues: Tries multiple encodings for text files
- Missing dependencies: Clear error messages for missing libraries
- Server errors: JSON error responses for client handling
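The multi-encoding strategy for text files can be sketched as follows. The list of candidate encodings is an assumption chosen for this example; the encodings the server actually tries may differ. The function name `read_text_with_fallback` is hypothetical.

```python
from pathlib import Path
from typing import Sequence


def read_text_with_fallback(
    file_path: Path,
    encodings: Sequence[str] = ("utf-8", "shift_jis", "latin-1"),
) -> str:
    """Decode a file by trying each candidate encoding in order."""
    raw = file_path.read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
    # Nothing decoded cleanly: fall back to UTF-8 with replacement characters.
    return raw.decode("utf-8", errors="replace")
```

Order matters here: latin-1 accepts any byte sequence, so it belongs last as a catch-all.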
📊 Example Output
Server Help
$ python simple_mcp_server.py --help
usage: simple_mcp_server.py [-h] [--dir DIR] [--log-level {DEBUG,INFO,WARNING,ERROR}] [--version]

Simple Document MCP Server

options:
  -h, --help            show this help message and exit
  --dir DIR, -d DIR     Directory containing documents to process (default: ./documents)
  --log-level {DEBUG,INFO,WARNING,ERROR}
                        Set logging level (default: INFO)
  --version             show program's version number and exit
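For readers who want to see how such an interface is typically defined, the following sketch builds an `argparse` parser that accepts the same options as the help text above. It is a reconstruction from that output, not the server's source, and the version string is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting the options shown in the server help output."""
    parser = argparse.ArgumentParser(
        prog="simple_mcp_server.py", description="Simple Document MCP Server")
    parser.add_argument(
        "--dir", "-d", default="./documents",
        help="Directory containing documents to process (default: ./documents)")
    parser.add_argument(
        "--log-level", default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        help="Set logging level (default: INFO)")
    # The version string is a placeholder, not the project's real version.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


args = build_parser().parse_args(["-d", "/path/docs", "--log-level", "DEBUG"])
print(args.dir, args.log_level)
```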
Client Help
$ python simple_client.py --help
usage: simple_client.py [-h] [--dir DIR] [--log-level {DEBUG,INFO,WARNING,ERROR}] [{interactive,demo}]

Simple Document MCP Client

positional arguments:
  {interactive,demo}    Run in interactive mode or demo mode (default: interactive)

options:
  -h, --help            show this help message and exit
  --dir DIR, -d DIR     Directory containing documents to process (uses server default if not specified)
  --log-level {DEBUG,INFO,WARNING,ERROR}
                        Set server logging level (default: INFO)
Document Scan with Custom Directory
🎬 Running Demo Commands...
📁 Documents directory: /custom/path/docs
✅ Scanned and processed 3 documents
📚 Documents (3):
1. sample.txt (Text File, en)
Path: /custom/path/docs/sample.txt
Size: 1.2 KB
Preview: This is a sample English document for testing...
2. sample.txt (Text File, ja)
Path: /custom/path/docs/sample.txt
Size: 0.8 KB
Preview: これは日本語のサンプル文書です...
Search Results
🎯 Search Results for 'processing' (2 matches):
1. sample.txt (Text File, en)
Path: documents/english/sample.txt
Position: 156
Context: ...document **processing** system supports...
2. readme.txt (Text File, en)
Path: documents/english/readme.txt
Position: 89
Context: ...server can **processing**: - PDF files...
👨‍💻 Author
Shaiful Islam Shabuj
- GitHub: @shaifulshabuj
- Repository: simple-document-mcp-server
🤝 Contributing
- Fork the repository
- Create a feature branch
- Add your improvements
- Test with the provided client
- Submit a pull request
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2024 Shaiful Islam Shabuj
🆘 Troubleshooting
Common Issues
"Missing required dependency"
pip install -r requirements.txt
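To see which libraries are missing before reinstalling, a short check like the one below can help. The module list is an assumption based on the libraries named under Supported File Types and Language Support; requirements.txt remains the authoritative list. The helper `find_missing` is hypothetical.

```python
import importlib.util
from typing import Dict, List

# Import name -> pip package name (assumed from the libraries this README mentions).
REQUIRED = {
    "PyPDF2": "PyPDF2",
    "docx": "python-docx",
    "openpyxl": "openpyxl",
    "langdetect": "langdetect",
}


def find_missing(modules: Dict[str, str] = REQUIRED) -> List[str]:
    """Return pip package names whose import module cannot be found."""
    return [package for module, package in modules.items()
            if importlib.util.find_spec(module) is None]


missing = find_missing()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed")
```

Run it inside the activated virtual environment so it checks the same interpreter the server uses.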
"Server not found"
- Make sure you're in the correct directory
- Check that simple_mcp_server.py exists
- Verify Python path in the client
"No documents found"
- Check that documents exist in the documents/ directory
- Run the scan command first
- Verify file permissions
"Language detection failed"
- Document might be too short for reliable detection
- Try with longer text content
- Check for non-text content in files
Getting Help
- Check the server logs for detailed error messages
- Run the demo client: python simple_client.py demo
- Verify your setup with the sample documents
- Check file permissions and encoding issues