# Python RAG Server with DuckDB

A simple RAG server for pet projects.
This project is a Python-based server designed for document processing and retrieval-augmented generation (RAG). It provides a simple web interface and a JSON API to upload documents, process them into chunks, generate embeddings, and store them in a DuckDB database for efficient similarity search.
The entire application is containerized with Docker and uses `uv` for fast, optimized dependency management. It also includes an `mcp-rag-service` for integration with MCP (Model Context Protocol).
## Features
- Web Interface: Minimalist UI for uploading files, initiating processing, and performing searches.
- JSON API: Provides `/api/search`, `/api/stats`, and `/health` endpoints for programmatic integration.
- Wide File Support: Handles various file types including `.txt`, `.md`, `.pdf`, and multiple programming language source files (`.py`, `.js`, `.java`, etc.).
- Advanced Chunking: Uses different strategies based on file type (e.g., `CodeSplitter` for source code, `RecursiveCharacterTextSplitter` for text).
- High-Quality Embeddings: Uses `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` (primary, 768d) or `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` (fallback, 384d).
- Vector Database: Leverages DuckDB with the VSS (Vector Similarity Search) extension for efficient storage and querying of embeddings (see the sketch after this list).
- Dockerized & Optimized:
  - Easy to build and run with Docker.
  - Uses `uv` for ultra-fast dependency installation.
  - Multi-stage Dockerfile for a small final image size.
  - Supports CPU-only builds for environments without a GPU.
- MCP Integration: Includes a sample `mcp-rag-service` to demonstrate integration with external systems.
- Directory Upload: Support for uploading entire directories with file extension filtering.
- Health Monitoring: Built-in health check endpoint for monitoring and load balancers.
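To make the embeddings and vector-database features concrete, here is a minimal, self-contained sketch of storing and querying vectors with `sentence-transformers` and DuckDB. The `chunks` table and its schema are illustrative only; the project's real schema lives in `app/services.py`.

```python
import duckdb
from sentence_transformers import SentenceTransformer

# Primary embedding model named in this README (produces 768-dimensional vectors).
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")

con = duckdb.connect()  # in-memory for the sketch; the server uses data/rag.duckdb
con.execute("INSTALL vss")
con.execute("LOAD vss")
con.execute("CREATE TABLE chunks (text VARCHAR, embedding FLOAT[768])")  # illustrative schema

for text in ["DuckDB stores the chunk embeddings.", "FastAPI exposes the JSON API."]:
    con.execute(
        "INSERT INTO chunks VALUES (?, ?::FLOAT[768])",
        [text, model.encode(text).tolist()],
    )

# Rank stored chunks by cosine similarity to the query embedding.
query = model.encode("where are embeddings kept?").tolist()
hits = con.execute(
    "SELECT text, array_cosine_similarity(embedding, ?::FLOAT[768]) AS score "
    "FROM chunks ORDER BY score DESC LIMIT 5",
    [query],
).fetchall()
print(hits)
```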
## Tech Stack

- Backend: Python with FastAPI
- Embeddings: `sentence-transformers`, `llama-index`, `langchain`
- Database: DuckDB + VSS extension
- Containerization: Docker
- Package Management: `uv`
## How to Run

### Prerequisites
- Docker installed and running on your machine.
### Build and Run the Docker Container
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd <repository-name>
  ```
- Build the Docker image: The build process is optimized using a multi-stage Dockerfile and `uv`. You can choose between a standard build (which includes GPU-capable libraries) and a CPU-only build.

  Standard Build (for environments with GPU support):

  ```bash
  docker build -t rag-duckdb-server .
  ```

  CPU-Only Build (recommended for local development or CPU servers): This build is faster and results in a smaller image by using a CPU-only version of PyTorch.

  ```bash
  docker build --build-arg USE_CPU_ONLY=true -t rag-duckdb-server-cpu .
  ```
- Run the Docker container: This command starts the server and maps the local `uploads` and `data` directories to the container. This ensures your uploaded files and the database persist even if the container is removed.

  For standard build:

  ```bash
  docker run -p 8000:8000 \
    -v "$(pwd)/uploads:/app/uploads" \
    -v "$(pwd)/data:/app/data" \
    --name rag-server \
    rag-duckdb-server
  ```

  For CPU-only build:

  ```bash
  docker run -p 8000:8000 \
    -v "$(pwd)/uploads:/app/uploads" \
    -v "$(pwd)/data:/app/data" \
    --name rag-server-cpu \
    rag-duckdb-server-cpu
  ```

  Note for Windows users: In PowerShell, use `${pwd}` instead of `$(pwd)`.
- Access the application: Open your web browser and navigate to `http://localhost:8000`.
## Usage Workflow
- Upload Files: Use the web interface to select and upload one or more supported files.
- Upload Directory: Alternatively, upload entire directories with file extension filtering to process only specific file types.
- Process Files: Click the "Start Processing" button. The server will:
  - Extract text content.
  - Split the text into manageable, context-aware chunks.
  - Generate a vector embedding for each chunk.
  - Save the chunks and their embeddings to the `data/rag.duckdb` database.
  - Delete processed files from the `uploads` folder.
- Search Documents: Once documents are processed, use the semantic search bar to find relevant content across all indexed chunks.
- Use API: Interact with the server programmatically via the `/api/*` endpoints; a sketch of the full workflow follows this list.
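The same workflow can be driven programmatically. The sketch below is illustrative: the multipart field name `files` is an assumption rather than something this README specifies, and the response schemas depend on the server.

```python
import requests

BASE = "http://localhost:8000"

# 1. Upload a file (the multipart field name "files" is an assumption).
with open("notes.md", "rb") as f:
    requests.post(f"{BASE}/upload-files/", files={"files": f}, timeout=30).raise_for_status()

# 2. Process everything currently in the uploads directory.
requests.post(f"{BASE}/process-files/", timeout=600).raise_for_status()

# 3. Search the indexed chunks.
resp = requests.post(f"{BASE}/api/search", json={"query": "notes about DuckDB"}, timeout=30)
resp.raise_for_status()
print(resp.json())
```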
## Supported File Types
The server supports a wide range of file types:
### Text Documents

- `.txt` - Plain text files
- `.md` - Markdown files
- `.pdf` - PDF documents

### Programming Languages

- `.py` - Python
- `.js`, `.ts`, `.jsx`, `.tsx` - JavaScript/TypeScript
- `.java` - Java
- `.c`, `.cpp`, `.cc`, `.cxx` - C/C++
- `.cs` - C#
- `.go` - Go
- `.rs` - Rust
- `.php` - PHP
- `.rb` - Ruby
- `.scala` - Scala
- `.swift` - Swift

### Web Technologies

- `.html`, `.htm` - HTML
- `.css`, `.scss`, `.sass` - CSS and preprocessors

### Shell Scripts

- `.sh`, `.bash`, `.zsh`, `.fish` - Shell scripts

### Data Formats

- `.json` - JSON
- `.yaml`, `.yml` - YAML
- `.xml` - XML
- `.sql` - SQL
- `.ini`, `.toml` - Configuration files
Note: Files with unsupported extensions are automatically skipped during processing.
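As a rough illustration of how extension-based routing could work, here is a hypothetical helper; the project's actual dispatch logic lives in `app/services.py` and covers the full extension list above.

```python
from pathlib import Path

# Hypothetical dispatch table mapping extensions to chunking strategies.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".c", ".cpp", ".cs", ".go", ".rs"}
TEXT_EXTENSIONS = {".txt", ".md", ".pdf"}

def pick_strategy(filename: str) -> str:
    """Return the chunking strategy name for a file, or 'skip' if unsupported."""
    ext = Path(filename).suffix.lower()
    if ext in CODE_EXTENSIONS:
        return "CodeSplitter"
    if ext in TEXT_EXTENSIONS:
        return "RecursiveCharacterTextSplitter"
    return "skip"  # unsupported extensions are skipped during processing

print(pick_strategy("app/services.py"))  # -> CodeSplitter
```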
## API Endpoints

### Web Interface
- `GET /` - Main web interface
- `POST /upload-files/` - Upload individual files
- `POST /upload-directory/` - Upload directory with extension filtering
- `POST /process-files/` - Process uploaded files
- `POST /search/` - Search interface
- `POST /delete-file/` - Delete uploaded file
### JSON API
- `POST /api/search` - Programmatic search endpoint
- `GET /api/stats` - Get collection statistics
- `GET /health` - Health check endpoint
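The read-only endpoints can be exercised directly; a minimal sketch (the exact response fields are whatever the server returns):

```python
import requests

BASE = "http://localhost:8000"

# Collection statistics and service health; field names depend on the server.
print(requests.get(f"{BASE}/api/stats", timeout=10).json())
print(requests.get(f"{BASE}/health", timeout=10).json())
```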
### Search API Parameters
- `query` (required): Search query string
- `top_k` (optional, default: 5): Number of results to return (1-50)
- `search_type` (optional, default: `"hybrid"`): `"hybrid"`, `"semantic"`, or `"keyword"`
- `use_reranker` (optional, default: `true`): Enable/disable result reranking
- `expand_query` (optional, default: `false`): Enable/disable query expansion
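A full search request using every documented parameter might look like this sketch (the JSON body shape and the `results` key are assumptions based on the parameter list above):

```python
import requests

payload = {
    "query": "how are embeddings stored?",
    "top_k": 5,
    "search_type": "hybrid",   # or "semantic" / "keyword"
    "use_reranker": True,
    "expand_query": False,
}
resp = requests.post("http://localhost:8000/api/search", json=payload, timeout=30)
resp.raise_for_status()
# The exact response schema is defined by the server; "results" is an assumption.
for hit in resp.json().get("results", []):
    print(hit)
```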
## MCP Integration

The project includes a separate MCP (Model Context Protocol) integration service located in the `mcp-rag-service/` directory. This service provides:
- RAG Client: Python client for interacting with the RAG server
- Vector Analytics: Advanced analysis capabilities including clustering, outlier detection, and similarity matrices
- MCP Server: Integration with MCP-compatible tools
### MCP Examples

The `mcp-rag-service/examples/` directory contains working examples:

- `upload_example.py` - Demonstrates file upload functionality
- `search_example.py` - Shows semantic search with similarity thresholds
- `analysis_example.py` - Comprehensive vector analysis examples
To run the examples:
```bash
cd mcp-rag-service/examples
python upload_example.py
python search_example.py
python analysis_example.py
```
## Project Structure

```text
.
├── app/
│   ├── main.py                  # FastAPI application, routes, and API endpoints
│   └── services.py              # Business logic (file processing, chunking, embeddings, DB)
├── mcp-rag-service/             # MCP integration service
│   ├── src/
│   │   ├── rag_client.py            # RAG server client
│   │   ├── rag_mcp_server.py        # MCP server implementation
│   │   ├── vector_operations.py     # Advanced vector analytics
│   │   └── utils.py                 # Utility functions
│   ├── examples/                # Working examples
│   └── pyproject.toml
├── templates/
│   └── index.html               # Jinja2 template for the UI
├── uploads/                     # Directory for file uploads (mounted as a volume)
├── data/                        # Directory for DuckDB database (mounted as a volume)
├── .dockerignore                # Specifies files to ignore in Docker build context
├── .gitignore                   # Specifies files to ignore for Git
├── Dockerfile                   # Docker build instructions with uv and multi-stage builds
├── requirements-base.txt        # Base Python dependencies
├── requirements-cpu.txt         # CPU-only ML dependencies
├── requirements-ml.txt          # Full ML dependencies (for GPU)
└── README.md                    # This file
```
## Configuration

- Embedding Models: The primary and fallback models are defined as constants in `app/services.py`.
- Chunking: Chunk size and overlap can be adjusted via the `CHUNK_SIZE` and `CHUNK_OVERLAP` environment variables. The defaults are 700 and 100, respectively (see the sketch after this list).
- Database Path: The path to the DuckDB file is configured in `app/services.py`.
- Search Features: The UI allows for advanced search configuration:
  - Search Type: Choose between `Hybrid` (Semantic + Keyword), `Semantic`-only, or `Keyword`-only (BM25) search.
  - Reranking: A Cross-Encoder model can be used to rerank the top search results for higher accuracy. This can be toggled in the UI.
  - Query Expansion: Automatically expand your query with relevant terms found from an initial search. This can be toggled in the UI.
- Processing Features:
  - TF-IDF Keywords: When processing files, you can choose to generate and attach relevant keywords to each chunk's metadata using TF-IDF. This can improve keyword-based searches.
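A hedged sketch of how the chunking settings plug into a splitter (the import path varies by langchain version, and the variable names here are illustrative rather than the project's actual code):

```python
import os

# Newer langchain versions; older ones use `from langchain.text_splitter import ...`.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Documented defaults: CHUNK_SIZE=700, CHUNK_OVERLAP=100, overridable via env vars.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=int(os.getenv("CHUNK_SIZE", "700")),
    chunk_overlap=int(os.getenv("CHUNK_OVERLAP", "100")),
)

with open("README.md", encoding="utf-8") as f:
    chunks = splitter.split_text(f.read())
print(f"Produced {len(chunks)} chunks")
```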
## Error Handling
- Unsupported Files: Files with unsupported extensions are automatically skipped during upload and processing.
- Empty Files: Empty or unreadable files are automatically removed from the uploads directory.
- Processing Errors: Individual file processing errors are logged but don't stop the overall process.
- API Errors: All API endpoints return structured error responses with appropriate HTTP status codes.
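For example, a client can rely on the HTTP status code and inspect the body on failure (a sketch; the structure of the error payload is whatever the server emits):

```python
import requests

resp = requests.post("http://localhost:8000/api/search", json={"query": ""}, timeout=10)
if not resp.ok:
    # On failure, the body carries a structured error with an appropriate status code.
    print(resp.status_code, resp.json())
```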
## Known Limitations
- File Size: Very large files may cause memory issues during processing.
- Concurrent Users: The current implementation is designed for single-user scenarios.
- File Formats: Only text-based files are supported. Binary files (images, videos, etc.) are not supported.
- Language Support: While the embedding model is multilingual, chunking strategies are optimized for English and common programming languages.
## Roadmap & Future Plans

### Planned Features
- GraphRAG Integration: Advanced graph-based retrieval and reasoning capabilities
- Multi-user Support: User authentication and isolated document collections
- Real-time Processing: WebSocket support for real-time processing updates
- Advanced Analytics: More sophisticated vector analysis and visualization tools
- Plugin System: Extensible architecture for custom processors and analyzers
- Performance Optimization: Caching, indexing improvements, and distributed processing
### GraphRAG Implementation
GraphRAG (Graph-based Retrieval-Augmented Generation) is planned as a major enhancement that will provide:
- Knowledge Graph Construction: Automatic extraction of entities and relationships
- Graph-based Retrieval: Enhanced search using graph traversal and reasoning
- Multi-hop Reasoning: Complex queries that require multiple reasoning steps
- Contextual Understanding: Better understanding of document relationships and hierarchies
This feature is currently in the planning phase and will be implemented as a separate module that can be optionally enabled.
## Troubleshooting

### Common Issues
- Docker Build Fails: Try the CPU-only build for faster, more reliable builds:

  ```bash
  docker build --build-arg USE_CPU_ONLY=true -t rag-duckdb-server-cpu .
  ```

- Memory Issues: For large document collections, consider:
  - Using the CPU-only build (smaller memory footprint)
  - Processing files in smaller batches
  - Increasing Docker memory limits
- Model Loading Issues: The system automatically falls back to a smaller model if the primary model fails to load.
- Database Issues: The DuckDB database is automatically created on first run. If you encounter database errors, you can delete the `data/` directory to start fresh.
### Health Check
Use the health check endpoint to monitor service status:
```bash
curl http://localhost:8000/health
```
This returns service status, model loading state, and database connection information.
## Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
## License
This project is licensed under the MIT License - see the LICENSE file for details.