⚡ Developer-friendly hybrid-RAG toolkit merging Graphiti, Qdrant, mem0, LlamaIndex, and LangChain into one powerful engine.
Enhanced Memory Vector RAG
This implementation builds a knowledge retrieval system that combines Knowledge-Augmented Generation (KAG) methods with traditional RAG. It brings together Graphiti's graph intelligence, Qdrant's vector search, and mem0's memory persistence, all accessible through LlamaIndex and LangChain interfaces, for applications that need both factual accuracy and contextual understanding.
Overview
Enhanced Memory Vector RAG (EMVR) is a framework that combines several retrieval methods into a single knowledge system. It integrates graph-based Knowledge-Augmented Generation (KAG) with traditional vector-based Retrieval-Augmented Generation (RAG), so that a query can draw on both explicit entity relationships and semantic similarity.
The system leverages:
- Graphiti/Neo4j for structured knowledge representation and graph traversal
- Qdrant for efficient vector similarity search
- mem0 for persistent memory and context management
- LlamaIndex & LangChain for flexible orchestration and agent-based workflows
Features
- 🔄 Hybrid Retrieval System - Combines vector similarity search with graph-based knowledge retrieval
- 🧠 Persistent Memory - Maintains context and relationships across sessions
- 🔍 Multi-modal Search - Query across different data types and structures
- 🔗 Knowledge Graph Integration - Leverages structured relationships for improved context
- 🚀 Framework Flexibility - Works with both LlamaIndex and LangChain
- 📊 Extensible Architecture - Easy to customize and extend for specific use cases
- 🛠️ Developer-Friendly APIs - Simple interfaces for complex retrieval operations
- 📈 Performance Optimization - Efficient retrieval strategies for reduced latency
- 🐳 Docker Deployment - Containerized architecture for easy deployment
Architecture
EMVR uses a layered architecture with four layers (application, orchestration, integration, and storage):
Layered Architecture
graph TD
subgraph "Application Layer"
QueryInterface("Query Interfaces")
ResponseGen("Response Generation")
AgentWorkflows("Custom Agent Workflows")
MCP("Model Context Protocol (MCP)")
end
subgraph "Orchestration Layer"
HybridManager("Hybrid Retrieval Manager")
ContextFusion("Context Fusion Engine")
GraphTraversal("Knowledge Graph Traversal")
LangGraph("LangGraph Orchestration")
end
subgraph "Integration Layer"
LlamaIndexConn("LlamaIndex Connectors")
LangChainComp("LangChain Components")
FastEmbed("FastEmbed Integration")
FastMCP("FastMCP Framework")
end
subgraph "Storage Layer"
Qdrant("Vector Database (Qdrant)")
Neo4j("Graph Database (Neo4j/Graphiti)")
Mem0("Memory System (mem0)")
Supabase("Metadata Storage (Supabase)")
end
QueryInterface --> HybridManager
ResponseGen --> ContextFusion
AgentWorkflows --> GraphTraversal
AgentWorkflows --> HybridManager
MCP --> FastMCP
LangGraph --> LangChainComp
HybridManager --> LlamaIndexConn
HybridManager --> LangChainComp
ContextFusion --> LlamaIndexConn
ContextFusion --> LangChainComp
GraphTraversal --> LlamaIndexConn
FastMCP --> LlamaIndexConn
FastEmbed -.-> Qdrant
LlamaIndexConn --> Qdrant
LlamaIndexConn --> Neo4j
LlamaIndexConn --> Mem0
LlamaIndexConn --> Supabase
LangChainComp --> Qdrant
LangChainComp --> Neo4j
LangChainComp --> Mem0
LangChainComp --> Supabase
Comprehensive System Architecture
graph TB
User([User]) <--> ClaudeCode["Claude Code & MCP Tools"]
ClaudeCode <--> CustomMCP["Custom 'memory' MCP Server\n(FastMCP Framework)"]
ClaudeCode <--> ExternalMCP["External MCP Servers\n(tavily, firecrawl, context7, etc.)"]
subgraph "Agent System"
LangGraph["LangGraph\n(Agent Orchestration)"]
LangChain["LangChain\n(Agent Tools & Planning)"]
Agents["Specialized Agents\n(Supervisor-Worker Pattern)"]
LangGraph --> LangChain
LangGraph --> Agents
end
subgraph "RAG Framework"
LlamaIndex["LlamaIndex\n(Core RAG Framework)"]
QueryEngines["Query Engines\n(Vector, Graph, Hybrid)"]
Retrievers["Specialized Retrievers"]
DataLoaders["Data Loaders & Indexers"]
LlamaIndex --> QueryEngines
LlamaIndex --> Retrievers
LlamaIndex --> DataLoaders
end
subgraph "Memory & Storage"
Qdrant[(Qdrant\nVector Store)]
Neo4j[(Neo4j\nGraph Database)]
Mem0["Mem0\n(Memory Interface)"]
Graphiti["Graphiti\n(Graph Interface)"]
Supabase[(Supabase\nMetadata & Documents)]
S3[(AWS S3\nOriginal Documents)]
Mem0 -.-> Qdrant
Graphiti -.-> Neo4j
end
subgraph "Embedding & Ingestion"
FastEmbed["FastEmbed\nEmbedding Generation"]
WebCrawlers["Web Crawlers\n(Crawl4AI, Firecrawl)"]
ConnectorAPIs["Connector APIs\n(GitHub, Reddit, etc.)"]
FastEmbed --> Qdrant
WebCrawlers --> DataLoaders
ConnectorAPIs --> DataLoaders
end
CustomMCP <--> LangGraph
CustomMCP <--> LlamaIndex
LangGraph <--> LlamaIndex
LlamaIndex <--> Qdrant
LlamaIndex <--> Neo4j
LlamaIndex <--> Mem0
LlamaIndex <--> Graphiti
LlamaIndex <--> Supabase
DataLoaders --> S3
DataLoaders --> Supabase
DataLoaders --> Qdrant
DataLoaders --> Neo4j
style CustomMCP fill:#f9d6ff,stroke:#9333ea,stroke-width:2px
style LlamaIndex fill:#d1fae5,stroke:#059669,stroke-width:2px
style LangGraph fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style Qdrant fill:#fee2e2,stroke:#ef4444,stroke-width:2px
style Neo4j fill:#ffedd5,stroke:#f97316,stroke-width:2px
Data Flow
flowchart LR
classDef userInteraction fill:#f9d6ff,stroke:#9333ea,stroke-width:2px
classDef retrieval fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
classDef processing fill:#d1fae5,stroke:#059669,stroke-width:2px
classDef storage fill:#fee2e2,stroke:#ef4444,stroke-width:2px
classDef fusion fill:#ffedd5,stroke:#f97316,stroke-width:2px
Input("User Query/Task") --> ClaudeCode("Claude Code\nMCP Interface")
ClaudeCode --> MemoryMCP("Custom 'memory'\nMCP Server")
MemoryMCP --> Agent("Agent System\n(LangChain/LangGraph)")
Agent --> VR("Vector Retrieval\n(Qdrant via LlamaIndex)")
Agent --> GR("Graph Retrieval\n(Neo4j/Graphiti via LlamaIndex)")
Agent --> MR("Memory Retrieval\n(mem0)")
Agent --> WS("Web Search\n(Tavily/Firecrawl)")
VR --> CF("Context Fusion\n(LlamaIndex Orchestration)")
GR --> CF
MR --> CF
WS --> CF
CF --> QP("Query Planning\n(LangGraph)")
QP --> RT("Response Templates")
CF --> LLM("Large Language Model")
RT --> LLM
LLM --> Response("Enhanced Response")
Response --> MemUpdate("Memory Update\n(mem0)")
Response --> KGUpdate("Knowledge Graph Update\n(Neo4j)")
Response --> MetaUpdate("Metadata Update\n(Supabase)")
Response --> ClaudeCode
ClaudeCode --> User([User])
class Input,ClaudeCode,User userInteraction
class VR,GR,MR,WS retrieval
class QP,RT,LLM processing
class MemUpdate,KGUpdate,MetaUpdate storage
class CF fusion
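The Context Fusion step in the flow above merges results from four retrieval branches. This README does not say how EMVR fuses them; one widely used technique is reciprocal rank fusion (RRF), sketched here. The function name, the document IDs, and the constant k=60 are illustrative assumptions, not EMVR settings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (assumption).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the four retrieval branches in the diagram.
results = {
    "vector": ["doc_a", "doc_b", "doc_c"],
    "graph": ["doc_b", "doc_d"],
    "memory": ["doc_b", "doc_a"],
    "web": ["doc_e"],
}
fused = reciprocal_rank_fusion(results)
top_ids = [doc_id for doc_id, _ in fused]
```

RRF only needs ranks, not comparable scores, which is convenient when vector distances, graph hits, and web results are on different scales.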
MCP Interaction Flow
sequenceDiagram
participant User
participant Claude as Claude Code
participant Memory as custom 'memory' MCP
participant External as External MCP Servers
participant LlamaIdx as LlamaIndex
participant Storage as Storage Systems
User->>Claude: Query or Task
Claude->>Memory: memory.read_graph()
Memory->>LlamaIdx: Query through LlamaIndex
LlamaIdx->>Storage: Fetch from Qdrant/Neo4j/Supabase
Storage-->>LlamaIdx: Return relevant data
LlamaIdx-->>Memory: Process & return results
Memory-->>Claude: Return graph state
Claude->>External: context7.get_library_docs()
External-->>Claude: Return documentation
Note over Claude,Memory: Agent Planning & Execution
Claude->>Memory: Execute retrieval/update
Memory->>LlamaIdx: Orchestrate operations
LlamaIdx->>Storage: Execute operations
Storage-->>LlamaIdx: Return operation results
LlamaIdx-->>Memory: Process & return results
Memory-->>Claude: Return operation status/results
Claude->>Memory: memory.add_observations()
Memory->>Storage: Update memory state
Claude-->>User: Deliver response/results
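The sequence above starts with memory.read_graph() and ends with memory.add_observations(). As a rough illustration of what those two calls do to the graph state, here is an in-memory stand-in. The class name, the add_relation helper, and the payload shapes are assumptions; the real server persists to Qdrant, Neo4j, and Supabase rather than Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryGraphMemory:
    """Toy stand-in for the 'memory' MCP server's graph state (not EMVR's code)."""

    # entity name -> list of free-text observations
    entities: dict[str, list[str]] = field(default_factory=dict)
    # (source, relation, target) triples
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def read_graph(self) -> dict:
        """Return a snapshot copy of all entities and relations."""
        return {
            "entities": {name: list(obs) for name, obs in self.entities.items()},
            "relations": sorted(self.relations),
        }

    def add_observations(self, entity: str, observations: list[str]) -> list[str]:
        """Attach observations to an entity, skipping exact duplicates."""
        existing = self.entities.setdefault(entity, [])
        added = []
        for observation in observations:
            if observation not in existing:
                existing.append(observation)
                added.append(observation)
        return added

    def add_relation(self, source: str, relation: str, target: str) -> None:
        """Record a directed relation, creating both entities if needed."""
        self.entities.setdefault(source, [])
        self.entities.setdefault(target, [])
        self.relations.add((source, relation, target))


memory = InMemoryGraphMemory()
before = memory.read_graph()  # first call in the diagram: read current state
added = memory.add_observations("EMVR", ["uses Qdrant", "uses Neo4j", "uses Qdrant"])
memory.add_relation("EMVR", "stores_vectors_in", "Qdrant")
after = memory.read_graph()
```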
Getting Started
Prerequisites
- Python 3.11+
- Docker (recommended for Neo4j, Qdrant, and Supabase)
- uv for Python package management
- Basic understanding of RAG systems
Installation
Local Development
# Clone the repository
git clone https://github.com/BjornMelin/enhanced-mem-vector-rag.git
cd enhanced-mem-vector-rag
# Install dependencies using uv
uv pip install -r requirements.txt
Docker Deployment
# Navigate to deployment directory
cd emvr/deployment
# Setup environment
./setup_local.sh
# Start services
docker compose up -d
Quick Start
from emvr import EmvrSystem
# Initialize the system
system = EmvrSystem()
# Load data
system.load_documents("path/to/documents")
system.build_knowledge_graph()
# Query the system
response = system.query("What is the relationship between X and Y?")
print(response)
Components
Memory System (mem0)
The memory component leverages mem0 to maintain persistent context across queries and sessions. This allows the system to:
- Remember previous interactions
- Build cumulative knowledge
- Maintain entity relationships
- Support temporal reasoning
graph LR
Query("User Query") --> Memory("mem0 Memory System")
Memory --> Scoring("Relevance Scoring")
Memory --> Personalization("Personalization Layer")
Memory --> Context("Contextual History")
Scoring --> Retrieval("Enhanced Retrieval")
Personalization --> Retrieval
Context --> Retrieval
Retrieval --> LLM("Large Language Model")
LLM --> Response("Enhanced Response")
Response --> Memory
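To make the relevance scoring and personalization steps in the diagram concrete, here is a small sketch that ranks stored memories for one user by word overlap with the query, discounted by age. It does not call mem0; the MemoryItem type, the scoring formula, and the 24-hour half-life are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    user_id: str
    text: str
    age_hours: float  # time since the memory was stored


def score(item: MemoryItem, query: str, half_life_hours: float = 24.0) -> float:
    """Word-overlap relevance multiplied by an exponential recency decay."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    item_words = set(item.text.lower().split())
    overlap = len(query_words & item_words) / len(query_words)
    decay = 0.5 ** (item.age_hours / half_life_hours)
    return overlap * decay


def recall(items: list[MemoryItem], query: str, user_id: str, top_k: int = 2) -> list[MemoryItem]:
    """Return the given user's top_k memories that score above zero."""
    scored = [(score(m, query), m) for m in items if m.user_id == user_id]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]


items = [
    MemoryItem("alice", "prefers neo4j for graph storage", 48.0),
    MemoryItem("alice", "asked about qdrant vector storage", 1.0),
    MemoryItem("bob", "uses qdrant storage in production", 0.5),
    MemoryItem("alice", "likes dark mode", 2.0),
]
recalled = recall(items, "qdrant storage", "alice")
```

Filtering by user_id is the personalization part: bob's memory matches the query well but is never returned for alice.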
Graph Knowledge Base (Graphiti/Neo4j)
The graph component uses Graphiti with Neo4j to:
- Store structured relationships between entities
- Enable complex traversal queries
- Support reasoning about interconnected concepts
- Provide explicit knowledge paths
graph TD
subgraph "Knowledge Graph (Neo4j/Graphiti)"
Entity1("Entity A")
Entity2("Entity B")
Entity3("Entity C")
Entity4("Entity D")
Entity1 -- "relates_to" --> Entity2
Entity2 -- "depends_on" --> Entity3
Entity1 -- "creates" --> Entity4
Entity3 -- "part_of" --> Entity4
end
Query("Knowledge Query") --> GraphTraversal("Graph Traversal (Graphiti)")
GraphTraversal --> Neo4j("Neo4j Database")
Neo4j --> Results("Structured Results")
Results --> LLM("LLM for Reasoning")
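The "explicit knowledge paths" idea can be shown on the four example entities from the diagram. The sketch below runs a breadth-first search over those edges in plain Python; in a real deployment the traversal would run inside Neo4j, so the find_path helper is only a hypothetical illustration of the concept.

```python
from collections import deque

# Edges copied from the example diagram: (source, relation, target).
EDGES = [
    ("Entity A", "relates_to", "Entity B"),
    ("Entity B", "depends_on", "Entity C"),
    ("Entity A", "creates", "Entity D"),
    ("Entity C", "part_of", "Entity D"),
]


def find_path(edges, start, goal):
    """Breadth-first search for the shortest directed path from start to goal.

    Returns the list of (source, relation, target) hops, or None if the
    goal cannot be reached by following edge directions.
    """
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


two_hop = find_path(EDGES, "Entity A", "Entity C")
direct = find_path(EDGES, "Entity A", "Entity D")
```

Each hop in the returned path names its relation, which is what lets an LLM explain why two entities are connected rather than only that they are.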
Vector Storage (Qdrant)
The vector component uses Qdrant to:
- Store and retrieve document embeddings
- Perform efficient similarity search
- Support semantic matching
- Handle large-scale vector operations
graph TD
Documents["Input Documents"] --> TextChunker["Text Chunker"]
TextChunker --> EmbeddingGen["Embedding Generation"]
EmbeddingGen --> VectorDB["Qdrant Vector Database"]
Query["User Query"] --> QueryEmbed["Query Embedding"]
QueryEmbed --> SearchVec["Vector Search"]
SearchVec --> VectorDB
VectorDB --> TopMatches["Top K Matches"]
TopMatches --> Reranker["Reranker"]
Reranker --> ContextGen["Context Generation"]
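The chunk, embed, and top-k search stages in the diagram can be sketched without any external service. This example uses a toy bag-of-words vector in place of a real embedding model and a linear scan in place of Qdrant; every function name here is hypothetical.

```python
import math
from collections import Counter


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: sparse word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and keep the k best."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


document = (
    "qdrant stores dense vectors for similarity search "
    "neo4j stores entities and typed relationships between them"
)
chunks = chunk_text(document, max_words=7)
best = top_k("vectors similarity search", chunks, k=1)
```

A vector database replaces the linear scan in top_k with an approximate nearest-neighbor index, which is what keeps search fast at large scale.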
Framework Integration (LlamaIndex & LangChain)
EMVR integrates with both major RAG frameworks:
- LlamaIndex - For advanced indexing and retrieval operations
- LangChain - For agent-based workflows and tool integration
graph TD
subgraph "LlamaIndex Integration"
Docs[("Documents")] --> Loaders["Data Loaders"]
Loaders --> Indexing["Indexing Pipelines"]
Indexing --> QueryEngines["Query Engines"]
QueryEngines --> RetFramework["Retrieval Framework"]
end
subgraph "LangChain Integration"
Agents["Agent Framework"] --> Planning["Planning Modules"]
Planning --> Tools["Tool Integration"]
Tools --> Memory["Memory Components"]
Memory --> Callbacks["Callback Handlers"]
end
RetFramework <--> Tools
QueryEngines <--> Agents
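The link between the retrieval side and the agent side amounts to exposing a retriever as a tool an agent can call. The sketch below shows that adapter idea in plain Python. The Tool class and retriever_as_tool helper are hypothetical and do not mirror the LlamaIndex or LangChain class signatures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """Minimal agent tool: a name, a description, and a callable."""

    name: str
    description: str
    run: Callable[[str], str]


def retriever_as_tool(name: str, description: str,
                      retrieve: Callable[[str], list[str]]) -> Tool:
    """Adapt a retriever (query -> passages) into a tool (query -> text)."""

    def run(query: str) -> str:
        passages = retrieve(query)
        return "\n".join(passages) if passages else "No results."

    return Tool(name=name, description=description, run=run)


# Hypothetical retriever standing in for a query engine.
CORPUS = {
    "qdrant": "Qdrant stores document embeddings.",
    "neo4j": "Neo4j stores entity relationships.",
}


def keyword_retriever(query: str) -> list[str]:
    """Return passages whose key appears in the query."""
    return [text for key, text in CORPUS.items() if key in query.lower()]


vector_tool = retriever_as_tool("vector_search", "Look up passages by keyword.", keyword_retriever)
answer = vector_tool.run("What does Qdrant store?")
```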
Usage Examples
Examples are coming soon. They will demonstrate:
- Basic RAG workflows
- Knowledge graph integration
- Multi-hop reasoning
- Custom retrieval strategies
- Agent-based applications
Configuration
EMVR can be configured through:
- Configuration files
- Environment variables
- Programmatic settings
Detailed configuration options will be provided in the upcoming documentation.
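Until those options are documented, the following sketch shows one common way to layer the three mechanisms listed above. The precedence order (defaults, then file, then environment, then programmatic overrides), the EMVR_ prefix, the JSON file format, and the setting names are all assumptions, not EMVR's documented behavior.

```python
import json
import os
from pathlib import Path

# Hypothetical defaults; the ports are the usual Qdrant and Neo4j defaults.
DEFAULTS = {
    "qdrant_url": "http://localhost:6333",
    "neo4j_uri": "bolt://localhost:7687",
    "top_k": 5,
}


def load_config(file_path=None, env=None, overrides=None, prefix="EMVR_"):
    """Merge settings: defaults < config file < environment < overrides."""
    config = dict(DEFAULTS)

    # 1. Optional JSON configuration file.
    if file_path is not None and Path(file_path).is_file():
        config.update(json.loads(Path(file_path).read_text(encoding="utf-8")))

    # 2. Environment variables such as EMVR_TOP_K=10.
    env = os.environ if env is None else env
    for key, value in env.items():
        if key.startswith(prefix):
            name = key[len(prefix):].lower()
            default = DEFAULTS.get(name)
            # Coerce to the default's type when one exists ("10" -> 10).
            config[name] = type(default)(value) if default is not None else value

    # 3. Programmatic settings win over everything else.
    config.update(overrides or {})
    return config


cfg = load_config(
    env={"EMVR_TOP_K": "10", "PATH": "/usr/bin"},
    overrides={"neo4j_uri": "bolt://graph:7687"},
)
```

The simple type coercion works for the string and integer defaults used here; boolean settings would need explicit parsing.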
Benchmarks
Performance benchmarks comparing EMVR to traditional RAG systems will be available soon.
Roadmap
- [x] Initial release with core functionality
- [x] Basic documentation
- [x] Agent orchestration implementation
- [x] UI implementation with Chainlit
- [x] Docker containerization and deployment
- [ ] Comprehensive documentation
- [ ] Performance benchmarks
- [ ] Advanced examples
- [ ] Cloud deployment guides
- [ ] Additional vector database integrations
- [ ] Custom agent templates
Contributing
Contributions are welcome! Please see the CONTRIBUTING.md file for guidelines.
How to Cite
If you use EMVR in your research, please cite:
@software{emvr2025,
author = {Melin, Bjorn},
title = {Enhanced Memory Vector RAG: A Hybrid Retrieval Framework},
year = {2025},
url = {https://github.com/BjornMelin/enhanced-mem-vector-rag},
version = {0.1.0}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgements
- Graphiti for Neo4j integration
- Qdrant for vector database capabilities
- mem0 for memory systems
- LlamaIndex for indexing frameworks
- LangChain for agent orchestration
Custom MCP Server Implementation
This project implements a custom 'memory' MCP server, built with the FastMCP framework. The server is the central interface between Claude Code and the system's backend components:
flowchart TD
classDef mcp fill:#f9d6ff,stroke:#9333ea,stroke-width:2px
classDef frameworks fill:#d1fae5,stroke:#059669,stroke-width:2px
classDef storage fill:#fee2e2,stroke:#ef4444,stroke-width:2px
Claude([Claude Code]) --> MCP["Custom 'memory' MCP Server\n(FastMCP Framework)"]
subgraph "MCP Endpoints"
SearchHybrid["/search.hybrid"]
GraphQuery["/graph.query"]
MemoryOps["/memory.*"]
RulesValidate["/rules.validate"]
IngestOps["/ingest.*"]
end
MCP --> SearchHybrid
MCP --> GraphQuery
MCP --> MemoryOps
MCP --> RulesValidate
MCP --> IngestOps
SearchHybrid --> LlamaIndex["LlamaIndex\nRAG Orchestration"]
GraphQuery --> LlamaIndex
MemoryOps --> LlamaIndex
RulesValidate --> APOC["Neo4j APOC\nRules Engine"]
IngestOps --> LlamaIndex
LlamaIndex --> Qdrant[(Qdrant)]
LlamaIndex --> Neo4j[(Neo4j)]
LlamaIndex --> Supabase[(Supabase)]
APOC --> Neo4j
Mem0["Mem0 SDK"] --> Qdrant
Graphiti["Graphiti Client"] --> Neo4j
class MCP,SearchHybrid,GraphQuery,MemoryOps,RulesValidate,IngestOps mcp
class LlamaIndex,APOC,Mem0,Graphiti frameworks
class Qdrant,Neo4j,Supabase storage
Key MCP Endpoints
| Endpoint | Description | Implementation |
| ----------------- | ----------------------------------------------------- | ------------------------------------------------------------------------------------ |
| /search.hybrid | Performs hybrid search across vector and graph stores | Uses LlamaIndex for orchestrating hybrid search across Qdrant and Neo4j |
| /graph.query | Executes knowledge graph queries | Translates natural language to Cypher using LlamaIndex's KnowledgeGraphQueryEngine |
| /memory.* | Operations for memory management | Includes CRUD operations for graph entities and observations |
| /rules.validate | Validates operations against defined rules | Uses Neo4j APOC for rule enforcement |
| /ingest.* | Handles data ingestion from various sources | Utilizes LlamaIndex data loaders and FastEmbed for embedding generation |
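The table mixes exact endpoints with wildcard families such as /memory.* and /ingest.*. The sketch below shows one way such names could be routed to handlers using glob-style matching from the standard library. It is not the FastMCP API; the handler functions and their return shapes are placeholders invented for illustration.

```python
from fnmatch import fnmatchcase


def llamaindex_handler(name: str, params: dict) -> dict:
    """Placeholder for endpoints the table says are served through LlamaIndex."""
    family, _, operation = name.partition(".")
    return {"endpoint": name, "backend": "llamaindex",
            "family": family, "operation": operation, "params": params}


def rules_handler(name: str, params: dict) -> dict:
    """Placeholder for rule validation, which the table assigns to Neo4j APOC."""
    return {"endpoint": name, "backend": "neo4j-apoc", "params": params}


# Patterns taken from the endpoint table, checked in order.
ROUTES = [
    ("search.hybrid", llamaindex_handler),
    ("graph.query", llamaindex_handler),
    ("memory.*", llamaindex_handler),
    ("rules.validate", rules_handler),
    ("ingest.*", llamaindex_handler),
]


def dispatch(endpoint: str, params: dict) -> dict:
    """Route an endpoint name to the first handler whose pattern matches."""
    name = endpoint.lstrip("/")
    for pattern, handler in ROUTES:
        if fnmatchcase(name, pattern):
            return handler(name, params)
    raise KeyError(f"Unknown endpoint: {endpoint}")


result = dispatch("/memory.add_observations", {"entity": "EMVR"})
```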
Claude Code Development
This project provides a detailed development guide for Claude Code users. The guide includes:
- Project overview and technical architecture
- Development workflow and memory protocol
- Coding standards and practices
- Git workflow
- MCP server documentation and usage
- Key architectural components and their roles
For Claude Code development, please refer to CLAUDE.md for comprehensive guidelines.
Deployment
The project includes a complete deployment system using Docker Compose:
Docker Components
- MCP Server: FastAPI server implementing the Model Context Protocol
- Chainlit UI: Web interface for user interaction
- Qdrant: Vector database for semantic search
- Neo4j: Graph database for knowledge graphs
- Supabase: PostgreSQL for structured data and metadata
- Grafana/Prometheus: Monitoring and observability
Deployment Options
Local Deployment
# Navigate to deployment directory
cd emvr/deployment
# Set up environment
./setup_local.sh
# Start services with Docker Compose
docker compose up -d
Using Makefile
cd emvr/deployment
make setup # Run setup script
make up # Start all services
Security
The deployment includes comprehensive security features:
- JWT-based authentication
- Role-Based Access Control (RBAC)
- Secure environment variable management
- Container-based isolation
Monitoring & Observability
Access system metrics and logs through:
- Grafana dashboard: http://localhost:3000
- Prometheus metrics: http://localhost:9090
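Prometheus can also be queried programmatically through its HTTP API. The sketch below builds an instant-query URL against the address listed above and interprets the response for the built-in up metric. The helper names and the sample job and instance labels are assumptions; which targets actually appear depends on the deployment's scrape configuration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # address from the list above


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for Prometheus's instant-query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def fetch_instant_query(promql: str, base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Run the query against a live Prometheus server (not called here)."""
    with urlopen(instant_query_url(promql, base_url), timeout=timeout) as response:
        return json.load(response)


def targets_up(payload: dict) -> dict[str, bool]:
    """Map each instance label to True when its 'up' sample equals 1."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    return {
        sample["metric"].get("instance", "unknown"): sample["value"][1] == "1"
        for sample in payload["data"]["result"]
    }


# Sample response in the shape Prometheus returns; labels are made up.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "mcp", "instance": "mcp:8000"},
             "value": [1746532800, "1"]},
            {"metric": {"__name__": "up", "job": "qdrant", "instance": "qdrant:6333"},
             "value": [1746532800, "0"]},
        ],
    },
}
status = targets_up(sample_payload)
```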
Backup & Restore
The system includes scripts for data backup and restoration:
# Create backup
./scripts/backup.sh
# Restore from backup
./scripts/restore.sh ./backups/emvr_backup_20250506_120000.tar.gz
For detailed deployment instructions, see the deployment README.