LLM Memory MCP — AI Agent Persistent Memory with Decay

An MCP server that gives any AI agent human-like persistent memory: it retains important facts, lets trivial details decay over time, and consolidates related memories into long-term knowledge.

Architecture

Claude Desktop / Cursor  ──stdio──►  MCP Server (memory_mcp)
                                           │
                                    ┌──────┴──────┐
                                    │             │
                              SQLite DB    Sentence Transformers
                              + embeddings   (local, free)
                                    │
                              Anthropic API (consolidation only)

Memory Tiers

| Tier | Rule | Decay? |
|------|------|--------|
| Working | Last 10 interactions | No (always available) |
| Short-term | Last 30 days, importance < 8 | Yes (SM-2 decay) |
| Long-term | importance ≥ 8 or consolidated | Never decays |
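
The tier rules can be read as a small decision function plus a rolling window. The sketch below is only an illustration of that reading: `storage_tier` and `WorkingMemory` are hypothetical names, and treating working memory as a separate window over recent interactions (rather than a label on stored memories) is an assumption.

```python
from collections import deque

LONG_TERM_IMPORTANCE = 8   # from the table: importance >= 8 is long-term
WORKING_SIZE = 10          # from the table: last 10 interactions


def storage_tier(importance: int, consolidated: bool = False) -> str:
    """Return the tier a stored memory would land in under the table's rules."""
    if not 1 <= importance <= 10:
        raise ValueError("importance must be between 1 and 10")
    if importance >= LONG_TERM_IMPORTANCE or consolidated:
        return "long_term"
    return "short_term"


class WorkingMemory:
    """Hypothetical rolling window over the most recent interactions."""

    def __init__(self, size: int = WORKING_SIZE) -> None:
        self._items = deque(maxlen=size)

    def add(self, interaction: str) -> None:
        # When the window is full, the oldest entry is dropped automatically.
        self._items.append(interaction)

    def items(self) -> list:
        return list(self._items)
```

Under this reading, a memory stored with importance 6 starts in the short-term tier and one stored with importance 9 goes straight to long-term.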

SM-2 Decay + Importance

Memories use the SM-2 spaced repetition algorithm as the backbone:

Intervals: I(1)=1 day, I(2)=6 days, I(n)=I(n-1) × EF
Ease factor: EF' = EF + (0.1 - (5-q) × (0.08 + (5-q) × 0.02)), min 1.3, where q is the recall quality grade (0-5)
Importance multiplier: importance 1 → 0.5× resistance, importance 10 → 2.0× resistance
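
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `sm2.py`: the names `Sm2State`, `sm2_update`, and `importance_multiplier` are hypothetical, the starting ease of 2.5 is SM-2's conventional default, and both the linear interpolation between 0.5× and 2.0× and the idea of applying the multiplier to the interval are assumptions.

```python
from dataclasses import dataclass

MIN_EASE = 1.3  # floor on the ease factor


@dataclass(frozen=True)
class Sm2State:
    repetitions: int = 0       # successful reviews in a row
    interval_days: float = 0.0 # unscaled SM-2 interval
    ease: float = 2.5          # conventional SM-2 starting ease factor


def sm2_update(state: Sm2State, quality: int) -> Sm2State:
    """Apply one review with quality grade q in 0..5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    d = 5 - quality
    # EF' = EF + (0.1 - (5-q) * (0.08 + (5-q) * 0.02)), clamped at 1.3
    ease = max(MIN_EASE, state.ease + (0.1 - d * (0.08 + d * 0.02)))
    if quality < 3:
        # Failed recall: restart the interval sequence from I(1).
        return Sm2State(repetitions=0, interval_days=1.0, ease=ease)
    reps = state.repetitions + 1
    if reps == 1:
        interval = 1.0
    elif reps == 2:
        interval = 6.0
    else:
        # I(n) = I(n-1) * EF, using the ease factor from before this review.
        interval = state.interval_days * state.ease
    return Sm2State(repetitions=reps, interval_days=interval, ease=ease)


def importance_multiplier(importance: int) -> float:
    """Map importance 1..10 onto 0.5x..2.0x (linear mapping is an assumption)."""
    return 0.5 + (importance - 1) * (2.0 - 0.5) / 9


def effective_interval(state: Sm2State, importance: int) -> float:
    """Assumed use of the multiplier: stretch or shrink the SM-2 interval."""
    return state.interval_days * importance_multiplier(importance)
```

Keeping the unscaled interval in the state and applying the multiplier only when reading it avoids compounding the importance factor on every review.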

Retrieval score = semantic_similarity × retrieval_weight × tier_boost × importance_boost
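
As a rough illustration of how the four factors combine, the sketch below multiplies a cosine similarity by the other three terms. Only the product itself comes from the formula above; the `TIER_BOOST` values and the shape of `importance_boost` are placeholder assumptions, and the function names are hypothetical.

```python
import math

# Placeholder boosts; the project's real values are not stated in this README.
TIER_BOOST = {"working": 1.2, "long_term": 1.1, "short_term": 1.0}


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieval_score(query_vec, memory_vec, retrieval_weight: float,
                    tier: str, importance: int) -> float:
    """semantic_similarity * retrieval_weight * tier_boost * importance_boost."""
    similarity = max(0.0, cosine_similarity(query_vec, memory_vec))
    importance_boost = 1.0 + importance / 10  # assumed form of the boost
    return similarity * retrieval_weight * TIER_BOOST[tier] * importance_boost
```

Because the terms are multiplied, a memory whose retrieval weight has decayed toward zero ranks low even when it is a close semantic match.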

MCP Tools

| Tool | Description |
|------|-------------|
| remember(content, importance_score) | Store memory (importance 1-10) |
| recall(query) | Retrieve by semantic + decay-weighted relevance |
| forget_old_memories() | Soft-delete decayed short-term memories |
| consolidate_memories() | LLM-merge related short-term → long-term |
| get_memory_summary() | Overview of all tiers, decay state, consolidations |
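
For orientation, MCP clients invoke tools like these with JSON-RPC `tools/call` requests, sent as one JSON message per line over stdio. The sketch below builds such a request for `remember`. The argument names `content` and `importance_score` come from the table above; the helper name `tool_call_request` and the request id are illustrative only.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, which suits line-delimited stdio.
    return json.dumps(message)


if __name__ == "__main__":
    print(tool_call_request(
        1, "remember",
        {"content": "I prefer TypeScript", "importance_score": 6},
    ))
```

In normal use the client (Claude Desktop or Cursor) builds these messages itself, so a helper like this is mainly useful for debugging the server by hand.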

Quick Start

1. Install

cd llm_memory_mcp
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env — set ANTHROPIC_API_KEY for LLM consolidation (optional for basic use)

2. Seed demo data

python scripts/seed_memories.py

3. Run dashboard

uvicorn dashboard.app:app --reload --port 8787

Open http://localhost:8787

4. Connect to Claude Desktop

Copy claude_desktop_config.example.json to your Claude Desktop config location and edit the paths:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "memory": {
      "command": "/Users/YOU/llm_memory_mcp/.venv/bin/python",
      "args": ["-m", "memory_mcp.server"],
      "env": {
        "MEMORY_DB_PATH": "/Users/YOU/llm_memory_mcp/data/memories.db",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Restart Claude Desktop. You should see 5 memory tools available.

5. Connect to Cursor

Add to Cursor MCP settings (.cursor/mcp.json in project or global settings):

{
  "mcpServers": {
    "memory": {
      "command": "/Users/YOU/llm_memory_mcp/.venv/bin/python",
      "args": ["-m", "memory_mcp.server"],
      "env": {
        "MEMORY_DB_PATH": "/Users/YOU/llm_memory_mcp/data/memories.db"
      }
    }
  }
}

Multi-Session Demo Script

Use this to prove cross-session memory in Claude Desktop:

| Session | What to say | Expected |
|---------|-------------|----------|
| Day 1 | "Use remember to store: I prefer TypeScript, importance 6" | Stored short-term |
| Day 1 | "Use remember to store: My deadline is June 15, importance 9" | Stored long-term |
| Day 2 (new chat) | "Use recall to find my language preference" | Returns TypeScript |
| Day 2 | "Use get_memory_summary" | Shows tier counts |
| Day 7 | "Use recall for my deadline" | Still returns June 15 |
| Day 7 | "Use consolidate_memories" | Merges related Rust memories |
| Day 30 | "Use forget_old_memories then recall toast" | Trivial memory forgotten |

Testing

# Unit tests (uses mocked embeddings — fast)
pytest tests/ -v

# 20-scenario eval suite
python tests/eval/run_eval.py -v

# Simulate 30 days of decay
python scripts/simulate_time.py --days 30
python -c "
from memory_mcp.memory_service import MemoryService
print(MemoryService().forget_old_memories())
"

Eval Scenarios (20)

| # | Category | Tests |
|---|----------|-------|
| 1 | High importance name | Long-term retention |
| 2 | Low importance trivial | Decay + forget |
| 3 | Recall reinforcement | SM-2 boost |
| 4 | Semantic paraphrase | Embedding quality |
| 5 | Consolidation merge | LLM synthesis |
| 6 | No hallucination | Consolidation faithfulness |
| 7 | False recall prevention | Unrelated query → empty |
| 8 | Working memory | Last 10 interactions |
| 9 | Importance 10 vs 1 | Decay rate difference |
| 10 | Long-term never forgotten | forget_old skips |
| 11 | Cross-session persistence | DB reload |
| 12 | Contradiction merge | "Previously X, now Y" |
| 13 | Recall boosts weight | Reinforcement |
| 14 | 30-day expire | Short-term window |
| 15 | Summary accuracy | Counts match |
| 16 | Duplicate remember | Multiple entries OK |
| 17 | Empty query | Graceful handling |
| 18 | Consolidated sources hidden | Sources marked consolidated |
| 19 | Multi-topic recall | Both topics found |
| 20 | End-to-end conversation | Full workflow |

Project Structure

llm_memory_mcp/
├── src/memory_mcp/
│   ├── server.py           # MCP tool definitions
│   ├── memory_service.py   # Orchestration layer
│   ├── sm2.py              # SM-2 decay engine
│   ├── recall.py           # Hybrid retrieval
│   ├── consolidation.py    # LLM merge pipeline
│   ├── db.py               # SQLite persistence
│   └── ...
├── dashboard/              # Visualization UI
├── tests/                  # Unit + integration tests
├── tests/eval/             # 20-scenario eval suite
└── scripts/                # Seed + time simulation

Configuration

| Env Var | Default | Description |
|---------|---------|-------------|
| MEMORY_DB_PATH | ./data/memories.db | SQLite database path |
| ANTHROPIC_API_KEY | (none) | Enables LLM consolidation; without it, consolidation falls back to simple concatenation |
| EMBEDDING_MODEL | all-MiniLM-L6-v2 | Local embedding model |
| DECAY_THRESHOLD | 0.10 | Weight below which memories are forgotten |
| SHORT_TERM_DAYS | 30 | Short-term memory window, in days |
| CONSOLIDATION_SIM_THRESHOLD | 0.75 | Minimum similarity for memories to be clustered together |
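
One way to read these variables with their documented defaults is sketched below. The `Settings` dataclass and `load_settings` function are hypothetical names for illustration; how the project actually loads its configuration is not described here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    anthropic_api_key: Optional[str]
    embedding_model: str
    decay_threshold: float
    short_term_days: int
    consolidation_sim_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        db_path=env.get("MEMORY_DB_PATH", "./data/memories.db"),
        # An empty string is treated the same as an unset key.
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        decay_threshold=float(env.get("DECAY_THRESHOLD", "0.10")),
        short_term_days=int(env.get("SHORT_TERM_DAYS", "30")),
        consolidation_sim_threshold=float(
            env.get("CONSOLIDATION_SIM_THRESHOLD", "0.75")
        ),
    )
```

Passing the environment in as a mapping keeps the loader easy to test, for example `load_settings({"SHORT_TERM_DAYS": "7"})`.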

Notes

First remember() call is slow (~5s) while the embedding model loads. Subsequent calls are fast.
Without an API key, consolidation falls back to simple concatenation (still functional for testing).
Never use print() in the MCP server: stdout carries the stdio protocol stream, so stray output corrupts it. Send logging to stderr instead.
Use absolute paths in Claude Desktop config.
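
The stdio note above can be put into practice with the standard `logging` module. The sketch below is one possible setup; the logger name `memory_mcp` and the helper `configure_logging` are assumptions for illustration, not necessarily what the server does.

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to stderr, leaving stdout free for MCP traffic."""
    logger = logging.getLogger("memory_mcp")
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    # Do not forward records to the root logger, whose handlers may differ.
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("embedding model loaded")  # goes to stderr, not stdout
```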

License

MIT
