MCP server by vrppaul
# semantic-code-mcp
MCP server that provides semantic code search for Claude Code. Instead of iterative grep/glob, it indexes your codebase with embeddings and returns ranked results by meaning.
Supports Python, Rust, and Markdown — more languages planned.
## How It Works
```
Claude Code ──(MCP/STDIO)──▶ semantic-code-mcp server
                                        │
                        ┌───────────────┼───────────────┐
                        ▼               ▼               ▼
                  AST Chunker       Embedder        LanceDB
                 (tree-sitter)  (sentence-trans)   (vectors)
```
- **Chunking** — tree-sitter parses source files into functions, classes, methods, structs, traits, markdown sections, etc.
- **Embedding** — sentence-transformers encodes each chunk (`all-MiniLM-L6-v2`, 384 dimensions)
- **Storage** — vectors stored in LanceDB (embedded, like SQLite)
- **Search** — hybrid semantic + keyword search with recency boosting
Indexing is incremental (mtime-based) and uses `git ls-files` for fast file discovery. The embedding model loads lazily on first query.
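The mtime-based incremental scan boils down to a comparison like the following — a minimal sketch assuming a stored `{path: mtime}` map (the real server discovers files via `git ls-files` and persists its state in the index):

```python
import os


def files_to_reindex(paths: list[str], stored_mtimes: dict[str, float]) -> list[str]:
    """Return only files that are new or whose mtime changed since the last index run."""
    changed = []
    for path in paths:
        mtime = os.path.getmtime(path)
        # A file is re-chunked and re-embedded only if its mtime differs
        # from what was recorded at the previous indexing pass.
        if stored_mtimes.get(path) != mtime:
            changed.append(path)
    return changed
```

Unchanged files skip chunking and embedding entirely, which is what keeps re-indexing cheap on large repositories.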
## Installation

### macOS / Windows
PyPI ships CPU-only torch on these platforms, so no extra flags are needed (~1.7GB install).
```shell
uvx semantic-code-mcp
```
Claude Code integration:
```shell
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp
```
### Linux
> [!IMPORTANT]
> Without the `--index` flag, PyPI installs CUDA-bundled torch (~3.5GB). Unless you need GPU acceleration (you don't — embeddings run on CPU), use the command below to get the CPU-only build (~1.7GB).
```shell
uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```
Claude Code integration:
```shell
claude mcp add --scope user semantic-code -- \
  uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```
### Claude Desktop / other MCP clients (JSON config)
```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["--index", "pytorch-cpu=https://download.pytorch.org/whl/cpu", "semantic-code-mcp"]
    }
  }
}
```
On macOS/Windows you can omit the `--index` and `pytorch-cpu` args.
### Updating
uvx caches the installed version. To get the latest release:
```shell
uvx --upgrade semantic-code-mcp
```
Or pin a specific version in your MCP config:
```shell
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp@0.2.0
```
## MCP Tools

### `search_code`

Search code by meaning, not just text matching. Auto-indexes on first search.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | `str` | required | Natural language description of what you're looking for |
| `project_path` | `str` | required | Absolute path to the project root |
| `limit` | `int` | 10 | Maximum number of results |
Returns ranked results with `file_path`, `line_start`, `line_end`, `name`, `chunk_type`, `content`, and `score`.
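For illustration, a single result entry might look like the dict below. Only the field names come from the list above; the values are hypothetical:

```python
# Hypothetical search_code result entry; field names match the docs,
# values are invented for illustration.
result = {
    "file_path": "src/auth/session.py",
    "line_start": 42,
    "line_end": 67,
    "name": "refresh_token",
    "chunk_type": "function",
    "content": "def refresh_token(session): ...",
    "score": 0.83,
}
```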
### `index_codebase`

Index a codebase for semantic search. Only processes new and changed files unless `force=True`.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `project_path` | `str` | required | Absolute path to the project root |
| `force` | `bool` | `False` | Re-index all files regardless of changes |
### `index_status`

Check indexing status for a project.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `project_path` | `str` | required | Absolute path to the project root |

Returns `is_indexed`, `files_count`, and `chunks_count`.
## Configuration

All settings are environment variables with the `SEMANTIC_CODE_MCP_` prefix (via pydantic-settings):
| Variable | Default | Description |
|----------|---------|-------------|
| `SEMANTIC_CODE_MCP_CACHE_DIR` | `~/.cache/semantic-code-mcp` | Where indexes are stored |
| `SEMANTIC_CODE_MCP_LOCAL_INDEX` | `false` | Store index in `.semantic-code/` within each project |
| `SEMANTIC_CODE_MCP_EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `SEMANTIC_CODE_MCP_DEBUG` | `false` | Enable debug logging |
| `SEMANTIC_CODE_MCP_PROFILE` | `false` | Enable pyinstrument profiling |
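The prefix mechanism can be approximated with a small stdlib sketch — the real server delegates this to pydantic-settings, so this helper is illustrative only:

```python
PREFIX = "SEMANTIC_CODE_MCP_"


def load_settings(environ: dict[str, str]) -> dict[str, str]:
    """Collect prefixed env vars into a lowercase settings dict,
    roughly the way pydantic-settings maps them onto model fields."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }
```

So `SEMANTIC_CODE_MCP_DEBUG=true` in the environment becomes the `debug` setting, and unprefixed variables are ignored.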
Pass environment variables via the `env` field in your MCP config:
```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["semantic-code-mcp"],
      "env": {
        "SEMANTIC_CODE_MCP_DEBUG": "true",
        "SEMANTIC_CODE_MCP_LOCAL_INDEX": "true"
      }
    }
  }
}
```
Or with Claude Code CLI:
```shell
claude mcp add --scope user semantic-code \
  -e SEMANTIC_CODE_MCP_DEBUG=true \
  -e SEMANTIC_CODE_MCP_LOCAL_INDEX=true \
  -- uvx semantic-code-mcp
```
## Tech Stack

| Component | Choice | Rationale |
|-----------|--------|-----------|
| MCP Framework | FastMCP | Python decorators, STDIO transport |
| Embeddings | sentence-transformers | Local, no API costs, good quality |
| Vector Store | LanceDB | Embedded (like SQLite), no server needed |
| Chunking | tree-sitter | AST-based, respects code structure |
## Development
```shell
uv sync                             # Install dependencies
uv run python -m semantic_code_mcp  # Run server
uv run pytest                       # Run tests
uv run ruff check src/              # Lint
uv run ruff format src/             # Format
```
Pre-commit hooks enforce linting, formatting, type-checking (`ty`), security scanning (`bandit`), and Conventional Commits.
## Releasing

Versions are derived from git tags automatically (`hatch-vcs`); there's no hardcoded version in `pyproject.toml`.
```shell
git tag v0.2.0
git push origin v0.2.0
```
CI builds the package, publishes to PyPI, and creates a GitHub Release with auto-generated notes.
## Adding a New Language

The chunker system is designed to make adding languages straightforward. Each language needs:

- A tree-sitter grammar package (e.g. `tree-sitter-javascript`)
- A chunker subclass that walks the AST and extracts meaningful chunks

Steps:
```shell
uv add tree-sitter-mylang
```
Create `src/semantic_code_mcp/chunkers/mylang.py`:
```python
from enum import StrEnum, auto

import tree_sitter_mylang as tsmylang
from tree_sitter import Language, Node

from semantic_code_mcp.chunkers.base import BaseTreeSitterChunker
from semantic_code_mcp.models import Chunk, ChunkType


class NodeType(StrEnum):
    function_definition = auto()
    # ... other node types


class MyLangChunker(BaseTreeSitterChunker):
    language = Language(tsmylang.language())
    extensions = (".ml",)

    def _extract_chunks(self, root: Node, file_path: str, lines: list[str]) -> list[Chunk]:
        chunks = []
        for node in root.children:
            match node.type:
                case NodeType.function_definition:
                    name = node.child_by_field_name("name").text.decode()
                    chunks.append(self._make_chunk(node, file_path, lines, ChunkType.function, name))
                # ... other node types
        return chunks
```
Register it in `src/semantic_code_mcp/container.py`:
```python
from semantic_code_mcp.chunkers.mylang import MyLangChunker


def get_chunkers(self) -> list[BaseTreeSitterChunker]:
    return [PythonChunker(), RustChunker(), MarkdownChunker(), MyLangChunker()]
```
The `CompositeChunker` handles dispatch by file extension automatically. Use `BaseTreeSitterChunker._make_chunk()` for consistent chunk construction. See `chunkers/python.py` and `chunkers/rust.py` for complete examples.
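Dispatch by extension boils down to a lookup built from each chunker's `extensions` tuple. An illustrative sketch (not the project's actual `CompositeChunker` code):

```python
from pathlib import Path


class CompositeDispatch:
    """Route a file to the chunker that declared its extension."""

    def __init__(self, chunkers):
        # Map each declared extension to its chunker, e.g. ".py" -> PythonChunker.
        self._by_ext = {
            ext: chunker for chunker in chunkers for ext in chunker.extensions
        }

    def chunker_for(self, file_path: str):
        # Files with no matching chunker are simply skipped by the indexer.
        return self._by_ext.get(Path(file_path).suffix)
```

This is why a new language only needs a chunker subclass with an `extensions` tuple: registration in the container is enough for it to start receiving files.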
## Project Structure

- `src/semantic_code_mcp/chunkers/` — language chunkers (`base.py`, `composite.py`, `python.py`, `rust.py`, `markdown.py`)
- `src/semantic_code_mcp/services/` — `IndexService` (scan/chunk/index), `SearchService` (search + auto-index)
- `src/semantic_code_mcp/indexer.py` — embed + store pipeline
- `docs/decisions/` — architecture decision records
- `TODO.md` — epics and planning
- `CHANGELOG.md` — completed work (Keep a Changelog format)
- `.claude/rules/` — context-specific coding rules for AI agents
## License
MIT