MCP server by vrppaul
# semantic-code-mcp
MCP server that provides semantic code search for Claude Code. Instead of iterative grep/glob, it indexes your codebase with embeddings and returns ranked results by meaning.
Supports Python, Rust, and Markdown — more languages planned.
## How It Works
```
Claude Code ──(MCP/STDIO)──▶ semantic-code-mcp server
                                        │
                        ┌───────────────┼───────────────┐
                        ▼               ▼               ▼
                  AST Chunker       Embedder        LanceDB
                 (tree-sitter)  (sentence-trans)   (vectors)
```
- **Chunking** — tree-sitter parses source files into functions, classes, methods, structs, traits, markdown sections, etc.
- **Embedding** — sentence-transformers encodes each chunk (`all-MiniLM-L6-v2`, 384 dimensions)
- **Storage** — vectors stored in LanceDB (embedded, like SQLite)
- **Search** — hybrid semantic + keyword search with recency boosting
Indexing is incremental (mtime-based) and uses `git ls-files` for fast file discovery. The embedding model loads lazily on first query.
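The mtime-based incremental scan boils down to a comparison like the following — a minimal sketch assuming a stored `{path: mtime}` map (the real server discovers files via `git ls-files` and persists its state in the index):

```python
import os


def files_to_reindex(paths: list[str], stored_mtimes: dict[str, float]) -> list[str]:
    """Return only files that are new or whose mtime changed since the last index run."""
    changed = []
    for path in paths:
        mtime = os.path.getmtime(path)
        # A file is re-chunked and re-embedded only if its mtime differs
        # from what was recorded at the previous indexing pass.
        if stored_mtimes.get(path) != mtime:
            changed.append(path)
    return changed
```

Unchanged files skip chunking and embedding entirely, which is what keeps re-indexing cheap on large repositories.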
## Installation

### macOS / Windows
PyPI ships CPU-only torch on these platforms, so no extra flags are needed (~1.7GB install).
```shell
uvx semantic-code-mcp
```
Claude Code integration:
```shell
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp
```
### Linux
> [!IMPORTANT]
> Without the `--index` flag, PyPI installs CUDA-bundled torch (~3.5GB). Unless you need GPU acceleration (you don't — embeddings run on CPU), use the command below to get the CPU-only build (~1.7GB).
```shell
uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```
Claude Code integration:
```shell
claude mcp add --scope user semantic-code -- \
  uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```
### Claude Desktop / other MCP clients (JSON config)
```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["--index", "pytorch-cpu=https://download.pytorch.org/whl/cpu", "semantic-code-mcp"]
    }
  }
}
```
On macOS/Windows you can omit the `--index` and `pytorch-cpu` args.
### Updating
uvx caches the installed version. To get the latest release:
```shell
uvx --upgrade semantic-code-mcp
```
Or pin a specific version in your MCP config:
```shell
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp@0.2.0
```
## MCP Tools

### `search_code`

Search code by meaning, not just text matching. Auto-indexes on first search.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | `str` | required | Natural language description of what you're looking for |
| `project_path` | `str` | required | Absolute path to the project root |
| `limit` | `int` | 10 | Maximum number of results |
Returns ranked results with `file_path`, `line_start`, `line_end`, `name`, `chunk_type`, `content`, and `score`.
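For illustration, a single result entry might look like the dict below. Only the field names come from the list above; the values are hypothetical:

```python
# Hypothetical search_code result entry; field names match the docs,
# values are invented for illustration.
result = {
    "file_path": "src/auth/session.py",
    "line_start": 42,
    "line_end": 67,
    "name": "refresh_token",
    "chunk_type": "function",
    "content": "def refresh_token(session): ...",
    "score": 0.83,
}
```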
### `index_codebase`

Index a codebase for semantic search. Only processes new and changed files unless `force=True`.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `project_path` | `str` | required | Absolute path to the project root |
| `force` | `bool` | `False` | Re-index all files regardless of changes |
### `index_status`

Check indexing status for a project.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `project_path` | `str` | required | Absolute path to the project root |

Returns `is_indexed`, `files_count`, and `chunks_count`.
## Configuration

All settings are environment variables with the `SEMANTIC_CODE_MCP_` prefix (via pydantic-settings):
| Variable | Default | Description |
|----------|---------|-------------|
| `SEMANTIC_CODE_MCP_CACHE_DIR` | `~/.cache/semantic-code-mcp` | Where indexes are stored |
| `SEMANTIC_CODE_MCP_LOCAL_INDEX` | `false` | Store index in `.semantic-code/` within each project |
| `SEMANTIC_CODE_MCP_EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `SEMANTIC_CODE_MCP_DEBUG` | `false` | Enable debug logging |
| `SEMANTIC_CODE_MCP_PROFILE` | `false` | Enable pyinstrument profiling |
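The prefix mechanism can be approximated with a small stdlib sketch — the real server delegates this to pydantic-settings, so this helper is illustrative only:

```python
PREFIX = "SEMANTIC_CODE_MCP_"


def load_settings(environ: dict[str, str]) -> dict[str, str]:
    """Collect prefixed env vars into a lowercase settings dict,
    roughly the way pydantic-settings maps them onto model fields."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }
```

So `SEMANTIC_CODE_MCP_DEBUG=true` in the environment becomes the `debug` setting, and unprefixed variables are ignored.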
Pass environment variables via the `env` field in your MCP config:
```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["semantic-code-mcp"],
      "env": {
        "SEMANTIC_CODE_MCP_DEBUG": "true",
        "SEMANTIC_CODE_MCP_LOCAL_INDEX": "true"
      }
    }
  }
}
```
Or with Claude Code CLI:
```shell
claude mcp add --scope user semantic-code \
  -e SEMANTIC_CODE_MCP_DEBUG=true \
  -e SEMANTIC_CODE_MCP_LOCAL_INDEX=true \
  -- uvx semantic-code-mcp
```
## Tech Stack

| Component | Choice | Rationale |
|-----------|--------|-----------|
| MCP Framework | FastMCP | Python decorators, STDIO transport |
| Embeddings | sentence-transformers | Local, no API costs, good quality |
| Vector Store | LanceDB | Embedded (like SQLite), no server needed |
| Chunking | tree-sitter | AST-based, respects code structure |
## Development
```shell
uv sync                             # Install dependencies
uv run python -m semantic_code_mcp  # Run server
uv run pytest                       # Run tests
uv run ruff check src/              # Lint
uv run ruff format src/             # Format
```
Pre-commit hooks enforce linting, formatting, type-checking (`ty`), security scanning (`bandit`), and Conventional Commits.
## Releasing

Versions are derived from git tags automatically (`hatch-vcs`); there's no hardcoded version in `pyproject.toml`.
```shell
git tag v0.2.0
git push origin v0.2.0
```
CI builds the package, publishes to PyPI, and creates a GitHub Release with auto-generated notes.
## Adding a New Language

The chunker system is designed to make adding languages straightforward. Each language needs:

- A tree-sitter grammar package (e.g. `tree-sitter-javascript`)
- A chunker subclass that walks the AST and extracts meaningful chunks

Steps:
```shell
uv add tree-sitter-mylang
```
Create `src/semantic_code_mcp/chunkers/mylang.py`:
```python
from enum import StrEnum, auto

import tree_sitter_mylang as tsmylang
from tree_sitter import Language, Node

from semantic_code_mcp.chunkers.base import BaseTreeSitterChunker
from semantic_code_mcp.models import Chunk, ChunkType


class NodeType(StrEnum):
    function_definition = auto()
    # ... other node types


class MyLangChunker(BaseTreeSitterChunker):
    language = Language(tsmylang.language())
    extensions = (".ml",)

    def _extract_chunks(self, root: Node, file_path: str, lines: list[str]) -> list[Chunk]:
        chunks = []
        for node in root.children:
            match node.type:
                case NodeType.function_definition:
                    name = node.child_by_field_name("name").text.decode()
                    chunks.append(self._make_chunk(node, file_path, lines, ChunkType.function, name))
                # ... other node types
        return chunks
```
Register it in `src/semantic_code_mcp/container.py`:
```python
from semantic_code_mcp.chunkers.mylang import MyLangChunker


def get_chunkers(self) -> list[BaseTreeSitterChunker]:
    return [PythonChunker(), RustChunker(), MarkdownChunker(), MyLangChunker()]
```
The `CompositeChunker` handles dispatch by file extension automatically. Use `BaseTreeSitterChunker._make_chunk()` for consistent chunk construction. See `chunkers/python.py` and `chunkers/rust.py` for complete examples.
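Dispatch by extension boils down to a lookup built from each chunker's `extensions` tuple. An illustrative sketch (not the project's actual `CompositeChunker` code):

```python
from pathlib import Path


class CompositeDispatch:
    """Route a file to the chunker that declared its extension."""

    def __init__(self, chunkers):
        # Map each declared extension to its chunker, e.g. ".py" -> PythonChunker.
        self._by_ext = {
            ext: chunker for chunker in chunkers for ext in chunker.extensions
        }

    def chunker_for(self, file_path: str):
        # Files with no matching chunker are simply skipped by the indexer.
        return self._by_ext.get(Path(file_path).suffix)
```

This is why a new language only needs a chunker subclass with an `extensions` tuple: registration in the container is enough for it to start receiving files.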
## Project Structure

- `src/semantic_code_mcp/chunkers/` — language chunkers (`base.py`, `composite.py`, `python.py`, `rust.py`, `markdown.py`)
- `src/semantic_code_mcp/services/` — `IndexService` (scan/chunk/index), `SearchService` (search + auto-index)
- `src/semantic_code_mcp/indexer.py` — embed + store pipeline
- `docs/decisions/` — architecture decision records
- `TODO.md` — epics and planning
- `CHANGELOG.md` — completed work (Keep a Changelog format)
- `.claude/rules/` — context-specific coding rules for AI agents
## License
MIT