custom-mcp-server

MCP turns tool integration from an N×M problem into an N+M problem.

This repo implements a custom MCP server with all three primitives — tools, resources, prompts — and demonstrates why the protocol matters. Built as Day 5 of an AI Solutions Architect learning journey.

What this demonstrates

A working MCP server in ~60 lines of Python using FastMCP
All three MCP primitives: tools, resources, prompts
Pydantic-typed responses with precise JSON Schema generation
Both stdio and Streamable HTTP transports — same code, one line changed
Cross-platform file handling via pathlib
Integration with the MCP Inspector and (where org policy allows) Claude Desktop
Honest documentation of an enterprise governance constraint encountered during development

The architectural punchline

Before MCP, integrating N AI tools with M LLMs required N×M custom adapters. Each new tool meant writing client code for every LLM that wanted to use it. Each new LLM meant re-implementing every existing tool.

MCP collapses this to N+M: every LLM speaks MCP, every tool exposes MCP. One protocol bridge instead of an integration matrix.

This is the same family of reductions that made USB, TCP/IP, and LSP foundational standards:

| Standard | What it unified | Reduction |
|---|---|---|
| USB | Peripheral device connectivity | N×M -> N+M |
| TCP/IP | Application network connectivity | N×M -> N+M |
| LSP | IDE language support | N×M -> N+M |
| MCP | AI agent tool integration | N×M -> N+M |

Protocols win when they remove integration cost. MCP went from its public release by Anthropic in November 2024 to an industry standard adopted by OpenAI, Google, Microsoft, and AWS by March 2026, in under 18 months. The N+M reduction is why.
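The arithmetic behind the reduction can be sketched in a few lines. The tool and LLM counts below are made-up inputs chosen only to show how the two curves diverge.

```python
def adapters_without_protocol(num_tools: int, num_llms: int) -> int:
    """Every tool needs a bespoke adapter for every LLM."""
    return num_tools * num_llms


def adapters_with_protocol(num_tools: int, num_llms: int) -> int:
    """Each tool and each LLM implements the shared protocol once."""
    return num_tools + num_llms


# Hypothetical ecosystem sizes, for illustration only.
for tools, llms in [(3, 3), (10, 5), (50, 8)]:
    before = adapters_without_protocol(tools, llms)
    after = adapters_with_protocol(tools, llms)
    print(f"{tools} tools x {llms} LLMs: {before} adapters vs {after} implementations")
```

The gap grows with the product of the two counts, so the saving matters most once the ecosystem is large.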

Architecture

custom-mcp-server/
├── server.py                  # ~60 lines: all 3 primitives
├── notes/
│   ├── architecture-patterns.md
│   ├── day-04-bugs.md
│   └── day-05-mcp.md
├── pyproject.toml             # uv-managed
└── README.md                  # this file

The three MCP primitives

Each primitive answers a different question about who drives the interaction:

| Primitive | What it does | Who decides to call it | Example in this repo |
|---|---|---|---|
| Tool | Executes an action | The LLM (autonomous) | search_my_notes(query) |
| Resource | Exposes read-only data | The user / host (attached) | notes://{filename} |
| Prompt | Provides a message template | The user (slash command) | reflect_on_day(day_number) |

This trinity is the deeper design lesson: MCP separates capability from invocation authority. The same underlying file-read logic appears as both a tool (read_note) and a resource (notes://{filename}) because the LLM and the user need different access patterns. Production MCP servers commonly do this "dual exposure."

Transports — stdio vs Streamable HTTP

The same server runs over two transports with a one-line change:

# Local subprocess (Claude Desktop spawns this)
mcp.run()

# Remote HTTP service (Claude Desktop / any client connects to URL)
mcp.run(transport="streamable-http")

| Aspect | stdio | Streamable HTTP |
|---|---|---|
| Transport | stdin/stdout pipes | HTTP + Server-Sent Events |
| Process model | Subprocess of the host | Standalone service |
| Network scope | Same machine only | Any reachable host |
| Authentication | Implicit (parent-child trust) | OAuth 2.1 required in production |
| Use case | Local dev tools, personal scripts | Remote services, multi-tenant |

The protocol payload is identical across both. Same JSON-RPC 2.0, same initialize handshake, same tools/list and resources/templates/list responses. Only the envelope differs.
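To make "only the envelope differs" concrete, here is a dependency-free sketch that frames one initialize request both ways. The clientInfo values and the helper names are placeholders, and the protocol version string is one published MCP revision rather than necessarily the one this server negotiates.

```python
import json

# A JSON-RPC 2.0 initialize request. clientInfo values are placeholders.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}


def frame_for_stdio(message: dict) -> bytes:
    """stdio envelope: one JSON message per line on the server's stdin."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def frame_for_http(message: dict) -> tuple[dict[str, str], bytes]:
    """Streamable HTTP envelope: the same JSON as a POST body, plus headers."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    return headers, json.dumps(message).encode("utf-8")


stdio_bytes = frame_for_stdio(initialize_request)
http_headers, http_body = frame_for_http(initialize_request)

# Both envelopes decode to the same protocol payload.
assert json.loads(stdio_bytes) == json.loads(http_body) == initialize_request
```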

This is the architect's deep insight: good protocols separate the wire format from the transport layer. MCP passes this test. Many protocols don't.

Architectural decisions explained

1. Pydantic models over list[dict]

Returning list[dict] from a tool produces a JSON Schema with "additionalProperties": true — vague, unhelpful for the LLM. Returning list[NoteMatch] (a Pydantic BaseModel) produces a precise schema with $ref and $defs. The same data, dramatically different machine-readability.

The general rule: type precision in your language produces schema precision on the wire. This applies across FastAPI, Pydantic, Instructor, and other Python frameworks that generate schemas from type hints.
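The difference is easy to inspect with Pydantic directly. This sketch prints the schema for each return type; the NoteMatch field names are assumptions, and FastMCP may wrap the result in its own envelope, so treat the output as the shape of the item schema rather than the exact tool schema.

```python
import json

from pydantic import BaseModel, TypeAdapter


class NoteMatch(BaseModel):
    # Field names are assumptions for illustration.
    filename: str
    snippet: str


# Schema for an untyped list of dicts: the item shape is left open.
loose_schema = TypeAdapter(list[dict]).json_schema()

# Schema for a list of typed models: items point at a named definition.
precise_schema = TypeAdapter(list[NoteMatch]).json_schema()

print(json.dumps(loose_schema, indent=2))
print(json.dumps(precise_schema, indent=2))
```

The second schema carries a $defs entry for NoteMatch with its required string fields, which is what gives the LLM a predictable response shape.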

2. URI templates over enumeration for resources

@mcp.resource("notes://{filename}") handles any markdown file with one definition. Enumerating each file individually would scale poorly and require server changes when new files are added.

The trade-off: templates are cheap and flexible; enumeration is discoverable. This repo includes both — a template for read access, plus a list_my_notes tool for discovery. Production-grade APIs typically expose both patterns.
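To show why one template covers every file, here is a rough standard-library sketch of template matching. It is not FastMCP's actual matcher; the function names are hypothetical and the rule that a parameter cannot contain a slash is an assumption of this sketch.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> re.Pattern[str]:
    """Turn 'notes://{filename}' into a regex with one named group per {param}."""
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)  # literal text
        else:
            pattern += f"(?P<{part}>[^/]+)"  # parameter: anything except '/'
    return re.compile(f"^{pattern}$")


def match_uri(template: str, uri: str) -> Optional[dict[str, str]]:
    """Return the extracted parameters, or None if the URI does not fit."""
    match = template_to_regex(template).match(uri)
    return match.groupdict() if match else None


print(match_uri("notes://{filename}", "notes://day-05-mcp.md"))
print(match_uri("notes://{filename}", "notes://nested/day-05-mcp.md"))
```

A new file under notes/ needs no server change: its URI already fits the template, and the extracted filename is passed to the handler.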

3. Tools alongside resources for LLM-driven workflows

The notes://{filename} resource works perfectly when a user attaches it to a conversation. But LLMs don't autonomously read resources in Claude Desktop — resources are user-attached, not LLM-invoked. Adding read_note(filename) as a tool with identical logic makes the same capability available to autonomous agents.

The general rule: the primitive you choose determines who drives the interaction. If you need the LLM to access something on its own, expose it as a tool. If users will attach it manually, a resource is sufficient.

4. 127.0.0.1 over 0.0.0.0 for development binding

FastMCP defaults to binding the HTTP transport to 127.0.0.1:8000, which is reachable from localhost only. Binding to 0.0.0.0 would expose it on every interface the machine is connected to: Wi-Fi networks, Docker bridges, VPNs. With no authentication on the server, anyone on those networks could call its tools and read the notes. Default to the smallest network scope that works.
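One way to enforce that default is a small guard that classifies a bind address before the server starts. This is a standard-library sketch; bind_scope and require_loopback are hypothetical helper names, not part of FastMCP.

```python
import ipaddress


def bind_scope(host: str) -> str:
    """Classify a bind address by how widely it exposes a listening socket."""
    address = ipaddress.ip_address(host)
    if address.is_loopback:
        return "loopback only"
    if address.is_unspecified:  # 0.0.0.0 or ::
        return "all interfaces"
    return "single interface"


def require_loopback(host: str) -> str:
    """Refuse non-loopback bind addresses for an unauthenticated dev server."""
    if bind_scope(host) != "loopback only":
        raise ValueError(f"refusing to bind unauthenticated server to {host}")
    return host


for candidate in ["127.0.0.1", "0.0.0.0", "192.168.1.10"]:
    print(candidate, "->", bind_scope(candidate))
```

Calling require_loopback on the configured host at startup turns an accidental 0.0.0.0 into an immediate error instead of a silent exposure.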

5. uv over pip for dependency management

uv (Astral, Rust-based) installs MCP and its transitive dependencies in seconds, where pip typically takes noticeably longer. It also creates the virtual environment and writes a lockfile itself, which avoids the broken-environment problems that come from mixing global and project installs in older pip workflows. For new Python projects in 2026, uv is the production-grade default.

Production findings

Finding 1: Schema vagueness propagates to agent behavior

When search_my_notes returned list[dict], the auto-generated schema had "additionalProperties": true. Claude couldn't predict the shape of the response, which led to less efficient tool usage. Switching to a NoteMatch Pydantic model produced a precise schema with named string fields. Schema clarity is silent prompt engineering.

Finding 2: Resource vs Tool changes who can invoke

In Claude Desktop, asking "what notes do I have?" did not autonomously trigger the notes://{filename} resource — resources are user-attached, not LLM-invoked. Adding a read_note(filename) tool with the same underlying logic gave Claude the access pattern it needed. Same capability, two primitives, different authority models.

Finding 3: Tool granularity shapes agent efficiency

Initially, asking Claude about Day 4 bugs caused it to make 4-5 search calls with different keywords, stitching snippets together. After adding read_note(filename) as a tool, the same query became a single read. Agent inefficiency is often a symptom of missing tools, not slow models.

Finding 4: Transport is independent of protocol

The exact same server, with one line changed (transport="streamable-http"), responds correctly over HTTP to a hand-crafted curl request. Same initialize handshake. Same serverInfo. Same capabilities. Different envelope (Server-Sent Events vs raw stdio). Good protocols make this swap one line of code.
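On the response side, the envelope difference amounts to unwrapping the data: field of a Server-Sent Event. The sketch below uses an illustrative sample rather than captured output; the serverInfo values are placeholders and json_from_sse is a hypothetical helper.

```python
import json


def json_from_sse(body: str) -> list[dict]:
    """Collect the JSON payloads carried in the data: fields of an SSE stream."""
    messages = []
    data_lines: list[str] = []
    for line in body.splitlines():
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:
            # A blank line ends one event; multi-line data joins with newlines.
            messages.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:  # stream ended without a trailing blank line
        messages.append(json.loads("\n".join(data_lines)))
    return messages


# Illustrative sample, not captured output: values are placeholders.
payload = '{"jsonrpc":"2.0","id":1,"result":{"serverInfo":{"name":"example","version":"0.0.0"}}}'
sample_sse = f"event: message\ndata: {payload}\n\n"
stdio_line = payload + "\n"

# Once unwrapped, the HTTP response equals the line read from stdout.
assert json_from_sse(sample_sse) == [json.loads(stdio_line)]
```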

Finding 5: Enterprise governance lockdowns are a production reality

Custom MCP server registration is locked in an enterprise-managed Claude Desktop installation. Edits to claude_desktop_config.json are overwritten on each restart because the file is output state, not input config — the source of truth is the central admin's connector allowlist.

This is the correct security default for an enterprise AI deployment. A malicious custom MCP server has near-root access to the LLM's decision-making context. Locking down arbitrary additions prevents that. Personal Anthropic accounts have no such restriction. Architects designing AI tooling for enterprise must plan for both modes — open extensibility for developers, locked allowlists for general users.
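For reference, this is the kind of entry a developer would add where policy allows it. The sketch builds the mcpServers block used by claude_desktop_config.json; the server name and project path are placeholders, and on a managed installation like the one described above the entry would be overwritten on restart.

```python
import json


def build_config_entry(server_name: str, project_dir: str, script: str = "server.py") -> dict:
    """Build an mcpServers entry that launches the server through uv over stdio."""
    return {
        "mcpServers": {
            server_name: {
                "command": "uv",
                "args": ["--directory", project_dir, "run", script],
            }
        }
    }


# Placeholder name and path; substitute the real absolute path to the clone.
entry = build_config_entry("custom-mcp-server", "/ABSOLUTE/PATH/TO/custom-mcp-server")
print(json.dumps(entry, indent=2))
```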

MCP vs alternatives

| Approach | What it is | When it's right |
|---|---|---|
| Raw function calling | Each LLM defines its own tool format | Single-vendor integrations, prototypes |
| OpenAPI / REST | Stateless HTTP APIs with JSON Schema specs | Human-driven systems, traditional service integration |
| A2A (Google, April 2025) | Agent-to-agent communication protocol | Multi-agent collaboration, agent orchestration |
| MCP (Anthropic, Nov 2024) | Agent-to-tool protocol with stateful sessions | LLM tool access, especially across vendors |

MCP and A2A are complementary, not competing. MCP defines how agents talk to tools. A2A defines how agents talk to other agents. Together they form the two halves of the agentic protocol stack. OpenAPI sits below both as the general API description language; raw function calling exists for cases where a protocol is overkill.

The architect's question: do I need a vendor-agnostic, stateful, capability-negotiating integration, or just one-off API calls? MCP wins the first; REST wins the second; raw function calling wins simple prototypes.

Quickstart

# Install uv if you do not have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up
git clone https://github.com/satyesh17/custom-mcp-server
cd custom-mcp-server
uv venv
source .venv/bin/activate
uv add "mcp[cli]"

# Run with the MCP Inspector for interactive testing
mcp dev server.py

# Or run as an HTTP service (server.py must call mcp.run(transport="streamable-http"))
python server.py
# Server listens on http://127.0.0.1:8000/mcp

Tech stack

Python 3.13 with type hints throughout
MCP Python SDK 1.27.1 — official implementation of the protocol
FastMCP — high-level Python framework for MCP servers
Pydantic v2 — Rust-based runtime validation, used by the Groq SDK, OpenAI SDK, FastAPI, and most modern Python AI tooling
Uvicorn — ASGI server used by FastMCP's Streamable HTTP transport; same server that powers FastAPI in production
uv by Astral — Rust-based pip replacement, the production-grade default for new Python projects in 2026
JSON-RPC 2.0 — the wire protocol, originally specified in 2010 and used by Ethereum, VS Code's LSP, and Bitcoin Core
Server-Sent Events (SSE) — the streaming transport mechanism inside Streamable HTTP

What I learned

1. Protocols win industries by removing integration cost. MCP's adoption is not about features; it is about the math: N×M to N+M. USB, TCP/IP, and LSP made the same reduction.

2. Schema precision and tool design shape agent behavior more than model selection. Switching to Pydantic-typed responses and adding read_note as a tool changed how Claude behaved against the same data. Same model, different surface area, different agent.

3. Enterprise governance is a first-class architecture concern. A custom MCP server that works perfectly on a personal laptop may be locked out of an enterprise client by design. Architects designing AI integrations must plan for both modes from the start.

Related projects

llm-comparator — Day 2's multi-provider LLM benchmark with schema-pass-rate measurement
email-classifier — Day 3's email classification with 5 prompting patterns and Pydantic-validated outputs
tool-use-agent — Day 4's pure-Python and LangGraph tool-use agents with 4 documented production bugs

Built as part of ai-architect-journey, a public learning log toward becoming an AI Solutions Architect.
