MCP NL Ops — Proper MCP Architecture

Natural language questions about your Kind cluster, backed by the actual Model Context Protocol — not a keyword router or direct function calls.

How it actually works (MCP)

Browser  →  /api/ask  →  MCP Client (app.py)
                              |
                    1. list_tools()  ←──SSE/JSON-RPC──  MCP Server (mcp_server.py)
                    2. Ollama decides which tools to call
                    3. call_tool(name, args)  ──────────► MCP Server executes
                    4. tool result fed back to Ollama
                    5. Ollama streams final answer

The LLM (Ollama) owns tool selection — it sees the tool schemas and decides what to call based on the question. No keyword matching.

vs. the wrong way

| Dimension | Keyword router (wrong) | MCP (this repo) |
|---|---|---|
| Tool selection | if "pod" in question | Ollama reads schemas, reasons |
| Tool execution | Direct Python call | JSON-RPC to MCP server |
| Tool discovery | Hardcoded in app | list_tools() at runtime |
| Extensible? | Edit app.py | Add @mcp.tool() to server only |
| Claude Desktop? | No | Yes, point it at port 8001 |

Kind cluster setup (recommended)

Everything runs inside a local Kind cluster. No port-forwards are needed because the host ports are mapped in kind-config.yaml.

Prerequisites

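# macOS (the docker formula installs only the CLI; a Docker daemon such as Docker Desktop must also be running)
brew install kind kubectl docker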

# Linux (Fedora/RHEL)
sudo dnf install -y kind kubectl
# Docker: https://docs.docker.com/engine/install/

One-command setup

./setup.sh          # creates cluster, builds image, deploys everything

After ~3–5 minutes (model pull on first run):

| Service | URL |
|------------|------------------------------|
| UI / API | http://localhost:8765 |
| MCP server | http://localhost:8001/sse |
| Prometheus | http://localhost:9090 |
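
To confirm the three endpoints are reachable after setup, a small smoke check along these lines may help. It is a sketch using only the standard library; the `is_up` helper and the 3-second timeout are illustrative choices, not part of the repo, and a response of any HTTP status is treated as "up".

```python
import urllib.error
import urllib.request

# URLs taken from the table above.
SERVICES = {
    "UI / API": "http://localhost:8765",
    "MCP server": "http://localhost:8001/sse",
    "Prometheus": "http://localhost:9090",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with any HTTP response before the timeout."""
    try:
        # urlopen returns once the headers arrive, so the open-ended SSE stream is fine.
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "up" if is_up(url) else "DOWN"
        print(f"{name:<12} {url:<28} {status}")
```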

Day-2 commands

./setup.sh status   # kubectl get pods -n nlops
./setup.sh build    # rebuild image and reload into Kind after code changes
./setup.sh deploy   # re-apply manifests only (no image rebuild)
./setup.sh down     # delete the entire cluster

Cluster layout

kind-config.yaml          ← 1 control-plane + 2 workers, host port mappings
k8s/
  namespace.yaml          ← nlops namespace
  mcp-server.yaml         ← Deployment + NodePort + RBAC (read-only K8s access)
  app.yaml                ← Deployment + NodePort (uvicorn, port 8765)
  prometheus.yaml         ← Deployment + NodePort + ConfigMap
  ollama.yaml             ← Deployment + initContainer that pulls llama3.1
  Dockerfile              ← single image used by mcp-server and nlops-app

The MCP server pod uses a ServiceAccount with a ClusterRole that grants read-only access to pods, deployments, events, and services — nothing else.

Local dev (app and MCP server on the host, outside Kind)

Requirements

Kind cluster running with your MLOps drift demo
Prometheus accessible (port-forward or NodePort)
Ollama with a tool-calling model: llama3.1, mistral, or qwen2.5
```
ollama pull llama3.1   # recommended
```

Run

# Terminal 1 — MCP Server
python mcp_server.py
# → MCP server on http://localhost:8001

# Terminal 2 — App + UI
uvicorn app:app --reload --port 8765
# → open http://localhost:8765

Env vars

export PROMETHEUS_URL=http://localhost:9090   # port-forwarded
export OLLAMA_MODEL=llama3.1                  # must support tool calling
export K8S_NAMESPACE=default
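
On the Python side, these variables could be read once at startup. The sketch below is an assumption about how that might look, not the repo's code: the `Settings` class and `load_settings` function are hypothetical names, and the fallback values simply mirror the exports shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    prometheus_url: str
    ollama_model: str
    k8s_namespace: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables, falling back to the values shown above (assumed defaults)."""
    return Settings(
        # Strip a trailing slash so API paths can be appended with a single "/".
        prometheus_url=env.get("PROMETHEUS_URL", "http://localhost:9090").rstrip("/"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3.1"),
        k8s_namespace=env.get("K8S_NAMESPACE", "default"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Accepting the environment as a parameter makes the function easy to call with a plain dictionary in tests.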

Claude Desktop integration (bonus)

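The MCP server is a standard FastMCP SSE server. Add it to claude_desktop_config.json (~/.config/Claude/claude_desktop_config.json, or ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):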

{
  "mcpServers": {
    "k8s-mlops": {
      "url": "http://localhost:8001/sse"
    }
  }
}

Now Claude Desktop can also query your cluster using the same tools.

Tools registered on the MCP server

| Tool | What Ollama can call it for |
|---|---|
| get_cluster_summary | General health, pod counts, troubled pods |
| list_pods | Pod-level detail, restarts, readiness |
| list_deployments | Replica mismatches, rollout status |
| get_recent_events | Warnings, OOM kills, scheduling failures |
| get_pod_logs | Logs from a specific pod |
| get_model_metrics | Drift score, accuracy, prediction rate |
| get_resource_metrics | CPU, memory, restart totals |
| get_redis_info | Stream depth, consumer lag |
| get_anomalous_metrics | N-sigma scan across all metrics |
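
For readers unfamiliar with an N-sigma scan, the idea behind a tool like get_anomalous_metrics can be sketched as follows. This is one common interpretation, not necessarily what mcp_server.py does: the latest sample of each metric is compared against the mean and standard deviation of its earlier samples. The function name, the `min_points` threshold, and the metric names in the usage example are all hypothetical.

```python
from statistics import mean, pstdev


def n_sigma_anomalies(
    series: dict[str, list[float]], n: float = 3.0, min_points: int = 5
) -> dict[str, float]:
    """Return {metric: z-score} for metrics whose latest value is more than n sigma from baseline."""
    flagged: dict[str, float] = {}
    for name, values in series.items():
        if len(values) < min_points:
            continue  # too little history to estimate a baseline
        *history, latest = values
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            continue  # flat baseline: the z-score is undefined
        z = (latest - mu) / sigma
        if abs(z) > n:
            flagged[name] = round(z, 2)
    return flagged


if __name__ == "__main__":
    samples = {
        "drift_score": [0.10, 0.11, 0.09, 0.10, 0.10, 0.50],  # sudden jump
        "cpu_cores": [1.0, 1.1, 0.9, 1.0, 1.0, 1.05],         # normal jitter
    }
    print(n_sigma_anomalies(samples))
```

In the usage example only `drift_score` is flagged, because its last sample sits far outside the spread of its history while `cpu_cores` stays within one standard deviation.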
