System Perception MCP

MCP server by haji-mi

Created 4/1/2026

System Perception MCP Server


A high-performance Model Context Protocol (MCP) server designed for AI Agents to perceive and control the Windows operating system with ultra-low latency and zero physical mouse/keyboard interference.

🌟 Core Features

  • Ultra-Low Latency Screen Perception: Bypasses traditional slow screenshot methods. Utilizes dxcam for direct DXGI VRAM capture and OpenCV for in-memory compression, delivering screen frames to the agent in roughly 120 ms.
  • Silent Background Control: Eliminates the fragile and disruptive nature of physical mouse/keyboard simulation. Uses win32api and uiautomation to send underlying system messages (PostMessage) and invoke UI elements silently.
  • UI Tree Parsing (get_ui_tree): Instantly reads the accessibility tree of standard Windows applications, bypassing the slow Vision-Language Model (VLM) coordinate calculation bottleneck.
  • Instant Execution (invoke_ui_element): Directly triggers standard OS elements (like desktop icons, buttons, and text fields) in less than a second based on UI definitions rather than screen coordinates.
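The UI tree returned by get_ui_tree can be thought of as nested nodes, which is what lets invoke_ui_element work by name instead of by coordinates. The sketch below models a tree as plain dicts and searches it depth-first; the field names and shape are assumptions for illustration, not the server's actual schema:

```python
# Hypothetical UI-tree shape; the real get_ui_tree() output may differ.
ui_tree = {
    "name": "Desktop",
    "control_type": "Pane",
    "children": [
        {"name": "Recycle Bin", "control_type": "ListItem", "children": []},
        {
            "name": "Taskbar",
            "control_type": "Pane",
            "children": [
                {"name": "Start", "control_type": "Button", "children": []},
            ],
        },
    ],
}

def find_element(node, name):
    """Depth-first search for the first node whose name matches."""
    if node["name"] == name:
        return node
    for child in node.get("children", []):
        found = find_element(child, name)
        if found is not None:
            return found
    return None

start = find_element(ui_tree, "Start")
```

Because the lookup is a pure tree walk over an already-parsed accessibility tree, it completes in microseconds, which is the bottleneck this design removes compared to VLM coordinate estimation.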

🛠️ Requirements

  • OS: Windows 10 / 11 (Requires DXGI and Windows UIAutomation APIs)
  • Python: 3.8+
  • Agent Harness: Any MCP-compatible client (e.g., Claude Desktop, DeerFlow)

📦 Installation

  1. Clone this repository:

    git clone <YOUR_GITHUB_REPO_URL>
    cd system-perception-mcp
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    

🚀 Exposed Tools

Once connected to an MCP client, the following tools become available to the LLM/Agent:

  • get_gpu_frame(): Instantly captures the current screen from the GPU frame buffer.
  • get_ui_tree(): Scans and returns the current window's hierarchical UI structure.
  • invoke_ui_element(element_name/id): Directly interacts with a specific UI node without moving the physical cursor.
  • silent_mouse_click(x, y, hwnd): Sends a background click event to specific coordinates within a target window.
  • silent_keyboard_type(text, hwnd): Injects keystrokes directly into a background application's message queue.
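On the wire, an MCP client invokes each of these tools with a standard `tools/call` JSON-RPC 2.0 request. This sketch builds such a message by hand; the tool and argument names follow the list above, but the framing is generic MCP, and the coordinates and window handle are made-up example values:

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call JSON-RPC 2.0 request payload."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Example: ask the server for a background click at (640, 360)
# inside a target window (hwnd value is illustrative).
msg = build_tool_call(1, "silent_mouse_click",
                      {"x": 640, "y": 360, "hwnd": 0x00051C3A})
print(json.dumps(msg, indent=2))
```

In practice an MCP client library handles this framing for you; the point is that each bullet above maps one-to-one onto a `tools/call` request.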

💡 Why This Approach?

Traditional visual AI agents rely on taking screenshots, sending them to a VLM, waiting 2-4 seconds for coordinate calculation, and then physically moving the user's cursor. This is slow, fragile, and prevents the user from using their computer while the agent is working.

System Perception MCP solves this by fusing computer vision with native OS UI Automation, allowing the agent to "see" instantly and "act" invisibly.
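One way to picture that fusion is a simple fallback policy: try the UI tree first, and only pay the cost of a GPU frame plus VLM inference when the target is not a standard control. The sketch below is an illustrative agent-side policy with stubbed-in tool calls, not the server's actual logic:

```python
def act_on(target, get_ui_tree, invoke_ui_element, get_gpu_frame, ask_vlm):
    """Prefer instant UIAutomation; fall back to vision only when needed.

    In this sketch get_ui_tree() returns a set of element names and
    ask_vlm() returns (x, y) coordinates for a screen frame.
    """
    names = get_ui_tree()
    if target in names:                 # fast path: standard OS control
        invoke_ui_element(target)
        return "invoked"
    frame = get_gpu_frame()             # slow path: pixels + VLM
    x, y = ask_vlm(frame, target)
    return ("clicked", x, y)
```

The fast path never touches the physical cursor, so the user keeps control of the machine while the agent works.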

📝 License

MIT License

Quick Setup
Installation guide for this server

Install Package (if required)

uvx -system-perception-mcp-

Cursor configuration (mcp.json)

    {
      "mcpServers": {
        "haji-mi-system-perception-mcp": {
          "command": "uvx",
          "args": ["-system-perception-mcp-"]
        }
      }
    }