MCP server by dm17ryk
Codex Windows GUI MCP
This repository provides a session-first Windows GUI automation harness for Codex and other MCP clients. It is built around four pieces:
- a thin MCP server in server.py;
- a modular harness package in win_gui_core;
- Qt and klogg adapter helpers in adapters;
- a local Computer Use loop in openai_loop.py and loops/computer_use.py.
1. Install prerequisites
winget install Codex -s msstore
winget install --id Git.Git
winget install --id Python.Python.3.11
winget install --id Microsoft.DotNet.SDK.10
Optional for WebDriver-style Windows UI tests:
.\scripts\start-winappdriver.ps1
2. Create and populate the virtual environment
py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
3. Configure the target app
Use .env.example as the baseline. In practice these are the main settings:
- APP_EXE
- APP_WORKDIR
- APP_ARGS
- APP_LOG_DIR
- APP_STATE_DIR
- APP_DUMP_DIR
- MAIN_WINDOW_TITLE_REGEX
- OPENAI_API_KEY
- OPENAI_COMPUTER_MODEL
- APP_STATE_DUMP_ARG
- QT_AUTOMATION_ENV_VAR
Model guidance:
- gpt-5.4 is a good default.
- computer-use-preview is the specialized option when the workflow is heavily vision-driven.
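To make the settings concrete, here is a minimal sketch of how a script might read and validate them from the environment. It is not the repository's loader: the `TargetAppSettings` class, the `load_settings` function, and the choice of which variables count as required are assumptions made for illustration. Only the variable names and the gpt-5.4 default come from the text above.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TargetAppSettings:
    """Hypothetical container for a subset of the settings listed above."""
    app_exe: str
    app_workdir: str
    title_regex: re.Pattern
    computer_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> TargetAppSettings:
    # Assumption: APP_EXE and MAIN_WINDOW_TITLE_REGEX are treated as required here.
    required = ("APP_EXE", "MAIN_WINDOW_TITLE_REGEX")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return TargetAppSettings(
        app_exe=env["APP_EXE"],
        app_workdir=env.get("APP_WORKDIR", ""),
        # Compiling early surfaces an invalid regex before any window lookup.
        title_regex=re.compile(env["MAIN_WINDOW_TITLE_REGEX"]),
        # Falls back to the default suggested under "Model guidance".
        computer_model=env.get("OPENAI_COMPUTER_MODEL", "gpt-5.4"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.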
4. Wire the MCP server into Codex
Merge user_config.toml.example into %USERPROFILE%\.codex\config.toml or into a project-local .codex\config.toml.
Adjust:
- the Python interpreter path;
- the server.py path;
- the app-specific environment variables.
5. Session-first workflow
The preferred flow is:
- launch_app or an external app-start action.
- create_session.
- wait_window_stable.
- find_element / click_element / adapter tools first.
- capture_screenshot and click_point / drag_path only when semantics are missing.
- assert_* helpers after every meaningful step.
- create_artifact_bundle when you need a full debugging package.
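The "semantic first, coordinates only as a fallback" part of this flow can be sketched as below. This is an illustration only: `call_tool` stands in for whatever MCP client function you use, and the argument names (`session_id`, `selector`, `x`, `y`) and the `found` result key are hypothetical, not the server's documented schema. Only the tool names come from this README.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical client signature: (tool_name, arguments) -> result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def click_semantic_first(
    call_tool: CallTool,
    session_id: str,
    selector: str,
    fallback_point: Tuple[int, int],
) -> str:
    """Click via find_element/click_element, else fall back to click_point."""
    result = call_tool("find_element", {"session_id": session_id, "selector": selector})
    if result.get("found"):  # assumed result key
        call_tool("click_element", {"session_id": session_id, "selector": selector})
        return "semantic"
    # Semantics are missing: take a screenshot, then click by coordinates.
    call_tool("capture_screenshot", {"session_id": session_id})
    x, y = fallback_point
    call_tool("click_point", {"session_id": session_id, "x": x, "y": y})
    return "coordinate"
```

The return value records which path was taken, which is useful to log next to the assert_* check that follows the step.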
6. Coordinate-space rules
- full_screen screenshots produce desktop-space coordinates.
- window and region screenshots produce viewport-relative coordinates.
- click_point, double_click_point, drag_path, and scroll_at translate viewport coordinates through the active session viewport before calling pyautogui.
This matters because screenshot-space and desktop-space are not interchangeable when the target window is not at (0, 0).
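The translation itself is a simple offset. The sketch below shows the idea under stated assumptions: the `Viewport` class and `viewport_to_desktop` function are hypothetical names, the viewport is described by its desktop-space top-left corner plus size, and DPI scaling is ignored. The harness's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Viewport:
    """Hypothetical viewport: desktop-space origin plus size in pixels."""
    left: int
    top: int
    width: int
    height: int


def viewport_to_desktop(viewport: Viewport, x: int, y: int) -> Tuple[int, int]:
    """Translate a viewport-relative point into desktop-space coordinates."""
    if not (0 <= x < viewport.width and 0 <= y < viewport.height):
        raise ValueError(f"point ({x}, {y}) lies outside the viewport")
    return viewport.left + x, viewport.top + y


# A window whose top-left corner sits at (200, 100) on the desktop:
window = Viewport(left=200, top=100, width=800, height=600)
print(viewport_to_desktop(window, 10, 20))  # (210, 120)
```

Rejecting out-of-range points early avoids clicking on whatever happens to sit next to the target window.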
7. Main MCP tools
The current MCP surface is session-first:
- create_session
- get_session
- refresh_session
- close_session
- enumerate_monitors
- restore_window
- wait_window_stable
- capture_screenshot
- capture_region
- click_point
- double_click_point
- drag_path
- scroll_at
- type_text
- send_hotkey
- find_element
- wait_for_element
- wait_for_element_gone
- assert_element
- click_element
- double_click_element
- right_click_element
- drag_element_to_point
- drag_element_to_element
- assert_window_title
- assert_status_text
- assert_log_contains
- wait_process_idle
- create_artifact_bundle
- list_artifacts
- collect_event_logs
- collect_dumps
- get_process_tree
- dump_qt_state
- find_qt_object
- click_qt_object
- invoke_qt_action
- set_qt_value
- toggle_qt_control
- klogg_open_log
- klogg_search
- klogg_get_state
- klogg_get_active_tab
- klogg_toggle_follow
- klogg_get_visible_range
8. Qt and klogg instrumentation
This repo now assumes deep instrumentation is allowed for Qt targets.
Recommended target-app support:
- stable objectName values on important widgets and QActions;
- accessibleName on icon-only or ambiguous controls;
- optional accessibleDescription;
- an automation mode such as KLOGG_AUTOMATION=1;
- a state dump endpoint, with --dump-state-json <path> used by default in this repo.
Useful klogg state fields:
- activeFile
- activeTabTitle
- cursorLine
- cursorColumn
- visibleLineStart
- visibleLineEnd
- searchText
- matchCount
- followMode
- scratchPad
- encoding
- parserMode
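A quick way to check that a state dump carries these fields is sketched below. It assumes the dump is a single flat JSON object with the fields as top-level keys, which this README does not confirm. `EXPECTED_FIELDS` and `missing_state_fields` are hypothetical names.

```python
import json
from typing import List

# Field names taken from the list above; their value types are not specified here.
EXPECTED_FIELDS = (
    "activeFile", "activeTabTitle", "cursorLine", "cursorColumn",
    "visibleLineStart", "visibleLineEnd", "searchText", "matchCount",
    "followMode", "scratchPad", "encoding", "parserMode",
)


def missing_state_fields(raw_json: str) -> List[str]:
    """Return the expected klogg state fields absent from a JSON dump."""
    state = json.loads(raw_json)
    if not isinstance(state, dict):
        raise ValueError("state dump is not a JSON object")
    return [name for name in EXPECTED_FIELDS if name not in state]
```

To check a dump on disk, read the file written via --dump-state-json <path> and pass its text to `missing_state_fields`. An empty list means every listed field is present.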
9. Artifact and trace layout
Each session writes to artifacts/sessions/<session_id>/.
Typical contents:
- trace.jsonl
- screenshots/*.png
- bundle-*/bundle-manifest.json
- copied logs
- copied dumps
- UI tree snapshots
- optional Qt state dumps
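For post-run inspection, a session's trace can be read line by line. The sketch below assumes only what the .jsonl extension implies, one JSON value per line. The event schema is not documented here, so events are returned as parsed values without interpretation. `session_dir` and `parse_trace` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Iterable, List


def session_dir(root: str, session_id: str) -> Path:
    """Build artifacts/sessions/<session_id>/ under the given repo root."""
    return Path(root) / "artifacts" / "sessions" / session_id


def parse_trace(lines: Iterable[str]) -> List[Any]:
    """Parse JSON Lines content, skipping blank lines."""
    events = []
    for line in lines:
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events
```

Because an open text file iterates over its lines, a real trace can be parsed with `with open(session_dir(".", session_id) / "trace.jsonl", encoding="utf-8") as handle: events = parse_trace(handle)`.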
10. Local environment actions
See LOCAL_ENVIRONMENT_SETUP.md.
11. Direct Computer Use loop
.\.venv\Scripts\python.exe .\openai_loop.py --window-title "MyApp" "Open the Settings window, switch to Advanced, and tell me whether the Save button becomes disabled."
Use --full-screen only when the task genuinely spans the desktop rather than one pinned target window.
12. Safety notes
- Run destructive scenarios in an isolated Windows VM or disposable user profile.
- Prefer semantic/UIA interactions over blind coordinate clicks.
- Review destructive steps with a human in the loop.
- pyautogui fail-safe is enabled; moving the cursor to the top-left corner interrupts execution.