Local stdio MCP server for reading PDFs with Apple PDFKit, Vision OCR, and LLM-friendly batching.
macpdf-ocr-mcp
macpdf-ocr-mcp is a macOS-native, PDFKit-first stdio MCP server with Apple Vision OCR fallback for scanned or image-heavy PDF pages.
It is designed for local LLM workflows: extract useful text first, add compact images only when they help, and keep large documents batched.
The project is tested with Codex, but it is not Codex-specific. It should work with MCP clients that support local stdio servers, including Claude Desktop, Claude Code, and Gemini CLI. Client-specific setup is intentionally left to each client or AI assistant.
📊 Light Benchmark
Light local A/B test using Codex CLI with gpt-5.5: across three PDF types, MCP preprocessing cut output/reasoning tokens by roughly half or more in all groups, reduced runtime by about one-third to one-half, and cut input tokens by over half on the image-heavy PDF. Mixed-PDF input was approximately unchanged.
| PDF type | Input tokens | Output tokens | Reasoning tokens | Runtime |
|---|---:|---:|---:|---:|
| Image-heavy | -55% | -73% | -92% | -34% |
| Mixed | ~ | -58% | -87% | -53% |
| Text-heavy | -24% | -47% | -63% | -56% |
~ means approximately unchanged. Results depend on PDF structure, prompt shape, and whether image previews are included in tool output.
✅ Requirements
- macOS 13 or newer
- Xcode or Apple Command Line Tools with Swift 6.3 or newer
- An MCP client that supports local stdio servers
The implementation uses Apple-native frameworks only: PDFKit, Vision, CoreGraphics, and AppKit.
🛠️ Build
From the project root:
swift build -c release
The release executable is created at:
.build/release/macpdf-ocr-mcp
🚀 Installation
Codex
Codex is the currently tested client.
codex mcp add macpdf-ocr-mcp -- "$(pwd)/.build/release/macpdf-ocr-mcp" mcp-stdio
Restart Codex or open a new Codex session after registration so the MCP server list is reloaded.
Others
Other MCP clients can use the same executable with stdio transport. Point the client at:
<project-root>/.build/release/macpdf-ocr-mcp mcp-stdio
This should work with clients such as Claude Desktop, Claude Code, and Gemini CLI when they are configured for local stdio MCP servers.
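Many stdio MCP clients register a local server through a JSON entry that names a command and its arguments. The sketch below prints such an entry in the `mcpServers` shape used by Claude Desktop's configuration file. The key layout is an assumption about the client, not something this project defines, and the project root is a placeholder, so check the result against your client's documentation before using it.

```python
import json
from pathlib import PurePosixPath


def stdio_server_entry(project_root: str, name: str = "macpdf-ocr-mcp") -> dict:
    """Build an `mcpServers`-style entry for a local stdio server.

    Assumption: the client expects `mcpServers` -> name -> `command`/`args`.
    Only the executable path and the `mcp-stdio` argument come from this README.
    """
    executable = PurePosixPath(project_root) / ".build" / "release" / "macpdf-ocr-mcp"
    return {
        "mcpServers": {
            name: {"command": str(executable), "args": ["mcp-stdio"]}
        }
    }


if __name__ == "__main__":
    # "/path/to/macpdf-ocr-mcp" is a placeholder for your checkout location.
    print(json.dumps(stdio_server_entry("/path/to/macpdf-ocr-mcp"), indent=2))
```

Clients that use a different config file or key names need only the same two facts: the absolute path to the release executable and the single `mcp-stdio` argument.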
⚙️ Runtime Behavior
- PDFKit is preferred when a PDF has a usable text layer.
- Vision OCR is used for scanned pages or when OCR is explicitly requested.
- hybrid mode uses OCR only when it materially improves missing text extraction.
- Large reads should use batching instead of returning a full document in one response.
- Region boxes use normalized top-left coordinates: [left, top, width, height].
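The normalized box convention is easy to get wrong because PDF page space places its origin at the bottom-left, while these boxes are measured from the top-left. The following is a minimal client-side sketch of both conversions. The function names are hypothetical, and whether the server performs the same flip internally is an assumption.

```python
from typing import List, Sequence, Tuple


def pixel_rect_to_bbox_norm(
    left_px: float, top_px: float, width_px: float, height_px: float,
    image_width_px: float, image_height_px: float,
) -> List[float]:
    """Convert a top-left-origin pixel rectangle to [left, top, width, height] in 0..1."""
    if image_width_px <= 0 or image_height_px <= 0:
        raise ValueError("image dimensions must be positive")
    box = [
        left_px / image_width_px,
        top_px / image_height_px,
        width_px / image_width_px,
        height_px / image_height_px,
    ]
    left, top, width, height = box
    if min(box) < 0 or left + width > 1 or top + height > 1:
        raise ValueError("rectangle falls outside the image")
    return box


def bbox_norm_to_pdf_rect(
    bbox_norm: Sequence[float], page_width_pt: float, page_height_pt: float
) -> Tuple[float, float, float, float]:
    """Map a normalized top-left box to (x, y, width, height) in bottom-left-origin points."""
    left, top, width, height = bbox_norm
    x = left * page_width_pt
    # Flip the vertical axis: the box's bottom edge is measured up from the page bottom.
    y = (1.0 - top - height) * page_height_pt
    return x, y, width * page_width_pt, height * page_height_pt
```

For example, a 700x250 pixel rectangle at (100, 200) on a 1000x1000 preview maps to the box `[0.1, 0.2, 0.7, 0.25]` used in the examples throughout this README.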
🔧 MCP Tools
pdf_read
First-pass PDF reading. It returns page-grouped text, optional preview image paths, coarse regions, and continuation metadata for batched reads.
Arguments and MCP examples
Required:
- file: PDF path
- pages: all, a single page such as 12, or a range such as 12-20
Optional:
- mode
  - balanced (default): PDFKit/OCR page text + compressed page image (scale=5, 1200px)
  - text_only: PDFKit/OCR page text + text regions
  - image_only: compressed page image only (scale=5, 1200px)
  - text_focus: PDFKit/OCR page text + smaller page image (scale=5, 1080px)
  - image_focus: PDFKit/OCR page text + larger page image (scale=5, 1380px)
- engine
  - auto (default): PDFKit if text exists, otherwise Vision OCR
  - pdfkit: PDFKit text extraction only
  - ocr: Vision OCR only
  - hybrid: like auto; reserved for stricter PDFKit+OCR merging
- batch_size: 4 by default
- image_scale: 5 by default; 1-2=512px, 3-4=768px, 5-6=1200px, 7-8=1600px, 9-10=2200px
{ "file": "/path/to/document.pdf", "pages": "1-3" }
{ "file": "/path/to/document.pdf", "pages": "1-10", "mode": "balanced", "engine": "auto", "batch_size": 5, "image_scale": 6 }
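MCP clients send these arguments inside a JSON-RPC 2.0 `tools/call` request. The sketch below shows how a hand-written client might validate the `pages` value and wrap the arguments. The helper names are hypothetical, treating `pages` as 1-based is an assumption carried over from `pdf_focus`, and a real session must also complete the MCP `initialize` handshake before calling any tool.

```python
import json
import re
from typing import Any, Optional, Tuple


def parse_pages(spec: str) -> Optional[Tuple[int, int]]:
    """Return None for "all", otherwise an inclusive (first, last) page range."""
    if spec == "all":
        return None
    match = re.fullmatch(r"(\d+)(?:-(\d+))?", spec)
    if match is None:
        raise ValueError(f"unsupported pages value: {spec!r}")
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {spec!r}")
    return first, last


def pdf_read_request(request_id: int, file: str, pages: str, **optional: Any) -> str:
    """Serialize a JSON-RPC `tools/call` request for the pdf_read tool."""
    parse_pages(pages)  # fail early on a malformed value
    arguments = {"file": file, "pages": pages, **optional}
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "pdf_read", "arguments": arguments},
    })


if __name__ == "__main__":
    print(pdf_read_request(1, "/path/to/document.pdf", "1-10", batch_size=5, image_scale=6))
```

Most users never write this by hand, since the MCP client builds the request from the tool schema. The sketch is mainly useful when debugging the server over a raw stdio pipe.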
pdf_focus
Second-pass detail reading for one normalized page region. It returns local text, an optional cropped image path, and local region hints.
Arguments and MCP examples
Required:
- file: PDF path
- page: 1-based page number
- bbox_norm: normalized top-left-origin box [left, top, width, height]
Optional:
- mode
  - balanced (default): PDFKit/OCR region text + compressed region image (scale=7, 1600px)
  - text_only: PDFKit/OCR region text only
  - image_only: compressed region image only (scale=7, 1600px)
  - text_focus: PDFKit/OCR region text + compressed region image (scale=7, 1600px)
  - image_focus: PDFKit/OCR region text + larger region image (scale=7, 2000px)
- engine
  - auto (default): PDFKit if region text exists, otherwise Vision OCR
  - pdfkit: PDFKit text extraction only
  - ocr: Vision OCR only
  - hybrid: like auto; reserved for stricter PDFKit+OCR merging
- image_scale: 7 by default; 1-6=1200px, 7-8=1600px, 9-10=2200px
{ "file": "/path/to/document.pdf", "page": 4, "bbox_norm": [0.10, 0.20, 0.70, 0.25] }
{ "file": "/path/to/document.pdf", "page": 4, "bbox_norm": [0.10, 0.20, 0.70, 0.25], "mode": "balanced", "engine": "auto", "image_scale": 7 }
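The `image_scale` buckets differ between `pdf_read` and `pdf_focus`, so a lookup table can help when estimating how large an image a call may return. This sketch only restates the buckets listed in this README. The table and function names are hypothetical, and how the server combines `image_scale` with a `mode` preset such as `image_focus` is not covered here.

```python
from typing import List, Tuple

# (inclusive upper bound of the image_scale bucket, pixel size) per tool
PDF_READ_SCALE_PX: List[Tuple[int, int]] = [(2, 512), (4, 768), (6, 1200), (8, 1600), (10, 2200)]
PDF_FOCUS_SCALE_PX: List[Tuple[int, int]] = [(6, 1200), (8, 1600), (10, 2200)]


def scale_to_px(image_scale: int, table: List[Tuple[int, int]]) -> int:
    """Return the documented pixel size for an image_scale value from 1 to 10."""
    if not 1 <= image_scale <= 10:
        raise ValueError("image_scale must be between 1 and 10")
    for upper_bound, pixels in table:
        if image_scale <= upper_bound:
            return pixels
    raise AssertionError("unreachable: every table ends at 10")


if __name__ == "__main__":
    print(scale_to_px(5, PDF_READ_SCALE_PX))   # pdf_read default scale
    print(scale_to_px(7, PDF_FOCUS_SCALE_PX))  # pdf_focus default scale
```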
save_region
Saves a selected region from a PDF page or image to a local file.
Arguments and MCP examples
Required:
- source_type: pdf or image
- source_path: source PDF or image path
- output_path: destination image path
- bbox_norm: normalized top-left-origin box [left, top, width, height]
Additional PDF arguments:
- page: required when source_type=pdf
- short_side_px: 1600px by default
{ "source_type": "pdf", "source_path": "/path/to/document.pdf", "page": 4, "short_side_px": 1600, "output_path": "/tmp/region.png", "bbox_norm": [0.10, 0.20, 0.70, 0.25] }
{ "source_type": "image", "source_path": "/path/to/image.png", "output_path": "/tmp/region.png", "bbox_norm": [0.10, 0.20, 0.70, 0.25] }
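Because `page` is required only for PDF sources, a small builder can catch a missing argument before the call reaches the server. This is a sketch under stated assumptions: the function name is hypothetical, and rejecting PDF-only arguments for image sources is a choice made here rather than documented server behavior.

```python
from typing import Any, Dict, Optional, Sequence


def save_region_arguments(
    source_type: str,
    source_path: str,
    output_path: str,
    bbox_norm: Sequence[float],
    page: Optional[int] = None,
    short_side_px: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the argument object for save_region, checking the PDF-only fields."""
    if source_type not in ("pdf", "image"):
        raise ValueError("source_type must be 'pdf' or 'image'")
    if len(bbox_norm) != 4:
        raise ValueError("bbox_norm must be [left, top, width, height]")

    arguments: Dict[str, Any] = {
        "source_type": source_type,
        "source_path": source_path,
        "output_path": output_path,
        "bbox_norm": list(bbox_norm),
    }
    if source_type == "pdf":
        if page is None:
            raise ValueError("page is required when source_type is 'pdf'")
        arguments["page"] = page
        if short_side_px is not None:
            # Omitted values fall back to the documented 1600px default.
            arguments["short_side_px"] = short_side_px
    elif page is not None or short_side_px is not None:
        # Assumption of this sketch: treat PDF-only arguments on images as a caller error.
        raise ValueError("page and short_side_px apply only to PDF sources")
    return arguments
```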
ocr_detect_regions
Runs Vision OCR on an image and returns OCR lines, normalized boxes, and grouped candidate regions.
Arguments and MCP examples
Required:
- image: image path
Optional:
- bbox_norm: OCR only this normalized image region
{ "image": "/path/to/image.png" }
{ "image": "/path/to/image.png", "bbox_norm": [0.10, 0.20, 0.70, 0.25] }
Generated preview and focus images are written under .tmp/runtime/. That directory is a local runtime artifact and should not be committed.
Local CLI debugging
The executable can also be called directly for local checks. MCP clients normally call tools through the protocol, not through these shell commands.
.build/release/macpdf-ocr-mcp pdf-read /path/to/document.pdf 1-3
.build/release/macpdf-ocr-mcp pdf-focus /path/to/document.pdf 4 0.10 0.20 0.70 0.25 balanced auto 7
.build/release/macpdf-ocr-mcp save-region pdf /path/to/document.pdf 4 1600 /tmp/region.png 0.10 0.20 0.70 0.25
.build/release/macpdf-ocr-mcp ocr-detect-regions /path/to/image.png
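When scripting these checks, it can be convenient to build the argument list programmatically. The sketch below mirrors the positional order of the `pdf-focus` command shown above. The helper name is hypothetical, the positional order is inferred from that single example, and the output is printed as-is because its format is not described here.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

EXECUTABLE = ".build/release/macpdf-ocr-mcp"  # release path from the Build section


def pdf_focus_argv(
    file: str,
    page: int,
    bbox_norm: Sequence[float],
    mode: str = "balanced",
    engine: str = "auto",
    image_scale: int = 7,
    executable: str = EXECUTABLE,
) -> List[str]:
    """Build argv for the pdf-focus debug command: file, page, box, mode, engine, scale."""
    if len(bbox_norm) != 4:
        raise ValueError("bbox_norm must be [left, top, width, height]")
    box = [f"{value:g}" for value in bbox_norm]
    return [executable, "pdf-focus", file, str(page), *box, mode, engine, str(image_scale)]


if __name__ == "__main__":
    argv = pdf_focus_argv("/path/to/document.pdf", 4, [0.10, 0.20, 0.70, 0.25])
    if Path(argv[0]).exists():
        # Run from the project root so the relative executable path resolves.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        print(result.stdout)
    else:
        print("Build the executable first, then run:", " ".join(argv))
```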
📦 Distribution
The simplest distribution path is source-first:
git clone <repo-url>
cd macpdf-ocr-mcp
swift build -c release
codex mcp add macpdf-ocr-mcp -- "$(pwd)/.build/release/macpdf-ocr-mcp" mcp-stdio
Prebuilt GitHub Release binaries can be added later once the MCP interface is stable.
🔗 Acknowledgements & Resources
This project is built with Apple-native frameworks and integrates with MCP-compatible clients.
| Project / Service | Category | Link & Purpose |
| :--- | :--- | :--- |
| Swift | Language | |
| PDFKit | Apple Framework | |
| Vision | Apple Framework | |
| Model Context Protocol | Protocol | |
| Codex | MCP Client | |