Orthanc DICOM Query & PDF Extraction API

This project provides a modular FastMCP server for querying an Orthanc DICOM server and extracting text from encapsulated PDF reports.

It exposes a clean, structured set of tools for navigating the DICOM hierarchy:

Patients → Studies → Series → Instances → PDF Extraction

✨ Features

🔍 Query Patients by name, ID, or birth date
📂 Query Studies for a patient (CT, MRI, X-ray, etc.)
🌀 Query Series within a study (phases, sequences, reconstructions)
📑 Query Instances within a series (individual DICOM objects)
📖 Extract PDF Text from encapsulated DICOM reports

All tools are exposed via FastMCP and can be consumed programmatically or interactively.
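As an illustration of the patient query listed above, here is a minimal sketch of how name, ID, and birth-date filters could be turned into a request body for Orthanc's POST /tools/find endpoint. The function name build_patient_find_payload and the wildcard and date handling are assumptions made for this example; the project's query_patients tool may build its query differently.

```python
from typing import Optional


def build_patient_find_payload(
    name: Optional[str] = None,
    patient_id: Optional[str] = None,
    birth_date: Optional[str] = None,
) -> dict:
    """Build a request body for Orthanc's POST /tools/find at the Patient level.

    Hypothetical helper: not taken from the project's source.
    """
    query = {}
    if name:
        # Wildcards let a surname match DICOM person names such as "SMITH^JOHN".
        query["PatientName"] = f"*{name}*"
    if patient_id:
        query["PatientID"] = patient_id
    if birth_date:
        # DICOM dates are YYYYMMDD; accept "1980-01-31" and strip the dashes.
        query["PatientBirthDate"] = birth_date.replace("-", "")
    if not query:
        raise ValueError("provide at least one of name, patient_id, birth_date")
    # "Expand": True asks Orthanc to return full resource details, not just IDs.
    return {"Level": "Patient", "Query": query, "Expand": True}


payload = build_patient_find_payload(name="Smith", birth_date="1980-01-31")
print(payload)
```

The resulting dictionary can be sent as JSON to ORTHANC_URL/tools/find with any HTTP client, for example requests.post(url, json=payload).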

📂 Project Structure

orthanc_mcp/
├── main.py                # Entry point: starts FastMCP server
├── config.py              # Environment variables & constants
├── orthanc_client.py      # Helper functions for Orthanc REST API
├── tools/
│   ├── __init__.py        # Makes tools importable
│   ├── patients.py        # query_patients tool
│   ├── studies.py         # query_studies tool
│   ├── series.py          # query_series tool
│   ├── instances.py       # query_instances tool
│   └── pdf_extract.py     # extract_pdf_text_from_dicom tool
└── requirements.txt       # Python dependencies

⚙️ Installation

Clone the repository

git clone https://github.com/yourusername/orthanc-mcp.git
cd orthanc-mcp

Create a virtual environment and install dependencies (on macOS/Linux, activate with source venv/bin/activate instead)

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Configure environment variables

Create a .env file:

ORTHANC_URL=http://localhost:8042
MCP_HOST=localhost
MCP_PORT=5050
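The settings above could be read along the following lines. This is a sketch under assumptions: load_settings and the returned key names are hypothetical and not taken from the project's config.py, and it expects the .env values to already be in the process environment (for example after calling python-dotenv's load_dotenv()).

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read server settings from the environment, falling back to the README defaults.

    Hypothetical helper: the real config.py may expose plain constants instead.
    """
    env = os.environ if env is None else env
    return {
        # Strip a trailing slash so paths like "/patients" can be appended safely.
        "orthanc_url": env.get("ORTHANC_URL", "http://localhost:8042").rstrip("/"),
        "mcp_host": env.get("MCP_HOST", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        "mcp_port": int(env.get("MCP_PORT", "5050")),
    }


settings = load_settings({"ORTHANC_URL": "http://localhost:8042/", "MCP_PORT": "5050"})
print(settings)
```

Passing a mapping instead of reading os.environ directly keeps the function easy to exercise without touching the real environment.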

🚀 Usage

Run the server:

python main.py

The FastMCP server will start and expose the following tools:

query_patients
query_studies
query_series
query_instances
extract_pdf_text_from_dicom

🔄 Workflow

Call the tools in top-down order, following the DICOM hierarchy. Each step returns the identifier that the next step takes as input:

Patients → query_patients
Studies → query_studies(patient_id=...)
Series → query_series(study_id=...)
Instances → query_instances(series_id=...)
PDF Extraction → extract_pdf_text_from_dicom(instance_id=...)
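The same top-down walk can be sketched directly against Orthanc's REST API, where GET /patients/{id}, /studies/{id}, and /series/{id} each list the child resource IDs under the "Studies", "Series", and "Instances" keys. The function iter_instance_ids and the injected get_json callable are hypothetical names for this example, the IDs in the stand-in data are fabricated, and whether the project's tools traverse the hierarchy this way internally is an assumption.

```python
from typing import Callable, Iterator


def iter_instance_ids(get_json: Callable[[str], dict], patient_id: str) -> Iterator[str]:
    """Yield Orthanc instance IDs under a patient, walking studies, then series.

    get_json takes a REST path (e.g. "/patients/<id>") and returns parsed JSON.
    patient_id is the Orthanc resource ID, not the DICOM PatientID tag.
    """
    patient = get_json(f"/patients/{patient_id}")
    for study_id in patient.get("Studies", []):
        study = get_json(f"/studies/{study_id}")
        for series_id in study.get("Series", []):
            series = get_json(f"/series/{series_id}")
            yield from series.get("Instances", [])


# Stand-in for HTTP GET responses from Orthanc, keyed by REST path.
# All IDs here are made up for the example.
FAKE_ORTHANC = {
    "/patients/p1": {"Studies": ["st1"]},
    "/studies/st1": {"Series": ["se1", "se2"]},
    "/series/se1": {"Instances": ["i1", "i2"]},
    "/series/se2": {"Instances": ["i3"]},
}

instance_ids = list(iter_instance_ids(FAKE_ORTHANC.__getitem__, "p1"))
print(instance_ids)  # ['i1', 'i2', 'i3']
```

Against a live server, get_json could be replaced with something like lambda path: requests.get(ORTHANC_URL + path).json().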

📊 Sequence Flow Diagram

flowchart TD
    A[query_patients] --> B[query_studies]
    B --> C[query_series]
    C --> D[query_instances]
    D --> E[extract_pdf_text_from_dicom]

This diagram shows the hierarchical navigation required to reach and extract PDF reports.

📖 Example Use Case

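# MCPClient stands in for your MCP client of choice; adjust the import and
# the call method to match the client library you actually use.
from mcp.client import MCPClient

client = MCPClient("http://localhost:5050")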

# Step 1: Find patient
patients = client.call("query_patients", {"name": "Smith"})
patient_id = patients[0]["PatientID"]

# Step 2: Find studies
studies = client.call("query_studies", {"patient_id": patient_id})
study_id = studies[0]["OrthancStudyID"]

# Step 3: Find series
series = client.call("query_series", {"study_id": study_id})
series_id = series[0]["OrthancSeriesID"]

# Step 4: Find instances
instances = client.call("query_instances", {"series_id": series_id})
instance_id = instances[0]["OrthancInstanceID"]

# Step 5: Extract PDF text
pdf_text = client.call(
    "extract_pdf_text_from_dicom",
    {"instance_id": instance_id}
)

print(pdf_text)

🛠️ Dependencies

FastMCP
requests
pydicom
PyPDF2
python-dotenv

📌 Notes

Orthanc must be running and accessible at the configured ORTHANC_URL.
PDF extraction works only for instances of the Encapsulated PDF Storage SOP Class (UID 1.2.840.10008.5.1.4.1.1.104.1).
If a PDF has no text layer (e.g., a scanned image), extraction returns a warning or an empty result.
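The two extraction notes above can be illustrated with a short sketch that checks the SOP Class UID, reads the encapsulated document with pydicom, and flags PDFs without a text layer. The function names and the shape of the returned dictionary are assumptions made for this example and are not the project's actual extract_pdf_text_from_dicom implementation.

```python
import io

ENCAPSULATED_PDF_SOP_CLASS = "1.2.840.10008.5.1.4.1.1.104.1"


def summarize_pages(page_texts):
    """Join per-page text and flag PDFs that have no text layer.

    Hypothetical result shape: {"text": str, "warning": str or None}.
    """
    text = "\n".join(t.strip() for t in page_texts if t and t.strip())
    if not text:
        return {"text": "", "warning": "no text layer found (possibly a scanned PDF)"}
    return {"text": text, "warning": None}


def extract_pdf_text(dicom_bytes: bytes) -> dict:
    """Extract text from a DICOM file's encapsulated PDF (sketch, not the project's code)."""
    # Imported lazily so summarize_pages stays usable without these packages.
    import pydicom
    from PyPDF2 import PdfReader

    ds = pydicom.dcmread(io.BytesIO(dicom_bytes))
    if str(ds.SOPClassUID) != ENCAPSULATED_PDF_SOP_CLASS:
        raise ValueError(f"not an Encapsulated PDF instance: {ds.SOPClassUID}")
    # EncapsulatedDocument (0042,0011) holds the raw PDF bytes.
    reader = PdfReader(io.BytesIO(ds.EncapsulatedDocument))
    return summarize_pages([page.extract_text() for page in reader.pages])


print(summarize_pages(["", "   "]))
```

The DICOM bytes for an instance can be fetched from Orthanc with GET /instances/{id}/file before being passed to extract_pdf_text.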

🤝 Contributing

Pull requests are welcome!

For major changes, please open an issue first to discuss what you’d like to change.

👥 Contributors

Thanks to the following people who have contributed to this project:

MCP Servers