mcp-csu

MCP server for the Czech Statistical Office (ČSÚ / CZSO) DataStat API. Gives AI assistants direct access to 700+ statistical datasets about the Czech Republic — population, economy, prices, wages, employment, industry, agriculture, trade, tourism, environment, and more.

Single Python file. No cloning required — just uvx mcp-csu.

Features

Full catalog access — search, browse, and inspect all 700+ datasets and 1500+ predefined tables
Data retrieval — fetch statistical data as CSV, query individual values with full context
AI-optimized output — human-readable text for metadata, CSV for data, automatic truncation with row counts
Rate limiting — built-in concurrency control (3 parallel requests) and minimum request interval (150ms)
Caching — catalog listings cached in memory for 10 minutes to avoid redundant requests
No authentication — the DataStat API is public

Prerequisites

uv (Python package runner)

That's it. Python and all dependencies are managed automatically by uv.

Configuration

Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "csu": {
      "command": "uvx",
      "args": ["mcp-csu"]
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "csu": {
      "command": "uvx",
      "args": ["mcp-csu"]
    }
  }
}

Any MCP client

The server uses stdio transport (default). Launch command:

uvx mcp-csu

Data model

The DataStat database has a hierarchical structure:

Dataset (sada)              e.g. CEN0101H — "Míra inflace"
├── Dimensions (dimenze)    e.g. CasR (years), Uz0 (territory)
│   └── Items (položky)     e.g. "2024", "CZ"
├── Indicators (ukazatele)  e.g. 6134J06 — "Průměrná roční míra inflace"
└── Selections (výběry)     e.g. CEN0101HT01 — "Průměrná roční míra inflace"
    └── CSV data            pre-configured table ready to fetch

Datasets contain raw multidimensional data. Each dataset has dimensions (time, territory, categories) and indicators (what is measured).

Selections are predefined views — a specific slice of a dataset with fixed dimension arrangement. They are the easiest way to get data.

Tools

Discovery

`search_datasets`

Full-text search across all datasets. Returns dataset codes, names, time period types, and territory levels.

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | query | string | yes | Search keyword (Czech recommended) |

search_datasets("inflace")
→ Found 3 dataset(s):
    WCEN01 (v4) — Index spotřebitelských cen (indexy, míra inflace)
    WCEN01M (v1) — Index spotřebitelských cen — měsíční data
    CEN0101H (v1) — Míra inflace

`search_selections`

Full-text search across all predefined data tables.

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | query | string | yes | Search keyword (Czech recommended) |

search_selections("mzdy")
→ Found 30 selection(s):
    MZDQ1T1 — Průměrný evidenční počet zaměstnanců a průměrné hrubé měsíční mzdy...
      Period: Čtvrtletí | Territory: Stát | Dataset: MZDQ1

`list_datasets`

Paginated listing of all datasets.

| Parameter | Type | Default | Description | |-----------|------|---------|-------------| | offset | int | 0 | Skip first N items | | limit | int | 30 | Items per page (max 100) |

`list_selections`

Paginated listing of all predefined tables.

| Parameter | Type | Default | Description | |-----------|------|---------|-------------| | offset | int | 0 | Skip first N items | | limit | int | 30 | Items per page (max 100) |

Exploration

`get_dataset`

Full dataset metadata: dimensions with item counts, indicators with definitions, keywords, update frequency.

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | dataset_code | string | yes | Dataset code (e.g. CEN0101H) |

get_dataset("CEN0101H")
→ Dataset: CEN0101H (v1)
  Name: Míra inflace
  Keywords: míra inflace
  Update frequency: MONTHLY

  Dimensions (4):
    CasM — Měsíce (720 items)
    CasR — Roky (61 items)
    CASRMX — Měsíce, roky (780 items)
    Uz0 — Území (1 items)

  Indicators (4):
    6134J09 — Přírůstek průměrného ročního indexu spotřebitelských cen - měsíční
    6134J06 — Průměrná roční míra inflace
    ...

`get_dataset_selections`

List predefined data tables for a specific dataset.

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | dataset_code | string | yes | Dataset code |

get_dataset_selections("CEN0101H")
→ Selections for CEN0101H (2):
    CEN0101HT01 — Průměrná roční míra inflace
      Period: Rok | Territory: Stát
    CEN0101HT02 — Míra inflace - měsíční
      Period: Měsíc | Territory: Stát

`get_dimension_items`

Get all possible values for a dimension. Supports hierarchy level filtering and pagination.

| Parameter | Type | Default | Description | |-----------|------|---------|-------------| | dimension_code | string | — | Dimension code from get_dataset() | | level | string | null | Filter by hierarchy level (e.g. STAT, KRAJ, OKRES) | | offset | int | 0 | Skip first N items | | limit | int | 50 | Items per page (max 200) |

get_dimension_items("UZ023H2U", level="KRAJ")
→ Dimension UZ023H2U — 14 item(s) at level KRAJ:
    CZ010 — Hlavní město Praha (Capital City Prague) [KRAJ]
    CZ020 — Středočeský kraj (Central Bohemian Region) [KRAJ]
    CZ031 — Jihočeský kraj (South Bohemian Region) [KRAJ]
    ...

`get_indicator`

Indicator definition and display format.

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | indicator_code | string | yes | Indicator code from get_dataset() |

Data retrieval

`get_selection_data`

Primary data access tool. Fetches CSV data from a predefined selection.

| Parameter | Type | Default | Description | |-----------|------|---------|-------------| | selection_code | string | — | Selection code (e.g. CEN0101HT01) | | max_rows | int | 100 | Max data rows. 0 = unlimited |

get_selection_data("CEN0101HT01", max_rows=5)
→ "Ukazatel","Území","Roky","Hodnota"
  "Průměrná roční míra inflace","Česko","2025","2.5"
  "Průměrná roční míra inflace","Česko","2024","2.4"
  "Průměrná roční míra inflace","Česko","2023","10.7"
  "Průměrná roční míra inflace","Česko","2022","15.1"
  "Průměrná roční míra inflace","Česko","2021","3.8"

  [Showing 5 of 29 rows. Use max_rows=0 for all data or max_rows=10 to see more.]

`get_value`

Retrieve a single specific value. The most precise query — returns one number with full context (indicator name, dimension labels, publication date).

| Parameter | Type | Default | Description | |-----------|------|---------|-------------| | dataset_code | string | — | Dataset code | | indicator_code | string | — | Indicator code | | dimension_codes | list[str] | — | Dimension codes in order | | item_codes | list[str] | — | Item codes matching dimensions | | version | string | null | Dataset version (latest if omitted) |

get_value("RSO01", "3971b",
          ["CasR", "TYPPROSJED", "UZ023H2U"],
          ["2023", "501", "CZ"])
→ Value: 6 258
  Indicator: Počet územních jednotek
    Roky: 2023
    Typ prostorové jednotky: Obec
    ČR, kraje, okresy: Česko
  Published: 2024-04-30T07:00:00Z

`custom_query`

Execute an arbitrary data query on a dataset. Returns CSV.

This is an advanced tool — prefer get_selection_data() when a suitable predefined table exists. The custom query API is sensitive to correct dimension placement and hierarchy level filtering.

| Parameter | Type | Default | Description | |-----------|------|---------|-------------| | dataset_code | string | — | Dataset code | | dataset_version | string | — | Version from get_dataset() | | columns | list[dict] | — | Column dimensions (each needs kodDimenze) | | rows | list[dict] | — | Row dimensions | | table_filters | list[dict] | null | Filter dimensions | | max_rows | int | 100 | Max CSV rows |

`get_dataset_metadata`

Dataset content statistics: record count, time range, publication and update timestamps.

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | dataset_code | string | yes | Dataset code | | version | string | yes | Version from get_dataset() |

Usage examples

Get Czech inflation rate

1. search_datasets("inflace")
   → CEN0101H — Míra inflace

2. get_dataset_selections("CEN0101H")
   → CEN0101HT01 — Průměrná roční míra inflace

3. get_selection_data("CEN0101HT01")
   → CSV with annual inflation rates from 1994 to present

Find average wages by region

1. search_selections("mzdy kraje")
   → MZDQ1T2 — ... dle krajů a regionů soudržnosti

2. get_selection_data("MZDQ1T2", max_rows=20)
   → CSV with wages by region

Get exact population of Prague in 2023

1. search_datasets("obyvatelstvo")
   → OBY01 — Obyvatelstvo podle pohlaví a věku

2. get_dataset("OBY01")
   → see dimensions and indicators

3. get_dimension_items("<territory_dim>", level="KRAJ")
   → find Prague code

4. get_value("OBY01", "<indicator>",
             ["<time_dim>", "<territory_dim>"],
             ["2023", "<prague_code>"])
   → exact value

Czech vocabulary for search

The database is in Czech. Common search terms:

| Czech | English | Example datasets | |-------|---------|-----------------| | obyvatelstvo | population | OBY01, OBY02 | | mzdy | wages | MZDQ1, MZD01 | | ceny | prices | CEN01, CEN02 | | inflace | inflation | CEN0101H | | HDP | GDP | NUC06R, NUC06Q | | nezaměstnanost | unemployment | ZAM04 | | průmysl | industry | PRU01 | | stavebnictví | construction | STA01 | | vzdělání | education | VZD01 | | zdraví | health | ZDR01 | | zemědělství | agriculture | ZEM01 | | doprava | transport | DOP01 | | cestovní ruch | tourism | CRU01 | | životní prostředí | environment | ZPR01 | | kriminalita | crime | KRI01 | | volby | elections | VOL01 | | bytová výstavba | housing | BYT01 | | zahraniční obchod | foreign trade | VZO01 |

Technical details

Architecture

Single-file Python server using FastMCP framework over stdio transport. Dependencies managed via PEP 723 inline script metadata — uv run installs them automatically into an isolated environment.

Upstream API

The server wraps two DataStat REST APIs:

| API | Base URL | Purpose | |-----|----------|---------| | Catalog | https://data.csu.gov.cz/api/katalog/v1 | Dataset/selection/dimension/indicator metadata | | Data | https://data.csu.gov.cz/api/dotaz/v1 | Data retrieval (CSV, JSON-STAT) |

API documentation:

Rate limiting

The DataStat API does not document rate limits, but the server applies conservative throttling:

Max concurrent requests: 3 (semaphore)
Min request interval: 150ms (global)
Request timeout: 60 seconds

Caching

Catalog listings (list_datasets, list_selections) are cached in memory with a 10-minute TTL. These endpoints return the full catalog (700–1500 items) on every call since the API ignores pagination parameters — caching avoids repeated large transfers.

Output formatting

Metadata tools return structured text with clear labels
Data tools return CSV (most compact and LLM-friendly tabular format)
Truncation: data responses are limited to 100 rows by default, with total count shown. Adjustable via max_rows parameter
Language: all API responses are in Czech (Accept-Language: cs)

Dependencies

| Package | Version | Purpose | |---------|---------|---------| | mcp | >=1.0.0 | MCP server framework (FastMCP) | | httpx | >=0.27.0 | Async HTTP client |

Both installed automatically by uv run.

License

MIT

MCP Servers