Gemini Multimodal MCP

MCP server by marcius-llmus

Created 3/6/2026
Updated about 6 hours ago

multimodal-reader-mcp

MCP server for reading local audio and video files with Google Gen AI and returning structured observations, timelines, and transcripts.

It analyzes a local media file and returns:

  • a short summary
  • a timeline of key moments
  • transcript snippets for spoken or visible text
  • key observations and notable signals
  • relevant clues tailored to the user's question
  • open questions plus a confidence level
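The fields listed above could be modeled roughly as follows. This is only a sketch to make the shape of the result concrete: the class names, field names, timestamp format, and confidence values are all hypothetical and are not taken from the server's actual schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class TimelineEntry:
    # Hypothetical shape: the real timestamp format is not documented here.
    timestamp: str
    description: str


@dataclass
class MediaReport:
    # Field names are assumptions that mirror the bullet list above.
    summary: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    transcript_snippets: list[str] = field(default_factory=list)
    observations: list[str] = field(default_factory=list)
    relevant_clues: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    confidence: str = "unknown"  # assumed to be a free-form label


# Made-up values, purely for illustration.
example = MediaReport(
    summary="Two people discuss a release schedule.",
    timeline=[TimelineEntry("00:05", "Speaker A opens the meeting")],
    confidence="medium",
)
print(asdict(example))
```

Keeping every list field optional with an empty default reflects the idea that some media (for example, a silent clip) may yield no transcript snippets at all.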

Requirements

  • uv
  • Python 3.14
  • GOOGLE_API_KEY (environment variable)

Model configuration

The default model is gemini-2.5-flash.

You can override the default model for all requests by setting this environment variable:

  • MULTIMODAL_READER_MODEL
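A minimal sketch of how such an override is typically resolved, assuming the server reads the variable from its process environment. The function name resolve_model and the choice to treat an empty value as unset are assumptions for illustration, not the package's confirmed behavior.

```python
import os

DEFAULT_MODEL = "gemini-2.5-flash"  # default stated in this README


def resolve_model(env=None):
    """Return the model name, preferring MULTIMODAL_READER_MODEL when set.

    Hypothetical helper: an empty or whitespace-only value falls back to
    the default, which is an assumption about the desired behavior.
    """
    env = os.environ if env is None else env
    value = env.get("MULTIMODAL_READER_MODEL", "").strip()
    return value or DEFAULT_MODEL


print(resolve_model({}))
print(resolve_model({"MULTIMODAL_READER_MODEL": "gemini-2.5-pro"}))
```

Passing the environment in as a mapping keeps the helper easy to test without mutating os.environ.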

MCP client configuration

Example Cursor MCP config:

{
  "mcpServers": {
    "multimodal-reader": {
      "command": "uvx",
      "args": ["multimodal-reader-mcp"],
      "env": {
        "GOOGLE_API_KEY": "${env:GOOGLE_API_KEY}",
        "MULTIMODAL_READER_MODEL": "gemini-2.5-flash"
      }
    }
  }
}

Tool

The package exposes one MCP tool:

  • read_media(file_path, question=None)

file_path must be an absolute path to a local media file. question is optional and defaults to None.
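Because a relative path is not accepted, a caller may want to check the path before invoking the tool. The helper below is a client-side sketch using only the standard library; validate_media_path is a hypothetical name, and the server may perform different or additional checks.

```python
from pathlib import Path


def validate_media_path(file_path: str) -> Path:
    """Check that file_path is absolute and points to an existing file.

    Hypothetical pre-flight check; it does not inspect the media format.
    """
    path = Path(file_path)
    if not path.is_absolute():
        raise ValueError(f"file_path must be absolute, got: {file_path!r}")
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {file_path!r}")
    return path


# A relative path is rejected before any request is made.
try:
    validate_media_path("clips/interview.mp4")
except ValueError as exc:
    print(exc)
```

To turn a relative path into an acceptable argument, Path("clips/interview.mp4").resolve() returns an absolute path based on the current working directory.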

Quick Setup
Installation guide for this server

Install Package (if required)

uvx gemini-multimodal-mcp

Cursor configuration (mcp.json)

{
  "mcpServers": {
    "marcius-llmus-gemini-multimodal-mcp": {
      "command": "uvx",
      "args": ["gemini-multimodal-mcp"]
    }
  }
}