Gemini Multimodal MCP
MCP server by marcius-llmus
Created 3/6/2026
Updated about 6 hours ago
multimodal-reader-mcp
MCP server for reading local audio and video files with Google Gen AI and returning structured observations, timelines, and transcripts.
It analyzes a local media file and returns:
- a short summary
- a timeline of key moments
- transcript snippets for spoken or visible text
- key observations and notable signals
- relevant clues tailored to the user's question
- open questions plus a confidence level
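The README does not document the exact response schema, so the following is only a sketch of how a client might model the fields listed above. Every class and field name here (`MediaReport`, `TimelineEntry`, `transcript_snippets`, and so on) is hypothetical and chosen for illustration; the real server may use different keys or return plain text.

```python
from dataclasses import dataclass, field


@dataclass
class TimelineEntry:
    # Hypothetical shape: a timestamp label plus what happens at that moment.
    timestamp: str
    description: str


@dataclass
class MediaReport:
    # Field names mirror the bullet list above; they are assumptions,
    # not the server's documented schema.
    summary: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    transcript_snippets: list[str] = field(default_factory=list)
    observations: list[str] = field(default_factory=list)
    relevant_clues: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    confidence: str = "unknown"  # placeholder value when none is reported
```

Modeling the result this way lets client code default missing sections to empty lists instead of checking for absent keys.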
Requirements
- uv
- Python 3.14
- GOOGLE_API_KEY
Model configuration
The default model is gemini-2.5-flash.
You can override the default model for all requests by setting this environment variable:
MULTIMODAL_READER_MODEL
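As a rough illustration of the override behavior described above, the model name could be resolved like this. The helper `resolve_model` is hypothetical, and treating an empty or whitespace-only value as unset is an assumption rather than confirmed server behavior.

```python
import os

DEFAULT_MODEL = "gemini-2.5-flash"


def resolve_model(env=None):
    """Return the model name, preferring MULTIMODAL_READER_MODEL when set.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    # Assumption: a blank value falls back to the default model.
    value = env.get("MULTIMODAL_READER_MODEL", "").strip()
    return value or DEFAULT_MODEL
```

Because the variable is read from the process environment, it applies to every request the server handles rather than to a single tool call.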
MCP client configuration
Example Cursor MCP config:
{
  "mcpServers": {
    "multimodal-reader": {
      "command": "uvx",
      "args": ["multimodal-reader-mcp"],
      "env": {
        "GOOGLE_API_KEY": "${env:GOOGLE_API_KEY}",
        "MULTIMODAL_READER_MODEL": "gemini-2.5-flash"
      }
    }
  }
}
Tool
The package exposes one MCP tool:
read_media(file_path, question=None)
file_path must be an absolute path to a local media file.
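Since `file_path` has to be absolute, a client may want to check the path before sending the call. The sketch below builds the argument dictionary for `read_media`; the helper name `build_read_media_args` is hypothetical, and omitting `question` when it is `None` is an assumption about how the optional parameter is best passed.

```python
from pathlib import Path


def build_read_media_args(file_path, question=None):
    """Validate the path and return the arguments for a read_media call."""
    path = Path(file_path).expanduser()
    if not path.is_absolute():
        raise ValueError(f"file_path must be absolute, got: {file_path!r}")
    args = {"file_path": str(path)}
    # Assumption: leave `question` out entirely so the server default applies.
    if question is not None:
        args["question"] = question
    return args
```

A relative path can be converted first with `Path(...).resolve()`, which anchors it to the current working directory of the client, not of the server.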
Quick Setup
Installation guide for this server
Install Package (if required)
uvx gemini-multimodal-mcp
Cursor configuration (mcp.json)
{
  "mcpServers": {
    "marcius-llmus-gemini-multimodal-mcp": {
      "command": "uvx",
      "args": [
        "gemini-multimodal-mcp"
      ]
    }
  }
}