Z.AI Image & Video Generation MCP Server
A Model Context Protocol (MCP) server that provides access to Z.AI's image and video generation models for LLM applications.
Features
- Image Generation: GLM-Image and CogView-4 models for high-quality image generation
- Video Generation: CogVideoX-3, Vidu Q1, and Vidu 2 models for AI video creation
- Multiple Input Modes: Text-to-image/video, image-to-video, start-end frame animation
- Asynchronous Processing: Submit long-running tasks and poll for results
- Automatic Downloads: Generate and download in a single operation
- Automatic Retries: Built-in retry logic with exponential backoff
- Comprehensive Validation: Input validation with clear error messages
- Type-Safe: Full TypeScript support with detailed type definitions
Installation
npm install GeorgH93/z_ai_image_gen_mcp
Configuration
Set your Z.AI API key as an environment variable:
export ZAI_API_KEY=your_api_key_here
Get your API key from the Z.AI API Keys page or sign up for the GLM Coding Plan.
Optional Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| ZAI_API_BASE_URL | API base URL | https://api.z.ai/api |
| ZAI_DEFAULT_MODEL | Default model | glm-image |
| ZAI_DEFAULT_SIZE | Default image size | 1280x1280 |
| ZAI_REQUEST_TIMEOUT | Request timeout (ms) | 60000 |
| ZAI_MAX_RETRIES | Max retry attempts | 3 |
| ZAI_RETRY_DELAY | Initial retry delay (ms) | 1000 |
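As a rough illustration of how the variables in the table above could be resolved, the following sketch reads them from a plain object and applies the documented defaults. The `ZaiConfig` shape, the field names, and `readZaiConfig` are hypothetical and are not the package's actual `loadConfig` implementation.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ZaiConfig {
  apiKey: string;
  baseUrl: string;
  defaultModel: string;
  defaultSize: string;
  requestTimeoutMs: number;
  maxRetries: number;
  retryDelayMs: number;
}

type EnvVars = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the default when unset or invalid.
function intFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Resolve the configuration, using the defaults listed in the table above.
function readZaiConfig(env: EnvVars): ZaiConfig {
  const apiKey = env.ZAI_API_KEY;
  if (!apiKey) {
    throw new Error("ZAI_API_KEY is not set");
  }
  return {
    apiKey,
    baseUrl: env.ZAI_API_BASE_URL || "https://api.z.ai/api",
    defaultModel: env.ZAI_DEFAULT_MODEL || "glm-image",
    defaultSize: env.ZAI_DEFAULT_SIZE || "1280x1280",
    requestTimeoutMs: intFromEnv(env.ZAI_REQUEST_TIMEOUT, 60000),
    maxRetries: intFromEnv(env.ZAI_MAX_RETRIES, 3),
    retryDelayMs: intFromEnv(env.ZAI_RETRY_DELAY, 1000),
  };
}
```

In a Node.js process this could be called as `readZaiConfig(process.env)`. Falling back to the default on an unparsable number is a choice made for this sketch; the server itself may reject such values instead.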
Usage
With Claude Desktop
Add to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "z-ai-image": {
      "command": "npx",
      "args": ["z-ai-image-mcp"],
      "env": {
        "ZAI_API_KEY": "your_api_key_here"
      }
    }
  }
}
With Other MCP Clients
Run the server directly:
npx z-ai-image-mcp
Or programmatically:
import { createServer, loadConfig } from 'z-ai-image-mcp';
const config = loadConfig();
const server = createServer(config);
// Connect to your transport...
With OpenCode
Add to your OpenCode configuration (opencode.json or opencode.jsonc in your project root):
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "your_api_key_here"
      }
    }
  }
}
Or using an environment variable reference:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "{env:ZAI_API_KEY}"
      }
    }
  }
}
Using with OpenCode prompts:
Generate a professional logo for a tech startup. use z-ai-image
Or add to your AGENTS.md:
When generating images, use the `z-ai-image` MCP server tools.
Per-agent configuration (optional):
To enable the MCP server only for specific agents:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "{env:ZAI_API_KEY}"
      }
    }
  },
  "tools": {
    "z-ai-image*": false
  },
  "agent": {
    "design-agent": {
      "tools": {
        "z-ai-image*": true
      }
    }
  }
}
Available Tools
1. list_models
List all available image generation models and their capabilities.
Use this tool to discover available models, their features, and recommended settings.
2. generate_image
Generate an image synchronously from a text prompt.
Parameters:
- `prompt` (required): Text description of the image (max 4000 characters)
- `model` (optional): `glm-image` or `cogview-4-250304` (default: `glm-image`)
- `size` (optional): Image dimensions, e.g., `1280x1280` (default: `1280x1280`)
- `quality` (optional): `hd` or `standard` (default: `hd` for GLM-Image)
- `user_id` (optional): End user ID for abuse prevention (6-128 characters)
Example:
Generate an image of a cute kitten sitting on a windowsill with a sunset background.
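The length limits listed for `prompt` and `user_id` can be checked before a request is sent. The sketch below is one way to do that; `ImageRequest` and `validateImageRequest` are hypothetical names, and counting characters with `String.length` (UTF-16 code units) is an assumption about how the limits are measured.

```typescript
// Hypothetical request shape covering only the length-limited fields.
interface ImageRequest {
  prompt: string;
  user_id?: string;
}

// Return a list of human-readable problems; an empty list means the request passed.
function validateImageRequest(req: ImageRequest): string[] {
  const errors: string[] = [];
  if (req.prompt.trim().length === 0) {
    errors.push("prompt is required");
  }
  if (req.prompt.length > 4000) {
    errors.push("prompt exceeds 4000 characters");
  }
  if (req.user_id !== undefined && (req.user_id.length < 6 || req.user_id.length > 128)) {
    errors.push("user_id must be 6-128 characters");
  }
  return errors;
}
```

Collecting every problem into a list, rather than throwing on the first one, lets a caller report all issues at once.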
3. generate_image_async
Start an asynchronous image generation task. Returns a task ID for polling.
Parameters:
- `prompt` (required): Text description of the image
- `model` (optional): Only `glm-image` supports async (default: `glm-image`)
- `size` (optional): Image dimensions (default: `1280x1280`)
- `quality` (optional): Only `hd` supported for async (default: `hd`)
- `user_id` (optional): End user ID for abuse prevention
Example:
Start async generation of a complex poster design.
4. get_async_result
Retrieve the result of an asynchronous image generation task.
Parameters:
- `task_id` (required): The task ID from `generate_image_async`
Example:
Check the status of task ID "task-12345".
5. download_image
Download an image from a URL and return it as base64 or save to a file.
Parameters:
- `url` (required): The URL of the image to download (e.g., from `generate_image` or `get_async_result`)
- `output` (optional): `base64` or `file_output` (default: `base64`)
- `file_output` (optional): Absolute path to save the image file (required if output is `file_output`). Example: `/path/to/image.png`
Output Modes:
- `base64`: Returns the image data directly as base64 (auto-switches to file if > 1MB)
- `file_output`: Saves the image to disk at the specified path
Example:
Download the generated image and save it to /home/user/images/logo.png
Note: Z.AI image URLs expire after 30 days. Use this tool to download and store images permanently.
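The two output modes, including the automatic switch to a file for large images, could be decided along these lines. This is a sketch only: `planImageOutput`, `OutputPlan`, and the `fallbackPath` parameter are hypothetical, the README does not say where an auto-switched file is written, and treating "1MB" as 1024 × 1024 bytes is an assumption.

```typescript
type OutputMode = "base64" | "file_output";

// Assumption: "1MB" is interpreted as 1 MiB.
const BASE64_LIMIT_BYTES = 1024 * 1024;

interface OutputPlan {
  mode: OutputMode;
  path?: string;
}

// Decide how an image of `sizeBytes` should be returned.
// `fallbackPath` is a hypothetical location used when base64 auto-switches to a file.
function planImageOutput(
  sizeBytes: number,
  output: OutputMode = "base64",
  fileOutput?: string,
  fallbackPath?: string,
): OutputPlan {
  if (output === "file_output") {
    if (!fileOutput) {
      throw new Error("file_output path is required when output is file_output");
    }
    return { mode: "file_output", path: fileOutput };
  }
  if (sizeBytes > BASE64_LIMIT_BYTES) {
    // Too large to return inline, so switch to a file.
    return { mode: "file_output", path: fileOutput ?? fallbackPath };
  }
  return { mode: "base64" };
}
```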
6. generate_and_download_image ⭐ Recommended
Generate an image and automatically download it in a single operation. This is the most convenient tool when you want the image data immediately.
Parameters:
- `prompt` (required): Text description of the image (max 4000 characters)
- `model` (optional): `glm-image` or `cogview-4-250304` (default: `glm-image`)
- `size` (optional): Image dimensions, e.g., `1280x1280` (default: `1280x1280`)
- `quality` (optional): `hd` or `standard` (default: `hd` for GLM-Image)
- `user_id` (optional): End user ID for abuse prevention (6-128 characters)
- `output` (optional): `base64` or `file_output` (default: `base64`)
- `file_output` (optional): Absolute path to save the image file (required if output is `file_output`)
- `poll_interval` (optional): Seconds to wait between polling for async results (default: 3)
- `max_wait` (optional): Maximum seconds to wait for generation (default: 120)
Output Modes:
- `base64`: Returns the image data directly as base64 (auto-switches to file if > 1MB)
- `file_output`: Saves the image to disk at the specified path
Examples:
# Generate and get as base64
Generate a logo for my company and show me the image.
# Generate and save to file
Generate a logo and save it to /home/user/images/logo.png
Behavior:
- For GLM-Image: Uses async API with automatic polling until complete
- For CogView-4: Uses synchronous API
- Automatically downloads the result once generation completes
- Returns image as base64 or saves to specified path
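The polling behavior described above, governed by `poll_interval` and `max_wait`, might look roughly like the loop below. `pollUntilDone`, `TaskStatus`, and the injected `checkStatus` and `sleep` callbacks are hypothetical; the real server's status values and timing logic are not documented here, and this sketch counts elapsed time from the sleep intervals rather than the wall clock.

```typescript
// Hypothetical status shape; the actual API's status fields may differ.
type TaskStatus<T> =
  | { state: "processing" }
  | { state: "success"; result: T }
  | { state: "failed"; reason: string };

// Poll `checkStatus` every `pollIntervalSec` seconds until it succeeds, fails,
// or the next wait would exceed `maxWaitSec`.
async function pollUntilDone<T>(
  checkStatus: () => Promise<TaskStatus<T>>,
  pollIntervalSec = 3,
  maxWaitSec = 120,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let waitedSec = 0;
  while (true) {
    const current = await checkStatus();
    if (current.state === "success") return current.result;
    if (current.state === "failed") {
      throw new Error(`Generation failed: ${current.reason}`);
    }
    if (waitedSec + pollIntervalSec > maxWaitSec) {
      throw new Error(`Timed out after ${maxWaitSec}s`);
    }
    await sleep(pollIntervalSec * 1000);
    waitedSec += pollIntervalSec;
  }
}
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.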
Video Generation Tools
7. list_video_models
List all available video generation models and their capabilities.
Use this tool to discover available video models, their features, and supported parameters.
8. generate_video
Generate a video asynchronously from text or images. Returns a task ID for polling.
Parameters:
- `model` (required): Video generation model
  - `cogvideox-3`: Z.AI flagship model (up to 4K, 5-10s, audio support)
  - `viduq1-text`: Text-to-video, 1080P, 5s
  - `viduq1-image`: Image-to-video, 1080P, 5s
  - `viduq1-start-end`: Start-end frame, 1080P, 5s
  - `vidu2-image`: Image-to-video, 720P, 4s (faster, cheaper)
  - `vidu2-start-end`: Start-end frame, 720P, 4s
  - `vidu2-reference`: Reference-based, 720P, 4s
- `prompt` (optional): Text description (max 512 characters)
- `image_url` (optional): Image URL(s) for image-to-video generation
- `quality` (CogVideoX-3): `quality` or `speed`
- `size` (optional): Video resolution
- `duration` (optional): Video duration in seconds
- `fps` (CogVideoX-3): 30 or 60
- `with_audio` (optional): Generate AI sound effects
- `style` (Vidu Q1 text): `general` or `anime`
- `aspect_ratio` (Vidu Q1/2): `16:9`, `9:16`, or `1:1`
- `movement_amplitude` (Vidu): `auto`, `small`, `medium`, or `large`
- `user_id` (optional): End user ID for abuse prevention
Examples:
# Text-to-video
Generate a video of a cat playing with a ball.
# Image-to-video
Animate this image: [image_url]
# Start-end frame
Create a smooth transition from [first_frame] to [last_frame].
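Several `generate_video` parameters apply only to certain models. A client-side check for a few of them could be sketched as follows; `VideoRequest` and `validateVideoRequest` are hypothetical names, and whether the server rejects or silently ignores an inapplicable parameter is not stated in this README, so reporting it as an error is an assumption.

```typescript
type VideoModel =
  | "cogvideox-3"
  | "viduq1-text"
  | "viduq1-image"
  | "viduq1-start-end"
  | "vidu2-image"
  | "vidu2-start-end"
  | "vidu2-reference";

// Hypothetical request shape covering a subset of the documented parameters.
interface VideoRequest {
  model: VideoModel;
  prompt?: string;
  fps?: number;
  style?: "general" | "anime";
  aspect_ratio?: "16:9" | "9:16" | "1:1";
}

// Return a list of problems; an empty list means the request passed these checks.
function validateVideoRequest(req: VideoRequest): string[] {
  const errors: string[] = [];
  const isVidu = req.model.startsWith("vidu");
  if (req.prompt !== undefined && req.prompt.length > 512) {
    errors.push("prompt exceeds 512 characters");
  }
  if (req.fps !== undefined) {
    if (req.model !== "cogvideox-3") {
      errors.push("fps applies only to cogvideox-3");
    } else if (req.fps !== 30 && req.fps !== 60) {
      errors.push("fps must be 30 or 60");
    }
  }
  if (req.style !== undefined && req.model !== "viduq1-text") {
    errors.push("style applies only to viduq1-text");
  }
  if (req.aspect_ratio !== undefined && !isVidu) {
    errors.push("aspect_ratio applies only to Vidu models");
  }
  return errors;
}
```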
9. get_video_result
Retrieve the result of an asynchronous video generation task.
Parameters:
- `task_id` (required): The task ID from `generate_video`
Note: Video generation typically takes 30 seconds to several minutes depending on duration and quality.
10. generate_and_download_video ⭐ Recommended
Generate a video and automatically download it. Polls for completion and saves the video file.
Parameters:
- All parameters from `generate_video`, plus:
- `file_output` (optional): Absolute path to save the video file
- `poll_interval` (optional): Seconds to wait between polling (default: 10)
- `max_wait` (optional): Maximum seconds to wait (default: 300)
Example:
Generate a video of a sunset over the ocean and save it to /home/user/videos/sunset.mp4
Note: Videos are always saved to file (too large for base64). Video URLs expire after 1 day.
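Because generated URLs are short-lived (30 days for images, 1 day for videos, per the notes in this README), a client may want to track when a stored URL stops working. The helper below is a small sketch; `urlExpiresAt` and `isUrlExpired` are hypothetical, and it assumes the lifetime is counted from the moment the URL was issued.

```typescript
type ZaiMediaKind = "image" | "video";

// Lifetimes taken from the notes in this README.
const URL_LIFETIME_DAYS: Record<ZaiMediaKind, number> = { image: 30, video: 1 };
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumption: expiry is measured from the time the URL was issued.
function urlExpiresAt(kind: ZaiMediaKind, issuedAt: Date): Date {
  return new Date(issuedAt.getTime() + URL_LIFETIME_DAYS[kind] * MS_PER_DAY);
}

function isUrlExpired(kind: ZaiMediaKind, issuedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() >= urlExpiresAt(kind, issuedAt).getTime();
}
```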
Models
GLM-Image
Z.AI's flagship image generation model with a hybrid autoregressive + diffusion architecture.
- Best for: Complex compositions, text rendering, detailed illustrations, commercial posters
- Quality options: `hd` (detailed, ~20s), `standard` (faster, ~5-10s)
- Size range: 1024-2048px per dimension (divisible by 32)
- Recommended sizes: 1280×1280, 1568×1056, 1056×1568, 1472×1088, 1088×1472, 1728×960, 960×1728
- Async support: Yes
CogView-4-250304
General-purpose image generation with fast text understanding.
- Best for: General image generation, quick iterations
- Quality options: `hd`, `standard`
- Size range: 512-2048px per dimension (divisible by 16)
- Recommended sizes: 1024×1024, 768×1344, 864×1152, 1344×768, 1152×864, 1440×720, 720×1440
- Async support: No
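The size constraints for the two image models can be expressed as a small lookup table plus a check. In this sketch, `SIZE_RULES` and `isValidSize` are hypothetical names. The recommended sizes are treated as an allowlist ahead of the range check, because a few of the recommended GLM-Image sizes (for example 1728×960) include a dimension below the stated 1024px minimum; how the API itself resolves that is not documented here.

```typescript
interface SizeRule {
  min: number;
  max: number;
  multiple: number;
  recommended: string[];
}

// Values copied from the model descriptions above.
const SIZE_RULES: Record<string, SizeRule> = {
  "glm-image": {
    min: 1024,
    max: 2048,
    multiple: 32,
    recommended: ["1280x1280", "1568x1056", "1056x1568", "1472x1088", "1088x1472", "1728x960", "960x1728"],
  },
  "cogview-4-250304": {
    min: 512,
    max: 2048,
    multiple: 16,
    recommended: ["1024x1024", "768x1344", "864x1152", "1344x768", "1152x864", "1440x720", "720x1440"],
  },
};

// Accept recommended sizes outright; otherwise require WIDTHxHEIGHT with both
// dimensions inside the model's range and divisible by its step.
function isValidSize(model: string, size: string): boolean {
  const rule = SIZE_RULES[model];
  if (!rule) return false;
  if (rule.recommended.includes(size)) return true;
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) return false;
  return [Number(match[1]), Number(match[2])].every(
    (dim) => dim >= rule.min && dim <= rule.max && dim % rule.multiple === 0,
  );
}
```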
Video Models
CogVideoX-3
Z.AI's flagship video generation model with improved frame stability and clarity.
- Best for: Text-to-video, image-to-video, start-end frame animation
- Resolution: Up to 4K (3840x2160)
- Duration: 5 or 10 seconds
- Features: Audio generation, 30/60 FPS, quality/speed modes
- Price: $0.20/video
Vidu Q1
High-quality video generation with 1080P output.
| Model | Capability | Duration | Price |
|-------|------------|----------|-------|
| viduq1-text | Text-to-video | 5s | $0.40 |
| viduq1-image | Image-to-video | 5s | $0.40 |
| viduq1-start-end | Start-end frame | 5s | $0.40 |
- Features: General/anime styles, motion amplitude control
Vidu 2
Fast and cost-effective video generation with 720P output.
| Model | Capability | Duration | Price |
|-------|------------|----------|-------|
| vidu2-image | Image-to-video | 4s | $0.20 |
| vidu2-start-end | Start-end frame | 4s | $0.20 |
| vidu2-reference | Reference-based | 4s | $0.40 |
- Features: Audio generation, motion amplitude control, multi-image reference
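The per-video prices listed in this section can be used to estimate the cost of a batch before submitting it. The sketch below is illustrative: `VIDEO_PRICE_CENTS` and `estimateVideoCostUsd` are hypothetical names, the prices are simply those quoted above and may change, and amounts are kept in integer cents to avoid floating-point rounding.

```typescript
// Prices in US cents per video, copied from the tables above.
const VIDEO_PRICE_CENTS: Record<string, number> = {
  "cogvideox-3": 20,
  "viduq1-text": 40,
  "viduq1-image": 40,
  "viduq1-start-end": 40,
  "vidu2-image": 20,
  "vidu2-start-end": 20,
  "vidu2-reference": 40,
};

// Sum the listed price for each requested video and format the total as dollars.
function estimateVideoCostUsd(models: string[]): string {
  let cents = 0;
  for (const model of models) {
    const price = VIDEO_PRICE_CENTS[model];
    if (price === undefined) {
      throw new Error(`Unknown video model: ${model}`);
    }
    cents += price;
  }
  return `$${(cents / 100).toFixed(2)}`;
}
```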
Error Handling
The server handles various error scenarios:
| Error Type | Description |
|-----------|-------------|
| AUTH_ERROR | Invalid or missing API key |
| RATE_LIMIT | Too many requests - will auto-retry |
| VALIDATION_ERROR | Invalid parameters |
| SERVER_ERROR | Z.AI server issues - will auto-retry |
| NETWORK_ERROR | Connection issues - will auto-retry |
| TIMEOUT_ERROR | Request timeout - will auto-retry |
| CONTENT_FILTER | Prompt blocked by content policy |
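The retry behavior in the table above, combined with the `ZAI_MAX_RETRIES` and `ZAI_RETRY_DELAY` defaults, suggests a schedule like the one sketched below. The function names are hypothetical, and the doubling factor and absence of jitter are assumptions; the README only states that exponential backoff is used.

```typescript
type ZaiErrorType =
  | "AUTH_ERROR"
  | "RATE_LIMIT"
  | "VALIDATION_ERROR"
  | "SERVER_ERROR"
  | "NETWORK_ERROR"
  | "TIMEOUT_ERROR"
  | "CONTENT_FILTER";

// Error types marked "will auto-retry" in the table above.
const RETRYABLE_ERRORS: ReadonlySet<ZaiErrorType> = new Set<ZaiErrorType>([
  "RATE_LIMIT",
  "SERVER_ERROR",
  "NETWORK_ERROR",
  "TIMEOUT_ERROR",
]);

function isRetryable(type: ZaiErrorType): boolean {
  return RETRYABLE_ERRORS.has(type);
}

// Delay before retry number `attempt` (0-based). Assumption: the delay doubles each time.
function backoffDelayMs(attempt: number, initialDelayMs = 1000): number {
  return initialDelayMs * 2 ** attempt;
}

// Delays (in ms) that would be waited for a given error type; empty if not retryable.
function retrySchedule(type: ZaiErrorType, maxRetries = 3, initialDelayMs = 1000): number[] {
  if (!isRetryable(type)) return [];
  return Array.from({ length: maxRetries }, (_, attempt) => backoffDelayMs(attempt, initialDelayMs));
}
```

With the default settings, `retrySchedule("RATE_LIMIT")` yields delays of 1000, 2000, and 4000 ms, while `retrySchedule("AUTH_ERROR")` yields no retries.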
Development
Setup
git clone <repo-url>
cd z-ai-image-mcp
npm install
cp .env.example .env
# Edit .env with your API key
Scripts
npm run build # Build TypeScript
npm run dev # Run in development mode
npm test # Run all tests
npm run test:unit # Run unit tests only
npm run test:integration # Run integration tests
npm run test:e2e # Run E2E tests
npm run test:coverage # Run tests with coverage
npm run typecheck # Type check without emit
License
MIT