Created 1/27/2026
Qwen3-TTS MCP Server
A Model Context Protocol (MCP) server that exposes Qwen3-TTS voice synthesis capabilities with Voice Design.
Features
- Advanced Voice Synthesis: Generate realistic audio using the Qwen3-TTS 1.7B model
- Voice Design: Customize voice with natural language descriptions
- Multilingual: Supports 9 languages (Chinese, English, Japanese, Korean, French, German, Spanish, Portuguese, Russian) plus automatic language detection (Auto)
- MCP Integration: Access via Model Context Protocol for integration with LLMs
Installation
pip install -e .
Usage
As MCP Server
Start the MCP server:
python main.py
The server communicates over stdin/stdout (the MCP stdio transport) and exposes the following tools:
Available Tools
- generate_tts: Generate audio from text

  Parameters:
  - text: Text to convert to speech (a single string, or a list for batch processing)
  - language: Language (Auto, Chinese, English, etc.)
  - voice_description: Description of the desired voice characteristics
  - max_tokens: Maximum generation tokens (default: 2048)

  Example (single):
  { "text": "Hello, how are you?", "language": "English", "voice_description": "deep male voice, friendly tone" }

  Example (batch):
  { "text": ["Hello, how are you?", "こんにちは、元気ですか?"], "language": ["English", "Japanese"], "voice_description": ["deep male voice, friendly tone", "cheerful female voice"] }
- generate_tts_voice_clone: Generate audio by cloning a voice from reference audio

  Parameters:
  - target_text: Text to convert to speech with the cloned voice
  - language: Language (Auto, Chinese, English, etc.)
  - ref_audio_base64: Reference audio encoded in base64 (WAV, 3-10 seconds recommended)
  - ref_text: Transcript of the reference audio (required if use_xvector_only is False)
  - use_xvector_only: If True, use only the speaker embedding (faster but less accurate)
  - model_size: Model size ("0.6B" or "1.7B", default: "1.7B")
  - max_tokens: Maximum generation tokens (default: 2048)

  Example:
  { "target_text": "Hello, how are you today?", "language": "English", "ref_audio_base64": "UklGRi4...", "ref_text": "Okay. Yeah. I resent you. I love you.", "use_xvector_only": false, "model_size": "1.7B" }
- list_languages: List all supported languages
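The ref_audio_base64 parameter of generate_tts_voice_clone expects the raw bytes of the WAV file encoded as base64. A minimal sketch of building the tool arguments from a local reference clip (the helper name and file path are illustrative, not part of this project):

```python
# Sketch: assemble generate_tts_voice_clone arguments from a local WAV file.
# The argument keys match the tool description above; wav_path is a
# placeholder for your own 3-10 second reference recording.
import base64

def voice_clone_args(wav_path: str, target_text: str, ref_text: str) -> dict:
    # Read the raw WAV bytes and encode them as base64 text for JSON transport.
    with open(wav_path, "rb") as f:
        ref_audio_base64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "target_text": target_text,
        "language": "English",
        "ref_audio_base64": ref_audio_base64,
        "ref_text": ref_text,            # transcript of the reference clip
        "use_xvector_only": False,       # ref_text is required in this mode
        "model_size": "1.7B",
    }
```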
MCP Architecture
The server implements the MCP specification with:
- Tools: Available tools for clients to call
- Resources: (Future) Access to files and data
- Prompts: (Future) Reusable prompts for TTS
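In practice, a client drives the tools above by writing JSON-RPC 2.0 messages to the server's stdin, one JSON object per line (the MCP stdio transport). A sketch of the tools/call request for generate_tts; in a real session this is preceded by the MCP initialize handshake:

```python
# Sketch: the JSON-RPC 2.0 request an MCP client sends over the server's
# stdin to invoke the generate_tts tool. The argument values are the same
# illustrative ones used in the example above.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "generate_tts",
        "arguments": {
            "text": "Hello, how are you?",
            "language": "English",
            "voice_description": "deep male voice, friendly tone",
        },
    },
}

# The stdio transport carries newline-delimited JSON messages.
line = json.dumps(request)
```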
Requirements
- Python >= 3.12
- CUDA (recommended for better performance)
- GPU with at least 4 GB of VRAM
- Optional: flash-attn for improved performance with flash attention
Project Structure
.
├── main.py # MCP server implementation
├── pyproject.toml # Project configuration
├── requirements.txt # Python dependencies
└── README.md # This file
Performance Optimizations
This implementation uses optimizations from the official Qwen3-TTS repository:
- Flash Attention 2 for faster inference and lower memory usage
- bfloat16 precision for efficient computation
- Batch inference support for processing multiple requests
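These first two options correspond to the standard Hugging Face transformers from_pretrained keyword arguments torch_dtype and attn_implementation. A small sketch of selecting loading options, assuming that convention and falling back to PyTorch's built-in SDPA when flash-attn is not installed (the helper name is illustrative):

```python
# Sketch: pick from_pretrained keyword arguments based on what is installed.
# Real code should also verify CUDA is available before requesting
# Flash Attention 2, which is CUDA-only.
from importlib.util import find_spec

def model_load_kwargs() -> dict:
    """Prefer Flash Attention 2 + bfloat16; fall back to SDPA."""
    kwargs = {"torch_dtype": "bfloat16"}  # transformers accepts the string form
    if find_spec("flash_attn") is not None:
        kwargs["attn_implementation"] = "flash_attention_2"
    else:
        kwargs["attn_implementation"] = "sdpa"  # PyTorch scaled-dot-product attention
    return kwargs
```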
Next Steps
- [ ] Add support for MCP resources
- [ ] Implement generated audio caching
- [ ] Add structured logging
- [ ] Create example client
- [ ] Deployment documentation
Quick Setup
Run the package with uvx (it is fetched and installed on first use):
uvx mcp-qwen3-tts
Cursor configuration (mcp.json)
{
"mcpServers": {
"gabrielalmir-mcp-qwen3-tts": {
"command": "uvx",
"args": [
"mcp-qwen3-tts"
]
}
}
}