Created 1/27/2026

Qwen3-TTS MCP Server

A Model Context Protocol (MCP) server that exposes Qwen3-TTS voice synthesis capabilities with Voice Design.

Features

  • Advanced Voice Synthesis: Generate natural-sounding speech with the Qwen3-TTS 1.7B model
  • Voice Design: Customize voice with natural language descriptions
  • Multilingual: Supports nine languages (Chinese, English, Japanese, Korean, French, German, Spanish, Portuguese, Russian) plus automatic language detection (Auto)
  • MCP Integration: Access via Model Context Protocol for integration with LLMs

Installation

pip install -e .

Usage

As an MCP Server

Start the MCP server:

python main.py

The server communicates over stdin/stdout (the MCP stdio transport) and exposes the following tools:
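Under the stdio transport, each tool invocation is a JSON-RPC 2.0 message written to the server's stdin, one JSON object per line. A minimal sketch of how a client would frame a generate_tts call (the request id is arbitrary; the tool and argument names are those listed below):

```python
import json

# JSON-RPC 2.0 request an MCP client writes to the server's stdin
# to invoke the generate_tts tool ("id" is an arbitrary request id).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "generate_tts",
        "arguments": {
            "text": "Hello, how are you?",
            "language": "English",
            "voice_description": "deep male voice, friendly tone",
        },
    },
}

# One message per line (newline-delimited JSON) in each direction.
print(json.dumps(request))
```

The server replies on stdout with a matching JSON-RPC response carrying the tool result.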

Available Tools

  1. generate_tts: Generate audio from text

    • text: Text to convert to speech (single string or list for batch)
    • language: Language (Auto, Chinese, English, etc.)
    • voice_description: Description of desired voice characteristics
    • max_tokens: Maximum generation tokens (default: 2048)

    Example (single):

    {
      "text": "Hello, how are you?",
      "language": "English",
      "voice_description": "deep male voice, friendly tone"
    }
    

    Example (batch):

    {
      "text": ["Hello, how are you?", "こんにちは、元気ですか?"],
      "language": ["English", "Japanese"],
      "voice_description": ["deep male voice, friendly tone", "cheerful female voice"]
    }
    
  2. generate_tts_voice_clone: Generate audio using voice cloning from reference audio

    • target_text: Text to convert to speech with cloned voice
    • language: Language (Auto, Chinese, English, etc.)
    • ref_audio_base64: Reference audio encoded in base64 (WAV, 3-10 seconds recommended)
    • ref_text: Transcript of reference audio (required if use_xvector_only is False)
    • use_xvector_only: If True, use only speaker embedding (faster but less accurate)
    • model_size: Model size ("0.6B" or "1.7B", default: "1.7B")
    • max_tokens: Maximum generation tokens (default: 2048)

    Example:

    {
      "target_text": "Hello, how are you today?",
      "language": "English",
      "ref_audio_base64": "UklGRi4...",
      "ref_text": "Okay. Yeah. I resent you. I love you.",
      "use_xvector_only": false,
      "model_size": "1.7B"
    }
    
  3. list_languages: List all supported languages
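For generate_tts_voice_clone, the reference audio must be the bytes of a WAV file encoded as base64. A minimal sketch of preparing that field (the encode_ref_audio helper is illustrative, not part of the server; the in-memory silent WAV only stands in for a real 3-10 second recording read from disk):

```python
import base64
import io
import wave

def encode_ref_audio(wav_bytes: bytes) -> str:
    """Base64-encode raw WAV file bytes for the ref_audio_base64 field."""
    return base64.b64encode(wav_bytes).decode("ascii")

# Build a tiny silent WAV in memory just to demonstrate the round trip;
# in practice, read a real 3-10 s reference recording from disk.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(16000)  # 16 kHz
    w.writeframes(b"\x00\x00" * 16000)  # one second of silence

ref_audio_base64 = encode_ref_audio(buf.getvalue())
```

The resulting string starts with "UklGR" (the base64 encoding of the WAV "RIFF" header), as in the example request above.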

MCP Architecture

The server implements the MCP specification with:

  • Tools: Available tools for clients to call
  • Resources: (Future) Access to files and data
  • Prompts: (Future) Reusable prompts for TTS

Requirements

  • Python >= 3.12
  • CUDA-capable GPU with 4 GB+ memory (recommended for acceptable inference speed)
  • Optional: flash-attn for improved performance with flash attention
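Because flash-attn is optional, the attention backend can be chosen at startup based on whether the package is installed. A minimal sketch (the helper name and the "sdpa" fallback are illustrative choices, not taken from this repository):

```python
import importlib.util

def pick_attn_implementation() -> str:
    """Use Flash Attention 2 when the optional flash-attn package is
    installed; otherwise fall back to PyTorch's scaled-dot-product path."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"
```

This keeps the server runnable on machines without flash-attn while still benefiting from it when present.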

Project Structure

.
├── main.py              # MCP server implementation
├── pyproject.toml       # Project configuration
├── requirements.txt     # Python dependencies
└── README.md            # This file

Performance Optimizations

This implementation uses optimizations from the official Qwen3-TTS repository:

  • Flash Attention 2 for faster inference and lower memory usage
  • bfloat16 precision for efficient computation
  • Batch inference support for processing multiple requests
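The batch-inference idea above can be sketched as a chunking helper that groups incoming texts so each group is processed in one forward pass (the chunked helper and the batch size are illustrative assumptions, not the repository's actual code):

```python
from typing import Iterator

def chunked(items: list[str], batch_size: int = 8) -> Iterator[list[str]]:
    """Split a list of texts into fixed-size batches, one model call each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = ["Hello", "こんにちは", "Bonjour", "Hola", "Hallo"]
batches = list(chunked(texts, batch_size=2))
# batches == [["Hello", "こんにちは"], ["Bonjour", "Hola"], ["Hallo"]]
```

Batching amortizes per-call model overhead across requests, which is where the throughput gain comes from.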

Next Steps

  • [ ] Add support for MCP resources
  • [ ] Implement generated audio caching
  • [ ] Add structured logging
  • [ ] Create example client
  • [ ] Deployment documentation
Quick Setup

Install the package (if required):

uvx mcp-qwen3-tts

Cursor configuration (mcp.json):

{
  "mcpServers": {
    "gabrielalmir-mcp-qwen3-tts": {
      "command": "uvx",
      "args": ["mcp-qwen3-tts"]
    }
  }
}