
Local Speech-to-Text MCP Server

A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.

🎯 Features

  • 🏠 100% Local Processing: No cloud APIs, complete privacy
  • 🚀 Apple Silicon Optimized: 15x+ real-time transcription speed
  • 🎤 Speaker Diarization: Identify and separate multiple speakers
  • 🎵 Universal Audio Support: Automatic conversion from MP3, M4A, FLAC, and more
  • 📝 Multiple Output Formats: txt, json, vtt, srt, csv
  • 💾 Low Memory Footprint: <2GB memory usage
  • 🔧 TypeScript: Full type safety and modern development

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • whisper.cpp (brew install whisper-cpp)
  • For audio format conversion: ffmpeg (brew install ffmpeg) - automatically handles MP3, M4A, FLAC, OGG, etc.
  • For speaker diarization: Python 3.8+ and HuggingFace token (free)

Supported Audio Formats

  • Native whisper.cpp formats: WAV, FLAC
  • Auto-converted formats: MP3, M4A, AAC, OGG, WMA, and more
  • Automatic conversion: Powered by ffmpeg with 16kHz/mono optimization for whisper.cpp
  • Format detection: Automatic format detection and conversion when needed
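The conversion logic above can be sketched as a pair of pure helpers: one that checks whether a file is already in a whisper.cpp-native format, and one that builds the ffmpeg arguments for the 16 kHz/mono conversion. These function names are illustrative, not the server's actual API:

```typescript
// Formats whisper.cpp can read directly (per the list above).
const NATIVE_FORMATS = new Set(["wav", "flac"]);

/** True if the file can be passed to whisper.cpp without conversion. */
function isNativeFormat(file: string): boolean {
  const ext = file.split(".").pop()?.toLowerCase() ?? "";
  return NATIVE_FORMATS.has(ext);
}

/** ffmpeg arguments for the 16 kHz, mono WAV output whisper.cpp expects. */
function ffmpegArgs(input: string, output: string): string[] {
  return ["-i", input, "-ar", "16000", "-ac", "1", output];
}
```

So `song.mp3` would be converted first, while `recording.wav` is handed to whisper.cpp directly.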

Installation

git clone https://github.com/your-username/local-stt-mcp.git
cd local-stt-mcp/mcp-server
npm install
npm run build

# Download whisper models
npm run setup:models

# For speaker diarization, set HuggingFace token
export HF_TOKEN="your_token_here"  # Get free token from huggingface.co

Speaker Diarization Note: Requires HuggingFace account and accepting pyannote/speaker-diarization-3.1 license.

MCP Client Configuration

Add to your MCP client configuration:

{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"]
    }
  }
}

🛠️ Available Tools

| Tool | Description |
|------|-------------|
| transcribe | Basic audio transcription with automatic format conversion |
| transcribe_long | Long audio file processing with chunking and format conversion |
| transcribe_with_speakers | Speaker diarization and transcription with format support |
| list_models | Show available whisper models |
| health_check | System diagnostics |
| version | Server version information |
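As with any MCP server, the tools above are invoked with a JSON-RPC `tools/call` request. A minimal sketch of the request shape follows; the request envelope comes from the MCP specification, but the `file` and `output_format` argument names are assumptions, not the server's documented schema:

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

/** Wraps a tool name and its arguments in an MCP tools/call request. */
function makeToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical invocation; the real argument schema may differ.
const req = makeToolCall(1, "transcribe", { file: "meeting.wav", output_format: "srt" });
```

In practice an MCP client (such as the one configured above) constructs and sends these requests for you over stdio.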

📊 Performance

Apple Silicon Benchmarks:

  • Processing Speed: 15.8x real-time (vs WhisperX 5.5x)
  • Memory Usage: <2GB (vs WhisperX ~4GB)
  • GPU Acceleration: ✅ Apple Neural Engine
  • Setup: moderately complex, but with markedly better performance

See /benchmarks/ for detailed performance comparisons.
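The real-time factor quoted above is simply audio duration divided by wall-clock processing time; a quick sanity check of the arithmetic:

```typescript
/** Real-time factor: seconds of audio transcribed per second of processing. */
function realTimeFactor(audioSeconds: number, processingSeconds: number): number {
  return audioSeconds / processingSeconds;
}

// A 60-second clip processed in ~3.8 s matches the ~15.8x figure above.
const rtf = realTimeFactor(60, 3.8);
```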

🏗️ Project Structure

mcp-server/
├── src/                    # TypeScript source code
│   ├── tools/             # MCP tool implementations
│   ├── whisper/           # whisper.cpp integration
│   ├── utils/             # Speaker diarization & utilities
│   └── types/             # Type definitions
├── dist/                  # Compiled JavaScript
└── python/                # Python dependencies

🔧 Development

# Build
npm run build

# Development mode (watch)
npm run dev

# Linting & formatting
npm run lint
npm run format

# Type checking
npm run type-check

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • whisper.cpp for fast, local transcription
  • pyannote/speaker-diarization-3.1 for speaker diarization
  • ffmpeg for audio format conversion