Created 6/3/2025

Local Speech-to-Text MCP Server

A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.

🎯 Features

  • ๐Ÿ  100% Local Processing: No cloud APIs, complete privacy
  • ๐Ÿš€ Apple Silicon Optimized: 15x+ real-time transcription speed
  • ๐ŸŽค Speaker Diarization: Identify and separate multiple speakers
  • ๐ŸŽต Universal Audio Support: Automatic conversion from MP3, M4A, FLAC, and more
  • ๐Ÿ“ Multiple Output Formats: txt, json, vtt, srt, csv
  • ๐Ÿ’พ Low Memory Footprint: <2GB memory usage
  • ๐Ÿ”ง TypeScript: Full type safety and modern development

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • whisper.cpp (brew install whisper-cpp)
  • For audio format conversion: ffmpeg (brew install ffmpeg) - automatically handles MP3, M4A, FLAC, OGG, etc.
  • For speaker diarization: Python 3.8+ and HuggingFace token (free)

Supported Audio Formats

  • Native whisper.cpp formats: WAV, FLAC
  • Auto-converted formats: MP3, M4A, AAC, OGG, WMA, and more
  • Automatic conversion: Powered by ffmpeg with 16kHz/mono optimization for whisper.cpp
  • Format detection: Automatic format detection and conversion when needed
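The conversion step above can be sketched as follows. This is illustrative only: `buildFfmpegArgs` is a hypothetical helper, not part of this repository's API, though the ffmpeg flags shown (16 kHz, mono, 16-bit PCM) are the standard input format whisper.cpp expects.

```typescript
// Sketch: assemble the ffmpeg arguments for converting any supported
// audio file into the 16 kHz mono WAV that whisper.cpp consumes.
// buildFfmpegArgs is a hypothetical helper, not this server's actual code.
function buildFfmpegArgs(input: string, output: string): string[] {
  return [
    "-i", input,          // source file (MP3, M4A, FLAC, OGG, ...)
    "-ar", "16000",       // resample to 16 kHz
    "-ac", "1",           // downmix to mono
    "-c:a", "pcm_s16le",  // 16-bit little-endian PCM
    output,               // target WAV file
  ];
}

// Equivalent command line:
//   ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
console.log(["ffmpeg", ...buildFfmpegArgs("input.mp3", "output.wav")].join(" "));
```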

Installation

git clone https://github.com/your-username/local-stt-mcp.git
cd local-stt-mcp/mcp-server
npm install
npm run build

# Download whisper models
npm run setup:models

# For speaker diarization, set HuggingFace token
export HF_TOKEN="your_token_here"  # Get free token from huggingface.co

Speaker Diarization Note: Requires a HuggingFace account and acceptance of the pyannote/speaker-diarization-3.1 model license.
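After installing, it can help to confirm the external tools are actually on your PATH. A minimal sketch, assuming the Homebrew binary names `whisper-cli` and `ffmpeg` (adjust if your install names them differently):

```typescript
// Sketch: check that the external binaries the server shells out to
// are resolvable on PATH. The binary names are assumptions based on
// the Homebrew formulas above, not guaranteed by this repository.
import { execSync } from "node:child_process";

function toolStatus(binary: string): string {
  try {
    // `command -v` exits non-zero when the binary is not found
    execSync(`command -v ${binary}`, { stdio: "pipe" });
    return `${binary}: found`;
  } catch {
    return `${binary}: missing`;
  }
}

for (const bin of ["whisper-cli", "ffmpeg"]) {
  console.log(toolStatus(bin));
}
```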

MCP Client Configuration

Add to your MCP client configuration:

{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"]
    }
  }
}

๐Ÿ› ๏ธ Available Tools

| Tool | Description |
|------|-------------|
| transcribe | Basic audio transcription with automatic format conversion |
| transcribe_long | Long audio file processing with chunking and format conversion |
| transcribe_with_speakers | Speaker diarization and transcription with format support |
| list_models | Show available whisper models |
| health_check | System diagnostics |
| version | Server version information |
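Clients invoke these tools through MCP's standard JSON-RPC `tools/call` method. A sketch of what such a request might look like; the argument names (`file`, `output_format`) are illustrative guesses, not this server's documented schema:

```typescript
// Sketch: the JSON-RPC envelope an MCP client sends to invoke a tool.
// "tools/call" is the standard MCP method; the argument names here
// ("file", "output_format") are hypothetical, not this server's schema.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function makeTranscribeRequest(file: string, format: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "tools/call",
    params: { name: "transcribe", arguments: { file, output_format: format } },
  };
}

console.log(JSON.stringify(makeTranscribeRequest("meeting.wav", "srt"), null, 2));
```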

📊 Performance

Apple Silicon Benchmarks:

  • Processing Speed: 15.8x real-time (vs WhisperX 5.5x)
  • Memory Usage: <2GB (vs WhisperX ~4GB)
  • GPU Acceleration: ✅ Apple Neural Engine
  • Setup: Moderately complex, but delivers superior performance

See /benchmarks/ for detailed performance comparisons.
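The "x real-time" figures above are real-time factors: seconds of audio transcribed per second of wall-clock processing. A quick sketch of the arithmetic:

```typescript
// Real-time factor (RTF): audio duration divided by processing time.
// An RTF of 15.8x means a 60-second clip finishes in under 4 seconds.
function realTimeFactor(audioSeconds: number, processingSeconds: number): number {
  return audioSeconds / processingSeconds;
}

// e.g. a 60-second clip processed in 3.8 seconds:
console.log(realTimeFactor(60, 3.8).toFixed(1) + "x"); // → 15.8x
```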

๐Ÿ—๏ธ Project Structure

mcp-server/
├── src/                    # TypeScript source code
│   ├── tools/             # MCP tool implementations
│   ├── whisper/           # whisper.cpp integration
│   ├── utils/             # Speaker diarization & utilities
│   └── types/             # Type definitions
├── dist/                  # Compiled JavaScript
└── python/                # Python dependencies

🔧 Development

# Build
npm run build

# Development mode (watch)
npm run dev

# Linting & formatting
npm run lint
npm run format

# Type checking
npm run type-check

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments

Quick Setup

Install the package (if required):

npx @modelcontextprotocol/server-local-stt-mcp

Cursor configuration (mcp.json):

{
  "mcpServers": {
    "smartlittleapps-local-stt-mcp": {
      "command": "npx",
      "args": ["smartlittleapps-local-stt-mcp"]
    }
  }
}