MCP server by d4nnd4
STT-Base-Project
This project demonstrates my past work on TTS and STT implementations with a frontend. Because it is a demonstration project, security issues are expected and documented.
FrontOffice Voice Console
This project shows a production-grade voice AI application for medical front office workflows, featuring real-time speech-to-text (STT), intent recognition, and text-to-speech (TTS) with HIPAA-minded design principles. The speech models used are Piper (TTS) and Whisper (STT), but the structure allows for migration and expansion. The interface is built on a template that simulates early versions of Fallout's Robco interfaces, taken from another of my portfolio projects. The project enacts a roleplay with an MCP model, which responds in real time to any request made to it.
Quick Start
docker compose up --build
Once the containers are running, the application is available at:
- Frontend UI: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
Demo Flow
- Open http://localhost:5173
- Click Record and speak a request (or upload a sample audio file if using the API directly)
- Confirm transcript appears in the transcript area
- Confirm intent + entities populate in the Intent Card
- Click Speak Response and hear the TTS audio
- Click Export and select a format for the conversation records
- Open backend docs at http://localhost:8000/docs
- Check health endpoints: /api/healthz, /api/readyz
Features
Core Features
- Speech-to-Text: Local transcription using a stable version of Faster Whisper
- Intent Recognition: Rule-based classification with entity extraction (keywords and timestamps for appointments)
  - Appointment Scheduling
  - Financial Clearance (insurance, billing)
  - General Inquiries (hours, location, contact)
- Text-to-Speech: Local synthesis using one of the available Piper TTS models
- Privacy Mode: PII redaction for HIPAA-minded compliance. This feature is currently turned off to demonstrate the STT model.
- Observability: Structured logging with request tracing and latency metrics; you can inspect this from within the Docker containers.
Technical Highlights
- Provider Abstraction: Easy swapping between local and cloud providers (AWS, Azure, GCP)
- Type Safety: Full TypeScript frontend + Pydantic/FastAPI backend
- Containerized: Single-command deployment via Docker Compose
- Health Endpoints: Liveness (/healthz) and readiness (/readyz) checks, accessible from the API endpoints
- RESTful API: OpenAPI/Swagger documentation at /docs on the backend port
- Real-time Updates: WebSocket support for streaming transcription (optional, but works with the current demonstration)
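The provider abstraction can be pictured roughly as follows. This is a minimal sketch, not the project's actual interface in providers/base.py: `STTProvider`, `EchoSTT`, `UppercaseSTT`, `PROVIDERS`, and `get_stt_provider` are hypothetical names, and the two stand-in classes only imitate a local and a cloud backend.

```python
from typing import Protocol


class STTProvider(Protocol):
    """Hypothetical interface; the real one lives in providers/base.py."""

    def transcribe(self, audio: bytes) -> str: ...


class EchoSTT:
    """Stand-in for a local provider, used only for illustration."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes of audio>"


class UppercaseSTT:
    """Second stand-in, playing the role of a cloud provider."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} BYTES OF AUDIO>"


# Registry keyed by a configuration value (the key names are assumptions).
PROVIDERS: dict[str, type] = {"local": EchoSTT, "cloud": UppercaseSTT}


def get_stt_provider(name: str) -> STTProvider:
    """Look up and instantiate a provider by its configured name."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown STT provider: {name!r}") from None
```

Because callers depend only on `transcribe`, swapping providers would come down to changing one configuration value.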
Architecture
Backend (Python + FastAPI)
backend/
├── app/
│   ├── main.py                 # FastAPI application entry point
│   ├── core/
│   │   └── config.py           # Configuration management
│   ├── api/
│   │   ├── routes.py           # API endpoint handlers
│   │   └── schemas.py          # Pydantic models
│   ├── providers/
│   │   ├── base.py             # Provider interfaces
│   │   ├── stt_whisper.py      # Faster Whisper STT
│   │   └── tts_piper.py        # Piper TTS
│   ├── nlu/
│   │   └── intent_router.py    # Intent classification
│   ├── telemetry/
│   │   └── logging_config.py   # Structured logging
│   └── utils/
│       └── redaction.py        # PII redaction
├── tests/                      # Pytest test suite
├── Dockerfile
└── requirements.txt
Frontend (React + TypeScript + Vite)
frontend/
├── src/
│   ├── main.tsx                # Application entry point
│   ├── App.tsx                 # Main app component
│   ├── pages/
│   │   ├── Demo.tsx            # Voice console interface
│   │   ├── Architecture.tsx    # Architecture documentation
│   │   ├── Reliability.tsx     # Reliability information
│   │   └── About.tsx           # About page
│   ├── components/
│   │   ├── AudioRecorder.tsx   # Microphone recording
│   │   ├── AudioPlayer.tsx     # TTS playback
│   │   ├── TranscriptEditor.tsx
│   │   ├── IntentCard.tsx
│   │   └── DebugDrawer.tsx
│   └── lib/
│       └── api.ts              # API client
├── Dockerfile
└── package.json
API Endpoints
STT
POST /api/stt/transcribe
Content-Type: multipart/form-data
Returns: { text, confidence, language, duration_ms }
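For calling the STT endpoint without the frontend, a request could be assembled along these lines using only the standard library. This is a sketch under assumptions: the form field name `file`, the default filename, and the helper name `build_transcribe_request` are hypothetical, so check the schema at /docs for the field the backend actually expects.

```python
import uuid
import urllib.request

API_BASE = "http://localhost:8000"


def build_transcribe_request(
    audio: bytes, filename: str = "sample.wav", field: str = "file"
) -> urllib.request.Request:
    """Build a multipart/form-data POST. The field name is an assumption."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{API_BASE}/api/stt/transcribe",
        data=head + audio + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` and decoding the body as JSON should yield the fields listed above, provided the backend is running.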
Intent Recognition
POST /api/intent/route
Content-Type: application/json
Body: { text: "I need an appointment next Tuesday" }
Returns: { intent, confidence, entities, response_text }
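To illustrate how rule-based routing can produce a response of this shape, here is a minimal sketch. It is not the logic in nlu/intent_router.py: the keyword lists, the intent labels, the weekday-only entity extraction, and the naive confidence score are all assumptions made for the example.

```python
import re

# Keyword lists are illustrative assumptions, not the project's actual rules.
INTENT_KEYWORDS = {
    "appointment_scheduling": ["appointment", "schedule", "book"],
    "financial_clearance": ["insurance", "billing", "copay", "payment"],
    "general_inquiry": ["hours", "location", "address", "contact"],
}
WEEKDAY = re.compile(
    r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
    re.IGNORECASE,
)


def route_intent(text: str) -> dict:
    """Pick the intent with the most keyword hits and extract a weekday."""
    lowered = text.lower()
    hits = {
        intent: sum(1 for kw in keywords if kw in lowered)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    if hits[best] == 0:
        return {
            "intent": "unknown",
            "confidence": 0.0,
            "entities": {},
            "response_text": "Sorry, could you rephrase that?",
        }
    entities = {}
    match = WEEKDAY.search(text)
    if match:
        entities["day"] = match.group(1).lower()
    # Naive confidence: share of all keyword hits won by the top intent.
    confidence = round(hits[best] / sum(hits.values()), 2)
    return {
        "intent": best,
        "confidence": confidence,
        "entities": entities,
        "response_text": f"Routing your request to {best.replace('_', ' ')}.",
    }
```

With the sample body above, `route_intent("I need an appointment next Tuesday")` selects `appointment_scheduling` and extracts `{"day": "tuesday"}`.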
TTS
POST /api/tts/speak
Content-Type: application/json
Body: { text: "Hello, how can I help you?", voice: "en_US-lessac-medium" }
Returns: audio/wav binary
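A small client for the TTS endpoint might look like the sketch below. The helper names `build_speak_request` and `save_speech`, the output path, and the 30-second timeout are assumptions; the URL, body fields, and voice name come from the endpoint description above.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"


def build_speak_request(
    text: str, voice: str = "en_US-lessac-medium"
) -> urllib.request.Request:
    """Build the JSON POST described for /api/tts/speak."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tts/speak",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, path: str = "response.wav") -> int:
    """Send the request and write the returned WAV bytes to disk.

    Returns the number of bytes written. Requires the backend to be running.
    """
    with urllib.request.urlopen(build_speak_request(text), timeout=30) as resp:
        audio = resp.read()
    with open(path, "wb") as fh:
        return fh.write(audio)
```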
Health Checks
GET /api/healthz # Basic liveness
GET /api/readyz # Readiness (checks all providers)
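The difference between the two checks is that readiness has to consult every provider. A sketch of that aggregation follows, assuming each provider exposes a zero-argument check that returns a boolean; the function name `readiness` and the report format are hypothetical, not the project's actual response.

```python
from typing import Callable


def readiness(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict]:
    """Run every provider check; return an HTTP status code and a report."""
    report = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "unavailable"
        except Exception as exc:  # a crashing check counts as not ready
            report[name] = f"error: {type(exc).__name__}"
    ready = all(status == "ok" for status in report.values())
    return (200 if ready else 503), {"ready": ready, "providers": report}
```

Returning 503 when any provider fails lets an orchestrator hold traffic back until the models have loaded, while the liveness check can stay a trivial 200.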
Local Development
Backend:
cd backend
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
uvicorn app.main:app --reload
Frontend:
cd frontend
npm install
npm run dev
Testing
Backend:
cd backend
pytest tests/ -q # -q for quieter, more compact output
Frontend:
cd frontend
npm run test
Security
What This Project Implements
1. Privacy-First Design
- No Default Persistence: Audio files and transcripts are not stored by default. Scripts are hardcoded and are expected to be migrated when an MCP is active.
- PII Redaction: Automatic redaction of:
  - Phone numbers (XXX-XXX-XXXX patterns)
  - Email addresses
  - Social Security Numbers
  - Common first names (with NER placeholder)
- Privacy Mode Toggle: Users can enable/disable server-side processing controls
- Audit Logging: All API requests tracked with unique request IDs
- This feature has been turned off for demonstration purposes
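Pattern-based redaction of the first three categories can be sketched as below. The regular expressions and placeholder tokens are assumptions rather than the contents of utils/redaction.py, and first-name redaction is left out because it would need a name list or an NER model.

```python
import re

# (placeholder, pattern) pairs; both are illustrative assumptions.
PATTERNS = [
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def redact(text: str) -> str:
    """Replace each matched identifier with its placeholder token."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

For example, `redact("Call 555-123-4567 or email jane.doe@example.com")` returns `"Call [PHONE] or email [EMAIL]"`. Regexes like these only catch well-formatted identifiers, so spoken variants such as "five five five, one two three" would pass through untouched.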
2. Data Minimization
- Audio is processed in-memory and discarded immediately after transcription
- Only aggregate metrics are logged (duration, confidence scores)
- No raw audio bytes in logs
- Transcripts can be redacted before logging in privacy mode
3. Observability Without Exposure
- Structured Logging: JSON logs with request tracing
- No Sensitive Data in Logs: PII redacted when privacy mode enabled
- Request Correlation: Unique request IDs for debugging without exposing user data
- Health Checks: Separate liveness and readiness endpoints
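One way to get JSON logs with request correlation from the standard `logging` module is a custom formatter, sketched here. The field names (`request_id`, `duration_ms`) and the logger name are assumptions, not necessarily what telemetry/logging_config.py emits.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with correlation fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Populated through the `extra=` argument of the logging call.
            "request_id": getattr(record, "request_id", None),
            "duration_ms": getattr(record, "duration_ms", None),
        }
        return json.dumps(entry)


def make_logger(name: str = "voice_console") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A handler would then log something like `make_logger().info("stt.transcribe completed", extra={"request_id": rid, "duration_ms": 412})`, recording timing and a correlation ID without including the transcript itself.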
4. Defense in Depth (Basic)
- CORS Protection: Configurable allowed origins
- Input Validation: Pydantic schemas for all API inputs
- Error Handling: Generic error messages to clients (no stack traces in production)
- Timeouts: All provider operations have configurable timeouts
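A timeout guard around a provider call could be as small as the following sketch. It assumes the provider calls are coroutines, which the source does not state, and the helper name `with_timeout` and its `label` parameter are hypothetical.

```python
import asyncio


async def with_timeout(coro, seconds: float, label: str):
    """Await `coro`, failing with a labeled TimeoutError if it runs too long."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # Re-raise with a message that says which operation timed out.
        raise TimeoutError(f"{label} exceeded {seconds}s") from None
```

A route handler could catch the `TimeoutError` and return a generic error to the client, in line with the error-handling point above.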
What This Project Does NOT Implement
This portfolio project intentionally omits these production requirements to keep it demo-friendly:
No Authentication & Authorization
- No user authentication (OAuth2, OIDC, SAML)
- No role-based access control (RBAC)
- No API key management
- No session management
No Encryption
- No encryption at rest
- No TLS/HTTPS (assumes reverse proxy handles this)
- No encrypted backups
- No key rotation
No Audit & Compliance
- No tamper-proof audit logs
- No log retention policies enforced
- No compliance reports
- No data deletion workflows
No Infrastructure Security
- No network segmentation
- No secrets management (HashiCorp Vault, AWS Secrets Manager)
- No container scanning
- No vulnerability management
No Business Continuity
- No backup and recovery procedures
- No disaster recovery plan
- Little damage control
- No high availability configuration
- No failover mechanisms
Production HIPAA Compliance Roadmap
If this project were to be used in a real healthcare setting, here's what would be required:
1. Access Controls
Technical Controls:
- Implement OAuth2/OIDC authentication with MFA
- Role-based access control (RBAC) with least privilege
- Automated session timeouts (15 minutes idle)
- Unique user IDs for all access
- Terminate sessions on logout
Administrative Controls:
- User access reviews (quarterly)
- Workforce training on security policies
- Documented authorization procedures
- Access termination procedures
2. Encryption
Data in Transit:
- TLS 1.3 for all HTTP traffic
- Mutual TLS (mTLS) for service-to-service communication
- VPN for remote access
Data at Rest:
- AES-256 encryption for all stored data (if any)
- Encrypted database volumes
- Encrypted backups
- Hardware security modules (HSMs) for key management
Key Management:
- Automated key rotation (90 days)
- Separate keys per environment (dev/staging/prod)
- Key access logging
- Key escrow procedures
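The 90-day rotation rule reduces to a date comparison, as in this sketch. The function name `keys_due_for_rotation` and the mapping of key IDs to creation dates are assumptions; a real deployment would read this metadata from its key management service.

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)


def keys_due_for_rotation(keys: dict[str, date], today: date) -> list[str]:
    """Return the IDs of keys created at least ROTATION_PERIOD ago."""
    return sorted(
        key_id
        for key_id, created in keys.items()
        if today - created >= ROTATION_PERIOD
    )
```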
3. Audit Logging
Requirements:
- Log all PHI access (who, what, when, where, why)
- Immutable audit logs (write-once-read-many storage)
- Log retention: 6 years minimum (HIPAA requirement)
- Real-time alerting on suspicious activities
- Centralized log aggregation (SIEM)
Logged Events:
- Authentication attempts (success/failure)
- PHI access and modifications
- Configuration changes
- Failed authorization attempts
- System errors affecting PHI
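Write-once storage is an infrastructure feature, but tamper evidence can also be approximated in application code with a hash chain, a different technique from WORM storage. The sketch below records the who/what/when/where/why fields and links each entry to the previous one; all function and field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(entry: dict) -> str:
    """Hash every field except the stored hash itself."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list[dict], who: str, what: str, where: str, why: str) -> dict:
    """Append an audit entry chained to the previous entry's hash."""
    entry = {
        "who": who,
        "what": what,
        "where": where,
        "why": why,
        "when": datetime.now(timezone.utc).isoformat(),
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Return False if any entry was altered or the chain is broken."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(entry):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any earlier entry changes its digest, so `verify` fails from that point on. This detects tampering but does not prevent it, which is why immutable storage is still listed as a requirement.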
4. Business Associate Agreements (BAAs)
Required BAAs:
- Cloud provider (AWS/Azure/GCP)
- STT/TTS service providers (if cloud-based)
- Database hosting provider
- Logging/monitoring services
- Backup/DR services
BAA Requirements:
- Vendor HIPAA compliance attestation
- Data processing limitations
- Breach notification obligations
- Subcontractor disclosure
- Right to audit
5. Data Retention & Disposal
Retention Policies:
- Define retention periods per data type
- Automated purging of expired data
- Patient right to request deletion
- Legal hold procedures
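Automated purging with a legal-hold exception could be sketched as follows. The retention periods are placeholders rather than legal guidance, and the record fields (`type`, `created_at`, `legal_hold`) are assumptions about how stored data might be described.

```python
from datetime import datetime, timedelta

# Placeholder retention periods per data type; not legal guidance.
RETENTION = {
    "transcript": timedelta(days=30),
    "audit_log": timedelta(days=6 * 365),
}


def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Keep records still inside their retention window or under legal hold."""
    kept = []
    for rec in records:
        expires = rec["created_at"] + RETENTION[rec["type"]]
        if rec.get("legal_hold") or now < expires:
            kept.append(rec)
    return kept
```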
Secure Disposal:
- Cryptographic erasure (destroy keys)
- Physical media destruction (if applicable)
- Disposal certificates
- Vendor disposal verification
6. Risk Management
Risk Assessment (Annual):
- Threat modeling
- Vulnerability scanning
- Penetration testing
- Third-party security assessments
Incident Response:
- Documented incident response plan
- Breach notification procedures (within 60 days of discovery)
- Forensic investigation capabilities
- Communication templates
7. Infrastructure Hardening
Network Security:
- DMZ architecture
- Firewall rules (least privilege)
- Intrusion detection/prevention (IDS/IPS)
- DDoS protection
Container Security:
- Image scanning for vulnerabilities
- Signed images only
- Runtime security monitoring
- Resource limits (CPU, memory)
Database Security:
- Encrypted connections
- Database activity monitoring
- Parameterized queries (SQL injection prevention)
- Least privilege database roles
8. Monitoring & Alerting
Metrics:
- Failed authentication attempts (threshold: 5/minute)
- Unusual API access patterns
- High-volume data exports
- Provider health status
- Resource utilization
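The failed-authentication threshold can be evaluated with a sliding window, as in this sketch. The class name `FailedLoginMonitor` is hypothetical, timestamps are plain seconds supplied by the caller, and treating "5/minute" as "five or more failures within 60 seconds" is an assumption.

```python
from collections import deque


class FailedLoginMonitor:
    """Track failure timestamps and flag bursts within a sliding window."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        self._events: deque[float] = deque()

    def record_failure(self, timestamp: float) -> bool:
        """Record one failure; return True when the threshold is reached."""
        self._events.append(timestamp)
        # Drop failures that have aged out of the window.
        while self._events and timestamp - self._events[0] >= self.window_s:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

A `True` result would be the point at which to raise an alert through whichever paging integration is in use.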
Alerting:
- PagerDuty/Opsgenie integration
- Escalation procedures
- On-call rotation
- Incident runbooks