SarvaData MCP Server

Agent-native data infrastructure from India

A production-ready Model Context Protocol (MCP) server exposing SarvaData tools for data ingestion, cleaning, embeddings, search, and reporting. Built for LLM agents, Comet Browser, and Hugging Face Spaces.

🚀 Quick Start

Prerequisites

Python 3.9+
pip

Installation

Clone the repository

git clone <repository-url>
cd sarvadata-mcp

Install dependencies

pip install -r requirements.txt

Run the server

python server.py
# or
uvicorn server:app --host 0.0.0.0 --port 5000

The server will start at http://localhost:5000.

🛠️ Available Tools

The server exposes the following tools via MCP:

| Tool | Description | Use Case |
|------|-------------|----------|
| ingest_dataset | Import data from CSV (file or URL) | Data lake population |
| clean_dataset | Remove nulls, normalize schema, deduplicate | Data quality |
| create_embeddings | Generate vector embeddings (Mock/Stub) | Semantic search |
| semantic_search | Query by meaning (Mock/Stub) | Knowledge retrieval |
| generate_report | Create summary/quality/insights reports | Data documentation |
| schema_validator | Validate JSON/CSV structure | Data governance |
| format_converter | Convert between CSV/JSON/XML | Data transformation |
| password_generator | Generate secure passwords | Security |

🤖 Using with AI Agents

Tool Discovery

Agents can query /mcp/tools to discover capabilities:

curl http://localhost:5000/mcp/tools
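
The same discovery request can be made from Python. This is a minimal sketch using only the standard library; the helper names `tools_url` and `list_tools` are illustrative, and because the response shape is not documented here, the decoded JSON is returned unchanged.

```python
import json
import urllib.request

# Address taken from the Quick Start section; adjust for your deployment.
BASE_URL = "http://localhost:5000"


def tools_url(base_url: str = BASE_URL) -> str:
    """Build the discovery endpoint URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/mcp/tools"


def list_tools(base_url: str = BASE_URL, timeout: float = 10.0):
    """GET /mcp/tools and return the decoded JSON body as-is.

    The structure of the response is an unknown here, so no fields
    are assumed or extracted.
    """
    with urllib.request.urlopen(tools_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `list_tools()` should return whatever the endpoint reports, which an agent can inspect before choosing a tool.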

Tool Invocation

Execute a tool:

curl -X POST http://localhost:5000/mcp/call \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "ingest_dataset",
    "arguments": {
      "source_type": "url",
      "source_path": "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
    }
  }'
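
For agents written in Python, the curl call above might translate to something like the following sketch. It mirrors the request body shown above (`tool` and `arguments`); the helper names `build_call_request` and `call_tool` are hypothetical, and the response is returned without assuming its fields.

```python
import json
import urllib.request
from typing import Any, Dict


def build_call_request(
    tool: str,
    arguments: Dict[str, Any],
    base_url: str = "http://localhost:5000",
) -> urllib.request.Request:
    """Build the POST request for /mcp/call without sending it."""
    body = json.dumps({"tool": tool, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/mcp/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(
    tool: str,
    arguments: Dict[str, Any],
    base_url: str = "http://localhost:5000",
    timeout: float = 30.0,
) -> Any:
    """Send the request and return the decoded JSON response as-is."""
    req = build_call_request(tool, arguments, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the server.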

Example Agent Workflow

INGEST: Load raw data -> Returns dataset_id
CLEAN: Clean the dataset using dataset_id
REPORT: Generate insights report
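
The three steps above can be sketched as a single function. Several details are assumptions rather than documented behavior: that the ingest response carries `dataset_id` as a top-level key, that `clean_dataset` and `generate_report` accept a `dataset_id` argument, and that `report_type` is the parameter selecting an insights report. Check the schemas returned by /mcp/tools before relying on them. The `call_tool` parameter is any function that sends a tool name and arguments to /mcp/call and returns the decoded response.

```python
from typing import Any, Callable, Dict

# Any callable that invokes a tool and returns its decoded JSON response.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_pipeline(call_tool: ToolCaller, csv_url: str) -> Dict[str, Any]:
    """Chain INGEST -> CLEAN -> REPORT for a CSV at the given URL."""
    # INGEST: arguments match the invocation example in this README.
    ingested = call_tool(
        "ingest_dataset",
        {"source_type": "url", "source_path": csv_url},
    )
    # Assumption: dataset_id is a top-level key in the response.
    dataset_id = ingested["dataset_id"]

    # CLEAN: the argument name "dataset_id" is hypothetical.
    call_tool("clean_dataset", {"dataset_id": dataset_id})

    # REPORT: both argument names here are hypothetical.
    return call_tool(
        "generate_report",
        {"dataset_id": dataset_id, "report_type": "insights"},
    )
```

Passing the caller in as a parameter lets the same workflow run against a live server or a stub in tests.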

☁️ Deployment

Hugging Face Spaces

This repository is ready for deployment to Hugging Face Spaces (Docker).

Create a new Space on Hugging Face.
Select Docker as the SDK.
Push this repository to the Space (or connect via GitHub).
The server will start automatically on port 7860, the default port for Docker Spaces, rather than port 5000 used for local runs.

Docker

docker build -t sarvadata-mcp .
docker run -p 5000:5000 sarvadata-mcp

🏗️ Architecture & Development

Project Structure

├── server.py           # FastAPI application and MCP endpoints
├── mcp_registry.py     # Tool registry and invocation routing
├── tools/              # Tool implementations (Pandas-based)
├── etl/                # Core data processing modules
├── schemas/            # JSON schemas for tools
└── tests/              # Test suite
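
To illustrate what "tool registry and invocation routing" can look like, here is a minimal sketch of the general pattern. It is not the contents of mcp_registry.py: the names `register`, `invoke`, and `_REGISTRY`, the `length` argument, and the response shape are all hypothetical, and the stand-in tool only borrows the name password_generator from the tools table.

```python
import secrets
import string
from typing import Any, Callable, Dict

ToolFn = Callable[[Dict[str, Any]], Dict[str, Any]]

# Maps tool names to the functions that implement them.
_REGISTRY: Dict[str, ToolFn] = {}


def register(name: str) -> Callable[[ToolFn], ToolFn]:
    """Decorator that records a tool function under the given name."""
    def decorator(fn: ToolFn) -> ToolFn:
        _REGISTRY[name] = fn
        return fn
    return decorator


def invoke(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Route a call to the registered tool, or raise KeyError if unknown."""
    try:
        tool = _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown tool: {name}") from None
    return tool(arguments)


@register("password_generator")
def password_generator(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in tool: random alphanumeric password of the requested length."""
    length = int(arguments.get("length", 16))  # hypothetical argument
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(length))
    return {"password": password}
```

In a layout like this, adding a tool means writing one function and decorating it; the endpoint handler only needs to call `invoke` with the tool name and arguments from the request body.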

Contributing

See CONTRIBUTING.md for details on how to add new tools and contribute to the project.

📄 License

MIT License - see LICENSE for details.

About SarvaData Platform

SarvaData is a comprehensive data tools platform featuring 50+ micro-tools plus a complete visual ETL pipeline builder. This MCP server exposes core SarvaData capabilities to AI agents.

Company Information:

AnkTechSol
Udyam Registration: UDYAM-MH-26-0439977
GST: 27MLOPK7764C1ZF
Website: https://anktechsol.com
