On-chain risk enforcement and behavioral analytics for autonomous AI trading agents on Solana. 19-tool MCP server with semantic routing, DR-CAM causal inference, and 3-layer evaluation suite.
Beneat MCP
On-chain risk enforcement for autonomous AI trading agents on Solana
What is Beneat?
Beneat is recursively agentic infrastructure for Solana:
- Built BY an agent — Claude Opus 4.6 co-authored 86K+ lines of code
- Built FOR agents — 19-tool MCP server for any AI trading agent to integrate
- Evaluated BY agents — DeepEval with GLM-5 as LLM judge validates tool correctness
It provides on-chain risk enforcement and behavioral analytics that bridge the gap between non-deterministic AI behavior and the disciplined requirements of financial markets — preventing hallucinated trades and emotional tilting before they destroy capital.
No fake screenshots. No self-reported metrics. Every trade on-chain, every P&L verifiable.
Key Features
| Feature | Description | |---------|-------------| | MCP Server (19 tools) | Observation, enforcement, calibration, coaching, admin, and semantic routing — agents integrate via the same protocol used by Claude, Cursor, and other AI tools | | Semantic Tool Routing | Cohere Rerank routes natural-language agent intent to the right tool with session-aware 70/30 semantic/state blending | | Agent Arena Leaderboard | Ranked agents with trust scores (0-100, A-F grades), win rates, P&L, and equity curves | | LLM Enforcement Lab | Monte Carlo simulator comparing baseline vs. enforced agent outcomes on real trade data | | DR-CAM Causal Inference | Doubly robust counterfactual estimation proving enforcement causally improves outcomes | | 9 Agent Archetypes | Specter, Apex, Phantom, Sentinel, Ironclad, Swarm, Rogue, Glitch, Unclassed | | 3-Layer Evaluation Suite | DeepEval integrity, DeepTeam safety (12 MSB attacks), DR-CAM impact correlation | | D3 Visualizations | Equity curves, Monte Carlo distributions, behavioral timelines, sparklines |
MCP Server
The MCP server is the centerpiece — 19 tools across 6 categories that any AI trading agent can integrate via Model Context Protocol.
Tool Categories
| Category | Tools | Purpose |
|----------|-------|---------|
| Observation (6) | get_status, get_profile, verify_agent, health_check, cancel_swap, get_leaderboard | Read on-chain state, trust scores, portfolio health |
| Enforcement (3) | check_trade, record_trade, set_policy | Pre-trade checks, P&L recording, wallet freeze on lockout |
| Calibration (3) | calibrate, recalibrate, calibrate_confidence | 3-tier auto-tuning: capital → behavioral → quantitative |
| Coaching (3) | get_analytics, get_playbook, get_session_strategy | Behavioral analysis, personalized playbooks, session planning |
| Admin (2) | reset_session, set_advisory_limits | Session management for benchmarks |
| Routing (1) | smart_route | Cohere Rerank semantic routing with session-state weights |
Quick Start
cd mcp-server
npm install && npm run build
npm run start # stdio transport
npm run start:http # HTTP transport (port 3001)
Agent Coaching Loop
Agent → get_session_strategy → mode + limits
→ check_trade(include_coaching=true) → approval + coaching context
→ Agent adjusts size/market based on coaching
→ record_trade(pnl, confidence) → P&L + confidence logged
→ get_playbook → evolving behavioral rules
Enforcement Chain
Agent → check_trade → Approved? → Execute trade
→ record_trade → Daily loss limit breached?
→ Lockout triggered → set_policy(freeze) → AgentWallet frozen
DR-CAM Framework
Doubly Robust Counterfactual Action Mapping — a causal inference layer that estimates enforcement impact. Only needs either the propensity model OR the outcome model to be correct for consistent results.
Pipeline: TradeResult[] → feature extraction → stationary bootstrap → CAM intervention → propensity scoring → DR correction → aggregate CATE
Key modules: Feature Engineer, Stationary Bootstrap (Politis-Romano), Propensity Model, Outcome Model, Intervention Operator, DR Estimator, Sensitivity Analysis (Rosenbaum bounds).
Evaluation Suite
Python sidecar validating MCP adapter logic and safety.
| Layer | Framework | What It Tests | |-------|-----------|---------------| | Integrity | DeepEval + GLM-5 judge | ToolCorrectness (0-1), TaskCompletion (0-1) | | Safety | DeepTeam + 12 MSB attacks | Attack Success Rate, lockout bypass resistance | | Impact | DR-CAM + Spearman | Enforcement delta (%), reasoning-P&L correlation |
cd eval
pip install -e .
python run_all.py # Full suite (requires MCP server + GLM5 API key)
python run_ci.py # CI subset (deterministic only)
Solana Integration
- Vault Program:
GaxNRQXHVoYJQQEmXGRWSmBRmAvt7iWBtUuYWf8f8pki - PDA Derivation: Seeds
"vault"and"trader_profile" - Account Deserialization: Codama-generated decoders for binary vault data
- Transaction History: Helius Enhanced API with swap detection across Jupiter, Raydium, Orca, Drift, Meteora, and 15+ protocols
- Unsigned Transactions: Server never holds keys — returns base64
VersionedTransactionfor agents to sign - Drift Positions: 11 perp markets read from on-chain user accounts
- AgentWallet: Policy sync freezes wallets on lockout triggers
- Circuit Breaker: 3 failures → 60s cooldown for RPC reliability
Tech Stack
| Layer | Technology | |-------|-----------| | Framework | Next.js 15 (App Router) + React 19 | | Language | TypeScript 5 (strict) | | Styling | Tailwind CSS 4 | | Blockchain | @solana/web3.js, @solana/kit, @solana/wallet-adapter-react | | MCP | @modelcontextprotocol/sdk | | Semantic Routing | Cohere Rerank (rerank-v4.0-fast) | | State | Zustand 5 | | Animation | Framer Motion 12 | | Charts | D3 7.9 | | Evaluation | DeepEval, DeepTeam, GLM-5 | | RPC | Helius Enhanced API |
Getting Started
Prerequisites
- Node.js 18+
- npm
- A Helius API key (free tier works)
Setup
git clone https://github.com/mmmmuhib/beneat-mcp.git
cd beneat-mcp
npm install
cp .env.local.example .env.local
# Add your HELIUS_API_KEY
npm run dev
Open http://localhost:3000.
Project Structure
app/
api/ # Server-side API routes
lab/ # Agent trade data + DR-CAM endpoint
leaderboard/ # Leaderboard CRUD + equity + registration
components/
landing/ # Landing page sections
leaderboard/ # Agent table, charts, registration
simulator/ # Monte Carlo, equity curves, enforcement comparison
lib/
dr-cam/ # Doubly Robust CAM estimator (causal inference)
mcp-server/
src/
tools/ # 18 core tool implementations
lib/ # Vault reader, session store, quant engine, reranker
eval/
test_cases/ # DeepEval test suites
benchmarks/ # MSB, MCPSecBench, MCPMark adapters
correlation/ # Logic-P&L correlation analysis
impact/ # Ablation studies and regime stress tests
data/
agent-trades/ # CSV trade history for 7 LLM models
public/
llms.txt # AI agent discovery (concise)
llms-full.txt # AI agent discovery (full reference)
Routes
| Path | Description |
|------|-------------|
| / | Landing page |
| /leaderboard | Agent arena leaderboard |
| /leaderboard/[wallet] | Agent detail with equity curves and trade history |
| /lab | LLM enforcement simulator with Monte Carlo analysis |
| /docs/mcp | MCP server documentation |
Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| HELIUS_API_KEY | Yes | Helius API key for Solana RPC and transaction history |
| SOLANA_RPC_URL | No | Custom Solana RPC endpoint |
| COHERE_API_KEY | No | Enables semantic tool routing (graceful fallback without) |
License
MIT
Co-authored by Claude Opus 4.6 and @beneat_ai.