ARSR MCP Server
Adaptive Retrieval-Augmented Self-Refinement — a closed-loop MCP server that lets LLMs iteratively verify and correct their own claims using uncertainty-guided retrieval.
What it does
Unlike one-shot RAG (retrieve → generate), ARSR runs a refinement loop:
Generate draft → Decompose claims → Score uncertainty
      ↑                                     ↓
Decide stop ← Revise with evidence ← Retrieve for low-confidence claims
The key insight: retrieval is guided by uncertainty. Only claims the model is unsure about trigger evidence fetching, and the queries are adversarial — designed to disprove the claim, not just confirm it.
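To make the idea concrete, here is a minimal sketch of uncertainty-guided selection. It assumes scored claims carry a `confidence` between 0 and 1 and uses the default threshold of 0.85. The `ScoredClaim` type, the function names, and the query templates are hypothetical illustrations; the server itself delegates query generation to the inner LLM.

```typescript
// Hypothetical types and templates for illustration only; not the server's code.
type Strategy = "adversarial" | "confirmatory" | "balanced";

interface ScoredClaim {
  text: string;
  confidence: number; // 0..1, higher means the model is more certain
}

// Keep only the claims whose confidence falls below the threshold.
function selectForRetrieval(claims: ScoredClaim[], threshold = 0.85): ScoredClaim[] {
  return claims.filter((c) => c.confidence < threshold);
}

// Build search queries for one claim. The wording is a made-up template.
function buildQueries(claim: ScoredClaim, strategy: Strategy): string[] {
  const disprove = `evidence against: ${claim.text}`;
  const confirm = `evidence for: ${claim.text}`;
  if (strategy === "adversarial") return [disprove];
  if (strategy === "confirmatory") return [confirm];
  return [disprove, confirm]; // balanced: look in both directions
}

const exampleClaims: ScoredClaim[] = [
  { text: "Tesla was founded in 2003", confidence: 0.6 },
  { text: "Tesla makes electric cars", confidence: 0.97 },
];

// Only the first claim is below 0.85, so only it triggers retrieval.
const uncertain = selectForRetrieval(exampleClaims);
const queries: string[] = [];
for (const claim of uncertain) {
  queries.push(...buildQueries(claim, "adversarial"));
}
```

The high-confidence claim never reaches the search step, which is where the savings over retrieve-everything RAG would come from.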
Architecture
The server exposes 6 MCP tools. The outer LLM (Claude, GPT, etc.) orchestrates the loop by calling them in sequence:
| # | Tool | Purpose |
|---|------|---------|
| 1 | arsr_draft_response | Generate initial candidate answer (returns is_refusal flag) |
| 2 | arsr_decompose_claims | Split into atomic verifiable claims |
| 3 | arsr_score_uncertainty | Estimate confidence via semantic entropy |
| 4 | arsr_retrieve_evidence | Web search for low-confidence claims |
| 5 | arsr_revise_response | Rewrite draft with evidence |
| 6 | arsr_should_continue | Decide: iterate or finalize |
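For readers unfamiliar with semantic entropy (tool 3), the sketch below shows the general technique: sample several answers, group those that mean the same thing, and measure how spread out the groups are. It is a generic illustration, not the server's implementation. The function names, the greedy clustering, the `sameMeaning` callback (in practice an LLM or entailment model), and the mapping from entropy to a 0..1 confidence are all assumptions.

```typescript
// Greedy clustering: a sample joins the first cluster whose representative it
// matches, otherwise it starts a new cluster. Returns the cluster sizes.
function clusterSizes(
  samples: string[],
  sameMeaning: (a: string, b: string) => boolean
): number[] {
  const representatives: string[] = [];
  const sizes: number[] = [];
  for (const sample of samples) {
    let index = -1;
    for (let i = 0; i < representatives.length; i++) {
      if (sameMeaning(representatives[i], sample)) {
        index = i;
        break;
      }
    }
    if (index === -1) {
      representatives.push(sample);
      sizes.push(1);
    } else {
      sizes[index] += 1;
    }
  }
  return sizes;
}

// Shannon entropy (in nats) over the cluster frequencies.
function semanticEntropy(sizes: number[]): number {
  const total = sizes.reduce((a, b) => a + b, 0);
  let entropy = 0;
  for (const size of sizes) {
    const p = size / total;
    entropy -= p * Math.log(p);
  }
  return entropy;
}

// Assumed mapping: normalize by the maximum entropy log(N) and invert, so
// full agreement gives 1 and full disagreement gives roughly 0.
function confidenceFromSamples(
  samples: string[],
  sameMeaning: (a: string, b: string) => boolean
): number {
  if (samples.length < 2) {
    throw new RangeError("need at least two samples to measure disagreement");
  }
  return 1 - semanticEntropy(clusterSizes(samples, sameMeaning)) / Math.log(samples.length);
}

// Toy equivalence check standing in for a real semantic comparison.
const looseMatch = (a: string, b: string): boolean =>
  a.trim().toLowerCase() === b.trim().toLowerCase();

const agreeing = confidenceFromSamples(["2003", "2003 ", "2003"], looseMatch);
const mixed = confidenceFromSamples(["2003", "2003", "2004"], looseMatch);
const scattered = confidenceFromSamples(["2003", "2004", "2005"], looseMatch);
```

With three samples (the default for `entropy_samples`), two agreeing answers and one outlier land between the two extremes, which is the kind of claim that would fall below the confidence threshold.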
Inner LLM: Tools 1-5 call Claude Haiku internally for query generation, claim extraction, and evidence evaluation. Using a small inner model keeps costs low while the outer model handles orchestration.
Refusal detection: arsr_draft_response returns a structured is_refusal flag (classified by the inner LLM) indicating whether the draft is a non-answer. When is_refusal is true, downstream tools (decompose, revise) pivot to extracting claims from the original query and building an answer from retrieved evidence instead of trying to refine a refusal.
Web Search: arsr_retrieve_evidence uses the Anthropic API's built-in web search tool — no external search API keys needed.
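As a rough illustration of what a retrieval call might look like, the sketch below builds a Messages API request body that enables Anthropic's server-side web search tool (`web_search_20250305`). The function name, prompt wording, `max_tokens` value, and search limit are assumptions; the server's actual request may differ.

```typescript
// Shape of a Messages API request body with the web search tool enabled.
interface WebSearchRequest {
  model: string;
  max_tokens: number;
  tools: { type: string; name: string; max_uses: number }[];
  messages: { role: "user"; content: string }[];
}

// Hypothetical helper: build the request for checking a single claim.
function buildEvidenceRequest(
  claim: string,
  model = "claude-haiku-4-5-20251001", // default inner model from the config table
  maxSearches = 3 // assumed cap on searches per claim
): WebSearchRequest {
  return {
    model,
    max_tokens: 1024,
    tools: [{ type: "web_search_20250305", name: "web_search", max_uses: maxSearches }],
    messages: [
      {
        role: "user",
        // Adversarial framing: ask for evidence that could disprove the claim.
        content: `Search the web for evidence that could disprove this claim, then summarize what you find: ${claim}`,
      },
    ],
  };
}

const evidenceRequest = buildEvidenceRequest("Tesla was founded in 2003");
```

The resulting object would be sent as the JSON body of a POST to the Messages endpoint, authenticated with the same `ANTHROPIC_API_KEY` used for the inner model.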
Setup
Prerequisites
- Node.js 18+
- An Anthropic API key
Install & Build
cd arsr-mcp-server
npm install
npm run build
Environment
export ANTHROPIC_API_KEY="sk-ant-..."
Run
stdio mode (for Claude Desktop, Cursor, etc.):
npm start
HTTP mode (for remote access):
TRANSPORT=http PORT=3001 npm start
Claude Desktop Configuration
Add to your claude_desktop_config.json:
npm:
{
  "mcpServers": {
    "arsr": {
      "command": "npx",
      "args": ["@jayarrowz/mcp-arsr"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ARSR_MAX_ITERATIONS": "3",
        "ARSR_ENTROPY_SAMPLES": "3",
        "ARSR_RETRIEVAL_STRATEGY": "adversarial",
        "ARSR_INNER_MODEL": "claude-haiku-4-5-20251001"
      }
    }
  }
}
Local build:
{
  "mcpServers": {
    "arsr": {
      "command": "node",
      "args": ["/path/to/arsr-mcp-server/dist/src/index.js"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ARSR_MAX_ITERATIONS": "3",
        "ARSR_ENTROPY_SAMPLES": "3",
        "ARSR_RETRIEVAL_STRATEGY": "adversarial",
        "ARSR_INNER_MODEL": "claude-haiku-4-5-20251001"
      }
    }
  }
}
How the outer LLM uses it
The orchestrating LLM calls the tools in sequence:
1. draft = arsr_draft_response({ query: "When was Tesla founded?" })
// draft.is_refusal indicates if the inner LLM refused to answer
2. claims = arsr_decompose_claims({ draft: draft.draft, original_query: "When was Tesla founded?", is_refusal: draft.is_refusal })
3. scored = arsr_score_uncertainty({ claims: claims.claims })
4. low = scored.scored.filter(c => c.confidence < 0.85)
5. evidence = arsr_retrieve_evidence({ claims_to_check: low })
6. revised = arsr_revise_response({ draft: draft.draft, evidence: evidence.evidence, scored: scored.scored, original_query: "When was Tesla founded?", is_refusal: draft.is_refusal })
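7. decision = arsr_should_continue({ iteration: 1, scored: revised_scores })
   // revised_scores: uncertainty scores for the revised text, presumably obtained by repeating steps 2-3 on revised.revised
   → if "continue": go to step 2 with revised text
   → if "stop": return revised.revised to user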
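The sequence above can be sketched as a typed loop. This is a sketch under stated assumptions, not the server's client code: the `ArsrTools` interface is a hypothetical wrapper around the six tools, the result shapes are inferred from the field names in the steps above, and re-scoring the revised text before calling `arsr_should_continue` is one guess at where `revised_scores` comes from. The in-memory `fakeTools` object exists only so the loop can be dry-run without a server.

```typescript
interface Claim { text: string }
interface ScoredClaim extends Claim { confidence: number }
interface Evidence { claim: string; summary: string }

// Hypothetical typed wrapper; each method stands for one MCP tool call.
interface ArsrTools {
  draft(a: { query: string }): Promise<{ draft: string; is_refusal: boolean }>;
  decompose(a: { draft: string; original_query: string; is_refusal: boolean }): Promise<{ claims: Claim[] }>;
  score(a: { claims: Claim[] }): Promise<{ scored: ScoredClaim[] }>;
  retrieve(a: { claims_to_check: ScoredClaim[] }): Promise<{ evidence: Evidence[] }>;
  revise(a: {
    draft: string; evidence: Evidence[]; scored: ScoredClaim[];
    original_query: string; is_refusal: boolean;
  }): Promise<{ revised: string }>;
  shouldContinue(a: { iteration: number; scored: ScoredClaim[] }): Promise<{ decision: "continue" | "stop" }>;
}

async function refine(
  tools: ArsrTools, query: string, threshold = 0.85, maxIterations = 3
): Promise<string> {
  const first = await tools.draft({ query });
  let text = first.draft;
  let isRefusal = first.is_refusal;
  let claims = (await tools.decompose({ draft: text, original_query: query, is_refusal: isRefusal })).claims;
  let scored = (await tools.score({ claims })).scored;

  // Client-side cap in addition to whatever arsr_should_continue decides.
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const low = scored.filter((c) => c.confidence < threshold);
    const { evidence } = await tools.retrieve({ claims_to_check: low });
    const { revised } = await tools.revise({
      draft: text, evidence, scored, original_query: query, is_refusal: isRefusal,
    });
    text = revised;
    isRefusal = false; // assumption: a revised answer is no longer a refusal

    // Assumption: re-decompose and re-score the revised text to get "revised_scores".
    claims = (await tools.decompose({ draft: text, original_query: query, is_refusal: isRefusal })).claims;
    scored = (await tools.score({ claims })).scored;
    const { decision } = await tools.shouldContinue({ iteration, scored });
    if (decision === "stop") break;
  }
  return text;
}

// In-memory fake for a dry run: revised text is treated as high-confidence.
const fakeTools: ArsrTools = {
  draft: async ({ query }) => ({ draft: `Draft answer to: ${query}`, is_refusal: false }),
  decompose: async ({ draft }) => ({ claims: [{ text: draft }] }),
  score: async ({ claims }) => ({
    scored: claims.map((c) => ({
      text: c.text,
      confidence: c.text.indexOf("[revised]") === 0 ? 0.95 : 0.5,
    })),
  }),
  retrieve: async ({ claims_to_check }) => ({
    evidence: claims_to_check.map((c) => ({ claim: c.text, summary: "stub evidence" })),
  }),
  revise: async ({ draft }) => ({ revised: `[revised] ${draft}` }),
  shouldContinue: async ({ scored }) => ({
    decision: scored.every((c) => c.confidence >= 0.85) ? "stop" : "continue",
  }),
};

const finalAnswer = refine(fakeTools, "When was Tesla founded?");
```

In a real client, each method of `ArsrTools` would forward to the corresponding `arsr_*` tool through whatever MCP client library the host uses.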
Configuration
All settings can be overridden via environment variables, falling back to defaults if unset:
| Setting | Env var | Default | Description |
|---------|---------|---------|-------------|
| max_iterations | ARSR_MAX_ITERATIONS | 3 | Budget limit for refinement loops |
| confidence_threshold | ARSR_CONFIDENCE_THRESHOLD | 0.85 | Claims at or above this confidence skip retrieval |
| entropy_samples | ARSR_ENTROPY_SAMPLES | 3 | Number of rephrasings sampled for semantic entropy |
| retrieval_strategy | ARSR_RETRIEVAL_STRATEGY | adversarial | adversarial, confirmatory, or balanced |
| inner_model | ARSR_INNER_MODEL | claude-haiku-4-5-20251001 | Model for internal intelligence |
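The override-with-fallback behaviour in the table can be sketched as follows. Setting names, environment variable names, and defaults come from the table; the helper names and the choice to fall back to the default on an unparseable value are assumptions, since the source only describes the unset case.

```typescript
type Strategy = "adversarial" | "confirmatory" | "balanced";

interface ArsrConfig {
  max_iterations: number;
  confidence_threshold: number;
  entropy_samples: number;
  retrieval_strategy: Strategy;
  inner_model: string;
}

// Defaults copied from the configuration table.
const DEFAULTS: ArsrConfig = {
  max_iterations: 3,
  confidence_threshold: 0.85,
  entropy_samples: 3,
  retrieval_strategy: "adversarial",
  inner_model: "claude-haiku-4-5-20251001",
};

type Env = { [key: string]: string | undefined };

// Parse a numeric setting; unset, blank, or non-numeric values use the default.
function numberOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return isFinite(parsed) ? parsed : fallback;
}

// Accept only the three documented strategies.
function strategyOr(raw: string | undefined, fallback: Strategy): Strategy {
  return raw === "adversarial" || raw === "confirmatory" || raw === "balanced" ? raw : fallback;
}

function loadConfig(env: Env): ArsrConfig {
  return {
    max_iterations: numberOr(env["ARSR_MAX_ITERATIONS"], DEFAULTS.max_iterations),
    confidence_threshold: numberOr(env["ARSR_CONFIDENCE_THRESHOLD"], DEFAULTS.confidence_threshold),
    entropy_samples: numberOr(env["ARSR_ENTROPY_SAMPLES"], DEFAULTS.entropy_samples),
    retrieval_strategy: strategyOr(env["ARSR_RETRIEVAL_STRATEGY"], DEFAULTS.retrieval_strategy),
    inner_model: env["ARSR_INNER_MODEL"] || DEFAULTS.inner_model,
  };
}

// Two settings overridden, the rest left at their defaults.
const config = loadConfig({ ARSR_MAX_ITERATIONS: "5", ARSR_RETRIEVAL_STRATEGY: "balanced" });
```

In Node.js the same function could be called as `loadConfig(process.env)`.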
Cost estimate
Per refinement loop iteration (assuming ~5 claims, 3 low-confidence):
- Inner LLM calls: ~6-10 Haiku calls ≈ $0.002-0.005
- Web searches: 6-9 queries, run through the Anthropic API with no separate search subscription (check Anthropic's current web search pricing)
- Typical total for 2 iterations: < $0.02
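The arithmetic behind the total is simple enough to check directly. The sketch below multiplies the quoted per-iteration inner-LLM range by the number of iterations. It covers only the Haiku calls listed above, and the function name is illustrative.

```typescript
// Per-iteration inner-LLM cost range quoted above, in US dollars.
const perIterationUsd = { low: 0.002, high: 0.005 };

// Scale the range linearly with the number of refinement iterations.
function estimateInnerLlmCost(iterations: number): { low: number; high: number } {
  return {
    low: perIterationUsd.low * iterations,
    high: perIterationUsd.high * iterations,
  };
}

const twoIterations = estimateInnerLlmCost(2); // the "typical" case
const maxDefault = estimateInnerLlmCost(3); // default max_iterations
```

Two iterations give roughly $0.004 to $0.010, which is consistent with the quoted total of under $0.02.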
License
MIT