# Automation Testing Agent with Playwright MCP

An AI-powered testing agent that performs manual exploratory testing and generates automation scripts in any language by combining an LLM brain with the Playwright MCP server for browser control.
## Table of Contents
- What This Agent Does
- Architecture & Design
- How It Works
- Prerequisites
- Installation & Setup
- Configuration
- Running the Agent
- Usage Examples
- Project Structure
- Extending the Agent
- CI/CD Integration
- Troubleshooting
## What This Agent Does

The agent operates in two modes:
**Manual Testing Mode** — You describe a test scenario in plain English. The agent drives a real browser, observes page state, executes each step, captures screenshots, and produces a pass/fail report with evidence. Perfect for bug reproduction, smoke testing, and ad-hoc verification.

**Script Generation Mode** — You describe a flow and specify a target language (Python, TypeScript, Java, C#). The agent first executes the flow to validate it actually works, then emits a clean, production-ready Playwright automation script. No more hand-writing locators that break on the first run.
## Architecture & Design

The system has three layers that are cleanly decoupled so each can be swapped independently:

```
┌────────────────────────────────────────────────────┐
│ Layer 1: LLM Brain (Claude / GPT-4 / Gemini)       │
│  - Reasons about test scenarios                    │
│  - Decides which browser actions to take           │
│  - Generates test reports and code                 │
└──────────────────────┬─────────────────────────────┘
                       │ Tool calls
                       ▼
┌────────────────────────────────────────────────────┐
│ Layer 2: Orchestration (LangGraph ReAct Agent)     │
│  - Manages conversation state                      │
│  - Routes between manual test / script gen modes   │
│  - Handles retries and error recovery              │
│  - Persists session state                          │
└──────────────────────┬─────────────────────────────┘
                       │ MCP protocol (stdio / SSE)
                       ▼
┌────────────────────────────────────────────────────┐
│ Layer 3: Playwright MCP Server                     │
│  - Exposes browser control as MCP tools            │
│  - Runs Chromium / Firefox / WebKit                │
│  - Returns accessibility snapshots, screenshots,   │
│    console logs, network requests                  │
└────────────────────────────────────────────────────┘
```
### Design Decisions
**Why MCP instead of direct Playwright SDK?** MCP gives us a standard protocol so the same agent brain works with any MCP-compliant tool server (databases, APIs, filesystems) without rewriting glue code. Adding new capabilities means connecting another MCP server.

**Why LangGraph?** Testing flows are state machines — navigate, wait, assert, recover. LangGraph's graph-based orchestration handles branching (retry on flake, escalate on real failure) better than a flat ReAct loop.

**Why accessibility-tree snapshots over screenshots for decisions?** The accessibility tree gives the agent structured, text-based page state that's cheap to reason about. Screenshots are captured as evidence, not as the primary input for decisions — this keeps token usage sane and improves reliability.

**Why generate scripts AFTER executing?** LLMs hallucinate locators and API calls. By executing the flow first via MCP and recording what actually worked, the generated script uses real selectors that match the live DOM, not invented ones.
## How It Works

### Manual Testing Mode — Step by Step
1. User submits a scenario like "Test login on example.com with tomsmith / SuperSecret".
2. Agent parses the scenario into atomic steps (navigate, type, click, assert).
3. For each step:
   - Call `browser_snapshot` to get the accessibility tree
   - Identify the right element reference
   - Execute the action (`browser_click`, `browser_type`, etc.)
   - Capture a screenshot as evidence
   - Verify the expected outcome via another snapshot
4. On any failure, dump console logs and network requests for diagnosis.
5. Produce a structured test report with pass/fail, timings, and screenshots.
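The per-step loop above can be sketched in Python. This is an illustrative skeleton, not the project's actual code: `call_tool` is a stand-in for a real MCP client call and is stubbed so the control flow runs standalone, and the screenshot tool name is an assumption that may differ across Playwright MCP versions.

```python
from dataclasses import dataclass, field

@dataclass
class StepResult:
    step: str
    passed: bool
    evidence: dict = field(default_factory=dict)

def call_tool(name: str, **kwargs) -> dict:
    """Stub: a real implementation would forward this to the MCP server."""
    return {"tool": name, "args": kwargs, "ok": True}

def run_step(step: str) -> StepResult:
    call_tool("browser_snapshot")                      # observe page state
    action = call_tool("browser_click", ref=step)      # act on the element
    shot = call_tool("browser_take_screenshot")        # capture evidence
    after = call_tool("browser_snapshot")              # re-observe to verify
    passed = action["ok"] and after["ok"]
    return StepResult(step=step, passed=passed,
                      evidence={"screenshot": shot, "after": after})

results = [run_step(s) for s in ["login-button", "username-field"]]
```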
### Script Generation Mode — Step by Step
1. Agent runs the full manual test flow first (Mode 1).
2. It records the exact sequence of tool calls that succeeded.
3. It loads a language-specific template (Python pytest, TypeScript Playwright Test, Java JUnit, C# NUnit).
4. It maps each MCP tool call to the equivalent native Playwright API call.
5. It injects real locators (from the accessibility snapshots) into the template.
6. It adds proper waits, assertions, setup/teardown, and comments.
7. It writes the file to `generated_scripts/` and validates that it runs.
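Step 4 — mapping recorded MCP tool calls to native Playwright calls — can be pictured as a small lookup table. This is a hypothetical sketch; the real mapping in `src/script_generator.py` covers many more tools and target languages.

```python
# Hypothetical mapping from recorded MCP tool calls to Playwright Python
# statements; templates use str.format placeholders for recorded arguments.
TOOL_TO_API = {
    "browser_navigate": 'page.goto("{url}")',
    "browser_click":    'page.locator("{selector}").click()',
    "browser_type":     'page.locator("{selector}").fill("{text}")',
}

def emit_lines(recorded_calls):
    """Render each recorded tool call as a native Playwright statement."""
    return [TOOL_TO_API[c["tool"]].format(**c["args"]) for c in recorded_calls]

calls = [
    {"tool": "browser_navigate", "args": {"url": "https://example.com"}},
    {"tool": "browser_click", "args": {"selector": "#submit"}},
]
lines = emit_lines(calls)
```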
## Prerequisites
- Node.js 18+ (for Playwright MCP server)
- Python 3.10+ (for the agent orchestrator)
- An LLM API key (Anthropic Claude recommended; OpenAI also supported)
- ~2 GB disk space (Chromium browser binaries)
- macOS, Linux, or Windows with WSL2
## Installation & Setup

### 1. Clone and enter the project

```bash
git clone <your-repo-url> testing-agent
cd testing-agent
```

### 2. Install Playwright MCP server globally

```bash
npm install -g @playwright/mcp@latest
npx playwright install chromium
```

Verify it works:

```bash
npx @playwright/mcp@latest --port 8931
# You should see: "Playwright MCP listening on port 8931"
# Press Ctrl+C to stop
```
### 3. Set up Python environment

```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

### 4. Configure environment variables

Create a `.env` file in the project root:

```
ANTHROPIC_API_KEY=sk-ant-your-key-here
# OR, if using OpenAI:
# OPENAI_API_KEY=sk-your-key-here

# Optional tuning
AGENT_MODEL=claude-opus-4-7
AGENT_HEADLESS=true
AGENT_MAX_RETRIES=3
AGENT_SCREENSHOT_DIR=./screenshots
```

### 5. Verify installation

```bash
python src/agent.py --health-check
# Expected output: "Agent ready. MCP server connected. X tools available."
```
## Configuration

### `mcp_config.json` — MCP server definitions

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"],
      "transport": "stdio"
    }
  }
}
```
Swap `--headless` for `--headed` if you want to watch the browser work. Add `--browser=firefox` to target Firefox instead of Chromium.
### Agent prompts — `prompts/` directory

- `manual_tester.md` — system prompt for manual testing mode
- `script_generator.md` — system prompt for script generation mode
- `templates/` — language-specific code scaffolding

Edit these to customize the agent's style, strictness, and reporting format.
## Running the Agent

### Interactive mode (REPL)

```bash
python src/agent.py
```

You'll get a prompt where you can type scenarios and iterate.

### One-shot manual test

```bash
python src/agent.py --scenario "Go to example.com and verify the page title contains 'Example Domain'"
```

### Script generation

```bash
python src/agent.py \
  --scenario "Log in to https://the-internet.herokuapp.com/login with tomsmith / SuperSecretPassword! and verify success" \
  --generate python \
  --output generated_scripts/test_login.py
```
Supported languages: `python`, `typescript`, `javascript`, `java`, `csharp`.
### Batch mode — run scenarios from a YAML file

```bash
python src/agent.py --batch test_suite.yaml
```

Example `test_suite.yaml`:

```yaml
suite_name: Smoke Tests
scenarios:
  - name: Home page loads
    steps: Navigate to example.com and verify h1 is visible
  - name: Login works
    steps: Log in with tomsmith / SuperSecretPassword! and verify redirect
    generate_script: typescript
```
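A batch runner consuming this file might look like the following sketch. YAML parsing is omitted: `suite` mirrors the parsed structure of the example file, and `run_scenario` is a hypothetical executor hook, not the project's real API.

```python
def run_suite(suite, run_scenario):
    """Execute each scenario in the suite and collect minimal results."""
    results = []
    for sc in suite["scenarios"]:
        passed = run_scenario(sc["steps"])   # hypothetical executor hook
        results.append({
            "name": sc["name"],
            "passed": passed,
            "generate": sc.get("generate_script"),  # None if not requested
        })
    return results

# Mirrors the parsed structure of test_suite.yaml above.
suite = {
    "suite_name": "Smoke Tests",
    "scenarios": [
        {"name": "Home page loads",
         "steps": "Navigate to example.com and verify h1 is visible"},
        {"name": "Login works",
         "steps": "Log in and verify redirect",
         "generate_script": "typescript"},
    ],
}
results = run_suite(suite, run_scenario=lambda steps: True)
```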
## Usage Examples

### Example 1 — Bug reproduction

```
> Go to our staging site at https://staging.myapp.com,
> try to submit the contact form with an empty email field,
> and tell me if the validation error shows up.
```

The agent executes the scenario, captures screenshots of the form state before and after, reports whether validation fired, and includes console errors if any.
### Example 2 — Generate a Python Playwright test

```bash
python src/agent.py \
  --scenario "Navigate to github.com/login, enter username 'testuser', \
click Sign in, verify the password field appears" \
  --generate python
```

Output in `generated_scripts/test_github_login.py`:

```python
import pytest
from playwright.sync_api import Page, expect

def test_github_login_flow(page: Page):
    page.goto("https://github.com/login")
    page.get_by_label("Username or email address").fill("testuser")
    page.get_by_role("button", name="Sign in").click()
    expect(page.get_by_label("Password")).to_be_visible()
```
### Example 3 — Exploratory testing session

```
> Explore the checkout flow on our site. Try to break it:
> invalid coupon codes, out-of-stock items, expired cards.
> Report anything that looks broken.
```

The agent runs multiple sub-scenarios and reports findings with reproduction steps.
## Project Structure

```
testing-agent/
├── README.md               # This file
├── ARCHITECTURE.md         # Deep dive into design decisions
├── requirements.txt        # Python dependencies
├── mcp_config.json         # MCP server configuration
├── .env.example            # Environment variable template
├── src/
│   ├── agent.py            # Main entry point
│   ├── modes.py            # Manual vs script-gen mode routing
│   ├── script_generator.py # Language-specific code emission
│   ├── reporter.py         # Test report formatting
│   └── retry.py            # Flaky-test retry logic
├── prompts/
│   ├── manual_tester.md
│   ├── script_generator.md
│   └── templates/
│       ├── python.py.tmpl
│       ├── typescript.ts.tmpl
│       ├── java.java.tmpl
│       └── csharp.cs.tmpl
├── generated_scripts/      # Output directory for generated code
├── screenshots/            # Test evidence
└── reports/                # JSON test reports
```
## Extending the Agent

### Adding a new target language

1. Create `prompts/templates/<lang>.<ext>.tmpl` with scaffolding.
2. Add the language to the `LANGUAGE_MAP` in `src/script_generator.py`.
3. Update the system prompt in `prompts/script_generator.md` to include the new language's Playwright API conventions.
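As an assumption about its shape (the real structure in `src/script_generator.py` may differ), `LANGUAGE_MAP` could be a dictionary keyed by language name, where registering a new language is a one-entry change:

```python
# Assumed shape of LANGUAGE_MAP — illustrative only. Step 2 above amounts
# to adding one entry that points at the template created in step 1.
LANGUAGE_MAP = {
    "python":     {"template": "python.py.tmpl",     "ext": ".py"},
    "typescript": {"template": "typescript.ts.tmpl", "ext": ".ts"},
    "kotlin":     {"template": "kotlin.kt.tmpl",     "ext": ".kt"},  # new language
}
```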
### Adding new MCP servers

Append to `mcp_config.json`:

```json
"database": {
  "command": "npx",
  "args": ["@modelcontextprotocol/server-postgres", "postgresql://..."],
  "transport": "stdio"
}
```
Now the agent can query a database during tests (useful for verifying backend state after UI actions).
### Custom test reporters

Implement the `Reporter` interface in `src/reporter.py`. Built-in options: `ConsoleReporter`, `JSONReporter`, `JUnitXMLReporter`, `AllureReporter`.
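A custom reporter might look like this sketch. The actual `Reporter` base class in `src/reporter.py` may use different method names, so treat the interface shown here as an assumption:

```python
from abc import ABC, abstractmethod

class Reporter(ABC):
    """Assumed shape of the Reporter interface (illustrative only)."""
    @abstractmethod
    def emit(self, results: list) -> str:
        ...

class MarkdownReporter(Reporter):
    """Example custom reporter: renders results as a Markdown table."""
    def emit(self, results: list) -> str:
        rows = ["| Test | Result |", "| --- | --- |"]
        for r in results:
            rows.append(f"| {r['name']} | {'PASS' if r['passed'] else 'FAIL'} |")
        return "\n".join(rows)

print(MarkdownReporter().emit([{"name": "login", "passed": True}]))
```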
## CI/CD Integration

### GitHub Actions example

```yaml
name: AI-Driven E2E Tests

on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - name: Install Playwright MCP
        run: |
          npm install -g @playwright/mcp@latest
          npx playwright install --with-deps chromium
      - name: Install agent
        run: pip install -r requirements.txt
      - name: Run test suite
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python src/agent.py --batch tests/smoke.yaml --report junit
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-evidence
          path: |
            screenshots/
            reports/
```
### Caching tips

Cache the `~/.cache/ms-playwright` directory to skip the browser download (roughly 90 seconds) on every CI run.
## Troubleshooting

**"MCP server failed to start"** — Run `npx @playwright/mcp@latest --port 8931` manually to see the actual error. It usually means Chromium isn't installed; run `npx playwright install chromium`.

**"Agent loops forever on a flaky element"** — Lower `AGENT_MAX_RETRIES` in `.env` and add explicit waits to your scenario description ("wait for the loading spinner to disappear before clicking").

**"Generated script doesn't run"** — The agent validates scripts by attempting a dry run. If validation fails, check `reports/latest.json` for the error. The most common cause is a Playwright version mismatch between the MCP server and your local Playwright install.

**"Token usage is too high"** — Enable accessibility-tree-only mode (no screenshots) in `mcp_config.json` by adding `"--no-screenshots"` to the `args` array. Screenshots are still captured on failure.

**Browser hangs on auth prompts** — Add `--ignore-https-errors` and `--bypass-csp` to the MCP server args for test environments with self-signed certs.
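The retry behavior that `AGENT_MAX_RETRIES` controls can be pictured as a simple bounded-retry loop. This is an illustrative sketch; the project's real logic lives in `src/retry.py` and `with_retries` is a name invented for this example.

```python
import os
import time

def with_retries(action, max_retries=None, delay=0.0):
    """Run `action`, retrying up to AGENT_MAX_RETRIES times before giving up."""
    attempts = int(max_retries or os.getenv("AGENT_MAX_RETRIES", "3"))
    last_err = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as err:    # a real implementation would narrow this
            last_err = err
            time.sleep(delay)       # optionally back off between attempts
    raise last_err

# Demo: an action that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("element not ready")
    return "clicked"

result = with_retries(flaky, max_retries=3)
```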
## License

MIT

## Contributing

PRs welcome. Please include a test scenario that exercises any new agent behavior.