# Automation Testing Agent with Playwright MCP

An AI-powered testing agent that performs manual exploratory testing and generates automation scripts in any language by combining an LLM brain with the Playwright MCP server for browser control.
## Table of Contents
- What This Agent Does
- Architecture & Design
- How It Works
- Prerequisites
- Installation & Setup
- Configuration
- Running the Agent
- Usage Examples
- Project Structure
- Extending the Agent
- CI/CD Integration
- Troubleshooting
## What This Agent Does

The agent operates in two modes:
**Manual Testing Mode** — You describe a test scenario in plain English. The agent drives a real browser, observes page state, executes each step, captures screenshots, and produces a pass/fail report with evidence. Perfect for bug reproduction, smoke testing, and ad-hoc verification.

**Script Generation Mode** — You describe a flow and specify a target language (Python, TypeScript, Java, C#). The agent first executes the flow to validate it actually works, then emits a clean, production-ready Playwright automation script. No more hand-writing locators that break on the first run.
## Architecture & Design

The system has three layers that are cleanly decoupled so each can be swapped independently:

```
┌────────────────────────────────────────────────────┐
│ Layer 1: LLM Brain (Claude / GPT-4 / Gemini)       │
│  - Reasons about test scenarios                    │
│  - Decides which browser actions to take           │
│  - Generates test reports and code                 │
└──────────────────────┬─────────────────────────────┘
                       │ Tool calls
                       ▼
┌────────────────────────────────────────────────────┐
│ Layer 2: Orchestration (LangGraph ReAct Agent)     │
│  - Manages conversation state                      │
│  - Routes between manual test / script gen modes   │
│  - Handles retries and error recovery              │
│  - Persists session state                          │
└──────────────────────┬─────────────────────────────┘
                       │ MCP protocol (stdio / SSE)
                       ▼
┌────────────────────────────────────────────────────┐
│ Layer 3: Playwright MCP Server                     │
│  - Exposes browser control as MCP tools            │
│  - Runs Chromium / Firefox / WebKit                │
│  - Returns accessibility snapshots, screenshots,   │
│    console logs, network requests                  │
└────────────────────────────────────────────────────┘
```
### Design Decisions
**Why MCP instead of direct Playwright SDK?** MCP gives us a standard protocol so the same agent brain works with any MCP-compliant tool server (databases, APIs, filesystems) without rewriting glue code. Adding new capabilities means connecting another MCP server.

**Why LangGraph?** Testing flows are state machines — navigate, wait, assert, recover. LangGraph's graph-based orchestration handles branching (retry on flake, escalate on real failure) better than a flat ReAct loop.

**Why accessibility-tree snapshots over screenshots for decisions?** The accessibility tree gives the agent structured, text-based page state that's cheap to reason about. Screenshots are captured as evidence, not as the primary input for decisions — this keeps token usage sane and improves reliability.

**Why generate scripts AFTER executing?** LLMs hallucinate locators and API calls. By executing the flow first via MCP and recording what actually worked, the generated script uses real selectors that match the live DOM, not invented ones.
## How It Works

### Manual Testing Mode — Step by Step
1. User submits a scenario like "Test login on example.com with tomsmith / SuperSecret".
2. Agent parses the scenario into atomic steps (navigate, type, click, assert).
3. For each step:
   - Call `browser_snapshot` to get the accessibility tree
   - Identify the right element reference
   - Execute the action (`browser_click`, `browser_type`, etc.)
   - Capture a screenshot as evidence
   - Verify the expected outcome via another snapshot
4. On any failure, dump console logs and network requests for diagnosis.
5. Produce a structured test report with pass/fail, timings, and screenshots.
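The per-step loop above can be sketched in Python. This is an illustrative skeleton, not the project's actual code: `call_tool` is a stand-in for a real MCP client call and is stubbed so the control flow runs standalone, and the screenshot tool name is an assumption that may differ across Playwright MCP versions.

```python
from dataclasses import dataclass, field

@dataclass
class StepResult:
    step: str
    passed: bool
    evidence: dict = field(default_factory=dict)

def call_tool(name: str, **kwargs) -> dict:
    """Stub: a real implementation would forward this to the MCP server."""
    return {"tool": name, "args": kwargs, "ok": True}

def run_step(step: str) -> StepResult:
    call_tool("browser_snapshot")                      # observe page state
    action = call_tool("browser_click", ref=step)      # act on the element
    shot = call_tool("browser_take_screenshot")        # capture evidence
    after = call_tool("browser_snapshot")              # re-observe to verify
    passed = action["ok"] and after["ok"]
    return StepResult(step=step, passed=passed,
                      evidence={"screenshot": shot, "after": after})

results = [run_step(s) for s in ["login-button", "username-field"]]
```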
### Script Generation Mode — Step by Step
1. Agent runs the full manual test flow first (Mode 1).
2. It records the exact sequence of tool calls that succeeded.
3. It loads a language-specific template (Python pytest, TypeScript Playwright Test, Java JUnit, C# NUnit).
4. It maps each MCP tool call to the equivalent native Playwright API call.
5. It injects real locators (from the accessibility snapshots) into the template.
6. It adds proper waits, assertions, setup/teardown, and comments.
7. It writes the file to `generated_scripts/` and validates that it runs.
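Step 4 — mapping recorded MCP tool calls to native Playwright calls — can be pictured as a small lookup table. This is a hypothetical sketch; the real mapping in `src/script_generator.py` covers many more tools and target languages.

```python
# Hypothetical mapping from recorded MCP tool calls to Playwright Python
# statements; templates use str.format placeholders for recorded arguments.
TOOL_TO_API = {
    "browser_navigate": 'page.goto("{url}")',
    "browser_click":    'page.locator("{selector}").click()',
    "browser_type":     'page.locator("{selector}").fill("{text}")',
}

def emit_lines(recorded_calls):
    """Render each recorded tool call as a native Playwright statement."""
    return [TOOL_TO_API[c["tool"]].format(**c["args"]) for c in recorded_calls]

calls = [
    {"tool": "browser_navigate", "args": {"url": "https://example.com"}},
    {"tool": "browser_click", "args": {"selector": "#submit"}},
]
lines = emit_lines(calls)
```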
## Prerequisites
- Node.js 18+ (for Playwright MCP server)
- Python 3.10+ (for the agent orchestrator)
- An LLM API key (Anthropic Claude recommended; OpenAI also supported)
- ~2 GB disk space (Chromium browser binaries)
- macOS, Linux, or Windows with WSL2
## Installation & Setup

### 1. Clone and enter the project

```bash
git clone <your-repo-url> testing-agent
cd testing-agent
```

### 2. Install Playwright MCP server globally

```bash
npm install -g @playwright/mcp@latest
npx playwright install chromium
```

Verify it works:

```bash
npx @playwright/mcp@latest --port 8931
# You should see: "Playwright MCP listening on port 8931"
# Press Ctrl+C to stop
```
### 3. Set up Python environment

```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

### 4. Configure environment variables

Create a `.env` file in the project root:

```
ANTHROPIC_API_KEY=sk-ant-your-key-here
# OR, if using OpenAI:
# OPENAI_API_KEY=sk-your-key-here

# Optional tuning
AGENT_MODEL=claude-opus-4-7
AGENT_HEADLESS=true
AGENT_MAX_RETRIES=3
AGENT_SCREENSHOT_DIR=./screenshots
```

### 5. Verify installation

```bash
python src/agent.py --health-check
# Expected output: "Agent ready. MCP server connected. X tools available."
```
## Configuration

### `mcp_config.json` — MCP server definitions

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"],
      "transport": "stdio"
    }
  }
}
```
Swap `--headless` for `--headed` if you want to watch the browser work. Add `--browser=firefox` to target Firefox instead of Chromium.
### Agent prompts — `prompts/` directory

- `manual_tester.md` — system prompt for manual testing mode
- `script_generator.md` — system prompt for script generation mode
- `templates/` — language-specific code scaffolding

Edit these to customize the agent's style, strictness, and reporting format.
## Running the Agent

### Interactive mode (REPL)

```bash
python src/agent.py
```

You'll get a prompt where you can type scenarios and iterate.

### One-shot manual test

```bash
python src/agent.py --scenario "Go to example.com and verify the page title contains 'Example Domain'"
```

### Script generation

```bash
python src/agent.py \
  --scenario "Log in to https://the-internet.herokuapp.com/login with tomsmith / SuperSecretPassword! and verify success" \
  --generate python \
  --output generated_scripts/test_login.py
```
Supported languages: `python`, `typescript`, `javascript`, `java`, `csharp`.
### Batch mode — run scenarios from a YAML file

```bash
python src/agent.py --batch test_suite.yaml
```

Example `test_suite.yaml`:

```yaml
suite_name: Smoke Tests
scenarios:
  - name: Home page loads
    steps: Navigate to example.com and verify h1 is visible
  - name: Login works
    steps: Log in with tomsmith / SuperSecretPassword! and verify redirect
    generate_script: typescript
```
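A batch runner consuming this file might look like the following sketch. YAML parsing is omitted: `suite` mirrors the parsed structure of the example file, and `run_scenario` is a hypothetical executor hook, not the project's real API.

```python
def run_suite(suite, run_scenario):
    """Execute each scenario in the suite and collect minimal results."""
    results = []
    for sc in suite["scenarios"]:
        passed = run_scenario(sc["steps"])   # hypothetical executor hook
        results.append({
            "name": sc["name"],
            "passed": passed,
            "generate": sc.get("generate_script"),  # None if not requested
        })
    return results

# Mirrors the parsed structure of test_suite.yaml above.
suite = {
    "suite_name": "Smoke Tests",
    "scenarios": [
        {"name": "Home page loads",
         "steps": "Navigate to example.com and verify h1 is visible"},
        {"name": "Login works",
         "steps": "Log in and verify redirect",
         "generate_script": "typescript"},
    ],
}
results = run_suite(suite, run_scenario=lambda steps: True)
```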
## Usage Examples

### Example 1 — Bug reproduction

```
> Go to our staging site at https://staging.myapp.com,
> try to submit the contact form with an empty email field,
> and tell me if the validation error shows up.
```

The agent executes the scenario, captures screenshots of the form state before and after, reports whether validation fired, and includes console errors if any.
### Example 2 — Generate a Python Playwright test

```bash
python src/agent.py \
  --scenario "Navigate to github.com/login, enter username 'testuser', \
click Sign in, verify the password field appears" \
  --generate python
```

Output in `generated_scripts/test_github_login.py`:

```python
import pytest
from playwright.sync_api import Page, expect

def test_github_login_flow(page: Page):
    page.goto("https://github.com/login")
    page.get_by_label("Username or email address").fill("testuser")
    page.get_by_role("button", name="Sign in").click()
    expect(page.get_by_label("Password")).to_be_visible()
```
### Example 3 — Exploratory testing session

```
> Explore the checkout flow on our site. Try to break it:
> invalid coupon codes, out-of-stock items, expired cards.
> Report anything that looks broken.
```

The agent runs multiple sub-scenarios and reports findings with reproduction steps.
## Project Structure

```
testing-agent/
├── README.md               # This file
├── ARCHITECTURE.md         # Deep dive into design decisions
├── requirements.txt        # Python dependencies
├── mcp_config.json         # MCP server configuration
├── .env.example            # Environment variable template
├── src/
│   ├── agent.py            # Main entry point
│   ├── modes.py            # Manual vs script-gen mode routing
│   ├── script_generator.py # Language-specific code emission
│   ├── reporter.py         # Test report formatting
│   └── retry.py            # Flaky-test retry logic
├── prompts/
│   ├── manual_tester.md
│   ├── script_generator.md
│   └── templates/
│       ├── python.py.tmpl
│       ├── typescript.ts.tmpl
│       ├── java.java.tmpl
│       └── csharp.cs.tmpl
├── generated_scripts/      # Output directory for generated code
├── screenshots/            # Test evidence
└── reports/                # JSON test reports
```
## Extending the Agent

### Adding a new target language

1. Create `prompts/templates/<lang>.<ext>.tmpl` with scaffolding.
2. Add the language to the `LANGUAGE_MAP` in `src/script_generator.py`.
3. Update the system prompt in `prompts/script_generator.md` to include the new language's Playwright API conventions.
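As an assumption about its shape (the real structure in `src/script_generator.py` may differ), `LANGUAGE_MAP` could be a dictionary keyed by language name, where registering a new language is a one-entry change:

```python
# Assumed shape of LANGUAGE_MAP — illustrative only. Step 2 above amounts
# to adding one entry that points at the template created in step 1.
LANGUAGE_MAP = {
    "python":     {"template": "python.py.tmpl",     "ext": ".py"},
    "typescript": {"template": "typescript.ts.tmpl", "ext": ".ts"},
    "kotlin":     {"template": "kotlin.kt.tmpl",     "ext": ".kt"},  # new language
}
```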
### Adding new MCP servers

Append to `mcp_config.json`:

```json
"database": {
  "command": "npx",
  "args": ["@modelcontextprotocol/server-postgres", "postgresql://..."],
  "transport": "stdio"
}
```
Now the agent can query a database during tests (useful for verifying backend state after UI actions).
### Custom test reporters

Implement the `Reporter` interface in `src/reporter.py`. Built-in options: `ConsoleReporter`, `JSONReporter`, `JUnitXMLReporter`, `AllureReporter`.
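A custom reporter might look like this sketch. The actual `Reporter` base class in `src/reporter.py` may use different method names, so treat the interface shown here as an assumption:

```python
from abc import ABC, abstractmethod

class Reporter(ABC):
    """Assumed shape of the Reporter interface (illustrative only)."""
    @abstractmethod
    def emit(self, results: list) -> str:
        ...

class MarkdownReporter(Reporter):
    """Example custom reporter: renders results as a Markdown table."""
    def emit(self, results: list) -> str:
        rows = ["| Test | Result |", "| --- | --- |"]
        for r in results:
            rows.append(f"| {r['name']} | {'PASS' if r['passed'] else 'FAIL'} |")
        return "\n".join(rows)

print(MarkdownReporter().emit([{"name": "login", "passed": True}]))
```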
## CI/CD Integration

### GitHub Actions example

```yaml
name: AI-Driven E2E Tests

on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - name: Install Playwright MCP
        run: |
          npm install -g @playwright/mcp@latest
          npx playwright install --with-deps chromium
      - name: Install agent
        run: pip install -r requirements.txt
      - name: Run test suite
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python src/agent.py --batch tests/smoke.yaml --report junit
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-evidence
          path: |
            screenshots/
            reports/
```
### Caching tips

Cache the `~/.cache/ms-playwright` directory to skip the browser download (roughly 90 seconds) on every CI run.
## Troubleshooting

**"MCP server failed to start"** — Run `npx @playwright/mcp@latest --port 8931` manually to see the actual error. It usually means Chromium isn't installed; run `npx playwright install chromium`.

**"Agent loops forever on a flaky element"** — Lower `AGENT_MAX_RETRIES` in `.env` and add explicit waits to your scenario description ("wait for the loading spinner to disappear before clicking").

**"Generated script doesn't run"** — The agent validates scripts by attempting a dry run. If validation fails, check `reports/latest.json` for the error. The most common cause is a Playwright version mismatch between the MCP server and your local Playwright install.

**"Token usage is too high"** — Enable accessibility-tree-only mode (no screenshots) in `mcp_config.json` by adding `"--no-screenshots"` to the `args` array. Screenshots are still captured on failure.

**Browser hangs on auth prompts** — Add `--ignore-https-errors` and `--bypass-csp` to the MCP server args for test environments with self-signed certs.
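The retry behavior that `AGENT_MAX_RETRIES` controls can be pictured as a simple bounded-retry loop. This is an illustrative sketch; the project's real logic lives in `src/retry.py` and `with_retries` is a name invented for this example.

```python
import os
import time

def with_retries(action, max_retries=None, delay=0.0):
    """Run `action`, retrying up to AGENT_MAX_RETRIES times before giving up."""
    attempts = int(max_retries or os.getenv("AGENT_MAX_RETRIES", "3"))
    last_err = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as err:    # a real implementation would narrow this
            last_err = err
            time.sleep(delay)       # optionally back off between attempts
    raise last_err

# Demo: an action that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("element not ready")
    return "clicked"

result = with_retries(flaky, max_retries=3)
```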
## License

MIT

## Contributing

PRs welcome. Please include a test scenario that exercises any new agent behavior.