E2E Testing Agent - User Workflows¶

What is this?¶

The E2E Testing Agent is an autonomous AI system that:

1. Analyzes your codebase to understand what needs testing
2. Generates test plans automatically
3. Executes UI, API, and database tests
4. Self-heals broken tests when selectors change
5. Reports results

You don't write tests - the AI does everything.

Quick Start (30 seconds)¶

# 1. Install
pip install -e .

# 2. Set API key
export ANTHROPIC_API_KEY=sk-ant-...

# 3. Run (your app must be running)
e2e-agent --codebase ./my-app --app-url http://localhost:3000

That's it. The agent will:

- Scan your code
- Generate tests
- Run them
- Give you a report

User Workflows¶

Workflow 1: Full Automated Testing (Most Common)¶

Use case: "Test my entire application"

# Your app must be running first!
npm start  # or docker-compose up, etc.

# Run the agent
e2e-agent --codebase /path/to/your/app --app-url http://localhost:3000

What happens:

┌─────────────────────────────────────────────────────────────┐
│ 1. ANALYZE CODE                                             │
│ Agent reads your codebase, finds:                           │
│ - UI pages and components                                   │
│ - API endpoints                                             │
│ - Database models                                           │
│ - Existing tests (to avoid duplication)                     │
└─────────────────────────────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. PLAN TESTS                                               │
│ Agent creates prioritized test plan:                        │
│ - Critical paths (login, checkout, etc.)                    │
│ - API endpoint validation                                   │
│ - Database integrity checks                                 │
└─────────────────────────────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. EXECUTE TESTS                                            │
│ Agent runs each test:                                       │
│ - Opens browser (Playwright)                                │
│ - Performs actions (click, type, etc.)                      │
│ - Takes screenshots                                         │
│ - Validates assertions                                      │
└─────────────────────────────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. SELF-HEAL (if tests fail)                                │
│ If a test fails:                                            │
│ - Agent analyzes why                                        │
│ - Fixes selector if it changed                              │
│ - Retries the test                                          │
└─────────────────────────────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. REPORT                                                   │
│ Agent outputs:                                              │
│ - Summary (passed/failed/skipped)                           │
│ - Detailed results JSON                                     │
│ - Screenshots of failures                                   │
│ - Cost breakdown                                            │
└─────────────────────────────────────────────────────────────┘
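
The self-heal step can be pictured as a bounded retry loop. The sketch below is only an illustration under stated assumptions: `SelectorNotFound`, `run_step`, and `propose_selector` are hypothetical stand-ins, not the agent's actual internals, and the attempt limit simply mirrors the SELF_HEAL_MAX_ATTEMPTS setting.

```python
from typing import Callable, Optional


class SelectorNotFound(Exception):
    """Hypothetical error raised when a selector matches nothing."""


def run_with_self_heal(
    selector: str,
    run_step: Callable[[str], None],
    propose_selector: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    """Run a step, swapping in a proposed selector after each failure.

    Returns the selector that finally worked. Re-raises the last error once
    the healing attempts are used up or no replacement is proposed.
    """
    current = selector
    for attempt in range(max_attempts + 1):  # first try + healing attempts
        try:
            run_step(current)
            return current
        except SelectorNotFound:
            if attempt == max_attempts:
                raise
            replacement = propose_selector(current)
            if replacement is None:
                raise
            current = replacement
    raise AssertionError("unreachable")
```

Capping the number of attempts keeps a permanently broken test from consuming the cost budget.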

Output:

./test-results/
├── results.json        # Full test results
├── report.html         # Human-readable report
└── screenshots/        # Failure screenshots
    ├── login-failed.png
    └── checkout-error.png
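
If you want to post-process the output, a short script can summarize results.json. This is a sketch that assumes the file mirrors the result dictionary of the Python API (`passed_count`, `failed_count`, `test_results`); the real schema may differ, and `summarize_results` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_results(path: Path) -> str:
    """Build a short text summary from a results file.

    Assumes the keys passed_count, failed_count and test_results;
    adjust these names to the schema the agent actually writes.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    lines = [f"passed={data['passed_count']} failed={data['failed_count']}"]
    for test in data.get("test_results", []):
        if test.get("status") == "failed":
            lines.append(f"FAILED {test['name']}: {test.get('error_message', '')}")
    return "\n".join(lines)
```

Usage: `print(summarize_results(Path("./test-results/results.json")))`.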

Workflow 2: PR/CI Integration¶

Use case: "Test only what changed in this PR"

# In GitHub Actions or CI
e2e-agent \
  --codebase . \
  --app-url $PREVIEW_URL \
  --pr 123 \
  --changed-files src/auth/login.tsx src/api/users.ts

What happens:

- Agent focuses only on files that changed
- Generates targeted tests for affected functionality
- Faster and cheaper than a full run

GitHub Actions Example:

name: AI E2E Tests
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
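      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so origin/main is available to diff against

      - name: Start preview
        run: docker compose up -d

      - name: Get changed files
        id: changes
        run: |
          # Join paths with spaces: a multi-line value would break the key=value output format
          echo "files=$(git diff --name-only origin/main...HEAD | tr '\n' ' ')" >> "$GITHUB_OUTPUT"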

      - name: Run AI Tests
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          pip install e2e-testing-agent
          e2e-agent \
            --codebase . \
            --app-url http://localhost:3000 \
            --pr ${{ github.event.number }} \
            --changed-files ${{ steps.changes.outputs.files }}
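
Outside GitHub Actions, the same targeted invocation can be assembled from a script. The sketch below uses only the flags shown above; `parse_changed_files` and `build_agent_command` are hypothetical helper names.

```python
import shlex
from typing import Optional, Sequence


def parse_changed_files(git_diff_output: str) -> list[str]:
    """Split `git diff --name-only` output into one path per entry."""
    return [line.strip() for line in git_diff_output.splitlines() if line.strip()]


def build_agent_command(
    codebase: str,
    app_url: str,
    changed_files: Sequence[str],
    pr: Optional[int] = None,
) -> list[str]:
    """Assemble the e2e-agent argument list for a targeted run."""
    cmd = ["e2e-agent", "--codebase", codebase, "--app-url", app_url]
    if pr is not None:
        cmd += ["--pr", str(pr)]
    if changed_files:
        cmd += ["--changed-files", *changed_files]
    return cmd


files = parse_changed_files("src/auth/login.tsx\nsrc/api/users.ts\n")
command = build_agent_command(".", "http://localhost:3000", files, pr=123)
print(shlex.join(command))  # shell-safe rendering, useful for logging
```

Passing the list to `subprocess.run(command, check=True)` avoids shell word-splitting problems with paths that contain spaces.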

Workflow 3: Python API (Programmatic)¶

Use case: "Integrate into my existing test framework"

import asyncio
from e2e_testing_agent import TestingOrchestrator

async def main():
    # Initialize
    orchestrator = TestingOrchestrator(
        codebase_path="/path/to/my/app",
        app_url="http://localhost:3000",
    )

    # Run all tests
    results = await orchestrator.run()

    # Check results
    print(f"Passed: {results['passed_count']}")
    print(f"Failed: {results['failed_count']}")

    # Access detailed results
    for test_result in results['test_results']:
        if test_result['status'] == 'failed':
            print(f"FAILED: {test_result['name']}")
            print(f"  Error: {test_result['error_message']}")

asyncio.run(main())
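
When wiring this into an existing framework or CI job, you usually want the process to fail when any test fails. A minimal sketch, assuming the result dictionary has the keys used in the example above; `failed_tests` and `exit_code_for` are hypothetical helpers, not part of the package.

```python
from typing import Any, Mapping


def failed_tests(results: Mapping[str, Any]) -> list[str]:
    """Return the names of all tests whose status is 'failed'."""
    return [
        test["name"]
        for test in results.get("test_results", [])
        if test.get("status") == "failed"
    ]


def exit_code_for(results: Mapping[str, Any]) -> int:
    """Map a result dictionary to a process exit code (0 = success)."""
    has_failures = results.get("failed_count", 0) > 0 or bool(failed_tests(results))
    return 1 if has_failures else 0


# Sample data shaped like the dictionary used above (values are made up).
sample = {
    "passed_count": 4,
    "failed_count": 1,
    "test_results": [{"name": "User Login Flow", "status": "failed"}],
}
print(failed_tests(sample), exit_code_for(sample))
```

At the end of `main()` you could then call `sys.exit(exit_code_for(results))`.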

Workflow 4: Single Test Execution¶

Use case: "Just run this one specific test"

import asyncio
from e2e_testing_agent import TestingOrchestrator

# Define a specific test
test_spec = {
    "id": "login-test",
    "name": "User Login Flow",
    "type": "ui",
    "steps": [
        {"action": "goto", "target": "/login"},
        {"action": "fill", "selector": "#email", "value": "user@example.com"},  # placeholder address
        {"action": "fill", "selector": "#password", "value": "password123"},
        {"action": "click", "selector": "#submit"},
    ],
    "assertions": [
        {"type": "url_contains", "value": "/dashboard"},
        {"type": "element_visible", "selector": "#welcome-message"},
    ]
}

async def main():
    orchestrator = TestingOrchestrator(
        codebase_path="/path/to/app",
        app_url="http://localhost:3000",
    )

    # run_single_test is a coroutine, so it must be awaited inside an async function
    result = await orchestrator.run_single_test(test_spec)
    print(result)

asyncio.run(main())
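
Hand-written specs are easy to get wrong, so it can help to check them before spending API budget. The validator below is a sketch: its rules are inferred only from the example spec above, and the agent's real schema may accept more actions or fields. `validate_test_spec` is a hypothetical helper.

```python
from typing import Any, Mapping

REQUIRED_KEYS = ("id", "name", "type", "steps")


def validate_test_spec(spec: Mapping[str, Any]) -> list[str]:
    """Return a list of problems found in a test spec (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in spec]
    for index, step in enumerate(spec.get("steps", [])):
        action = step.get("action")
        if not action:
            problems.append(f"step {index}: missing action")
        elif action == "goto" and "target" not in step:
            problems.append(f"step {index}: goto needs a target")
        elif action in ("fill", "click") and "selector" not in step:
            problems.append(f"step {index}: {action} needs a selector")
        if action == "fill" and "value" not in step:
            problems.append(f"step {index}: fill needs a value")
    for index, assertion in enumerate(spec.get("assertions", [])):
        if "type" not in assertion:
            problems.append(f"assertion {index}: missing type")
    return problems
```

An empty list means the spec passed these basic checks; it does not guarantee the selectors exist in the running app.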

Browser Automation Options¶

The agent supports multiple ways to control the browser:

Option 1: Playwright (Default) - RECOMMENDED¶

# This is automatic - no configuration needed
e2e-agent --codebase ./app --app-url http://localhost:3000

Pros: Fastest, most reliable, works in CI
Cons: Can be detected as a bot by some sites

Option 2: Chrome Extension (Real Browser)¶

Use when you need:

- Your existing browser session (cookies, auth)
- To avoid bot detection
- To capture console logs

from src.tools import create_browser

# Start with extension (requires Chrome + extension installed)
browser = await create_browser("extension")
await browser.goto("https://example.com")

# Unique feature: capture console logs
logs = await browser.get_console_logs()

Setup:

1. Load the extension/ folder in Chrome as an unpacked extension
2. Keep Chrome open
3. The agent connects via WebSocket

Option 3: Computer Use (Visual AI)¶

Use when:

- Selectors are unreliable
- UI changes frequently
- You want "human-like" testing

browser = await create_browser("computer_use")

# No selectors! Natural language descriptions
await browser.click("the blue Login button")
await browser.fill("the email input field", "user@example.com")  # placeholder address

Pros: Works with any UI, no selectors needed
Cons: Slower (2-5s per action), higher API cost

Option 4: Hybrid (Best of Both)¶

browser = await create_browser("hybrid")

# Tries Playwright first (fast)
# Falls back to Computer Use if selector fails
await browser.click("#login-btn")  # Playwright tries first
# If #login-btn doesn't exist, Computer Use finds it visually
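
The fallback behaviour can be sketched as a small wrapper around two backends. Everything here is hypothetical and for illustration only: `HybridBrowser`, the fake backends, and the use of `LookupError` as the "selector not found" signal are assumptions, not the agent's actual classes.

```python
import asyncio
from typing import Protocol


class Backend(Protocol):
    async def click(self, target: str) -> None: ...


class HybridBrowser:
    """Try a fast selector backend first, then fall back to a slower one."""

    def __init__(self, primary: Backend, fallback: Backend) -> None:
        self.primary = primary
        self.fallback = fallback
        self.used: list[str] = []  # records which backend handled each click

    async def click(self, target: str) -> None:
        try:
            await self.primary.click(target)
            self.used.append("primary")
        except LookupError:  # stand-in for "selector not found"
            await self.fallback.click(target)
            self.used.append("fallback")


class FakeSelectorBackend:
    """Pretends to be a selector-based driver that knows a fixed set of selectors."""

    def __init__(self, known: set[str]) -> None:
        self.known = known

    async def click(self, target: str) -> None:
        if target not in self.known:
            raise LookupError(target)


class FakeVisualBackend:
    """Pretends to locate any element visually and always succeeds."""

    async def click(self, target: str) -> None:
        return None


async def demo() -> list[str]:
    browser = HybridBrowser(FakeSelectorBackend({"#login-btn"}), FakeVisualBackend())
    await browser.click("#login-btn")    # handled by the primary backend
    await browser.click("#renamed-btn")  # primary fails, fallback takes over
    return browser.used
```

Only the failing actions pay the slower, more expensive path, which is the point of hybrid mode.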

Configuration¶

Environment Variables¶

# Required
ANTHROPIC_API_KEY=sk-ant-...

# Models (optional)
DEFAULT_MODEL=claude-sonnet-4-5         # Main testing model
VERIFICATION_MODEL=claude-haiku-4-5     # Quick checks
DEBUGGING_MODEL=claude-opus-4-5         # Complex analysis

# Cost controls (optional)
COST_LIMIT_PER_RUN=10.00     # Max $ per full run
COST_LIMIT_PER_TEST=1.00     # Max $ per test
MAX_ITERATIONS=50            # Safety limit

# Self-healing (optional)
SELF_HEAL_ENABLED=true
SELF_HEAL_MAX_ATTEMPTS=3

Config File (.env)¶

# .env
ANTHROPIC_API_KEY=sk-ant-xxx
DEFAULT_MODEL=claude-sonnet-4-5
COST_LIMIT_PER_RUN=15.00
SELF_HEAL_ENABLED=true
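
To show how these settings might be read and type-checked in your own tooling, here is a minimal sketch. `AgentConfig` is a hypothetical class, and the fallback values simply reuse the example values listed above; whether they match the agent's real defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as 'true' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AgentConfig:
    api_key: str
    default_model: str
    cost_limit_per_run: float
    self_heal_enabled: bool
    self_heal_max_attempts: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AgentConfig":
        api_key = env.get("ANTHROPIC_API_KEY", "")
        if not api_key:
            raise ValueError("ANTHROPIC_API_KEY is required")
        return cls(
            api_key=api_key,
            # Fallbacks below are the example values from this page (assumed defaults).
            default_model=env.get("DEFAULT_MODEL", "claude-sonnet-4-5"),
            cost_limit_per_run=float(env.get("COST_LIMIT_PER_RUN", "10.00")),
            self_heal_enabled=_as_bool(env.get("SELF_HEAL_ENABLED", "true")),
            self_heal_max_attempts=int(env.get("SELF_HEAL_MAX_ATTEMPTS", "3")),
        )
```

Failing fast on a missing key or a non-numeric limit surfaces configuration mistakes before any paid API call is made.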

Cost Estimates¶

Action	Approx Cost
Analyze small codebase (< 50 files)	$0.10-0.30
Analyze large codebase (500+ files)	$0.50-2.00
Generate test plan	$0.05-0.20
Run 1 UI test (10 steps)	$0.10-0.20
Run 1 API test	$0.02-0.05
Self-heal 1 failure	$0.05-0.15
Generate report	$0.02-0.05

Typical full run: $1-5 depending on app size
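
As a worked example of how the table adds up, the sketch below estimates a run on a small codebase. The per-unit ranges are copied from the table; the test counts are made-up inputs, and `estimate_run` is a hypothetical helper.

```python
# (low, high) per unit in US cents, copied from the cost table.
COST_CENTS = {
    "analyze_small": (10, 30),
    "plan": (5, 20),
    "ui_test": (10, 20),
    "api_test": (2, 5),
    "self_heal": (5, 15),
    "report": (2, 5),
}


def estimate_run(ui_tests: int, api_tests: int, heals: int) -> tuple[float, float]:
    """Estimate the (low, high) dollar cost of one run on a small codebase."""
    counts = {
        "analyze_small": 1,
        "plan": 1,
        "ui_test": ui_tests,
        "api_test": api_tests,
        "self_heal": heals,
        "report": 1,
    }
    low = sum(COST_CENTS[name][0] * n for name, n in counts.items())
    high = sum(COST_CENTS[name][1] * n for name, n in counts.items())
    return low / 100, high / 100


print(estimate_run(ui_tests=10, api_tests=20, heals=2))  # (1.67, 3.85)
```

With 10 UI tests, 20 API tests, and 2 self-heals, the estimate lands inside the typical $1-5 range quoted above. Working in integer cents avoids floating-point drift while summing.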

Architecture Overview¶

┌─────────────────────────────────────────────────────────────────────┐
│                            USER                                      │
│                  e2e-agent --codebase ./app                         │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      TESTING ORCHESTRATOR                            │
│                        (LangGraph FSM)                              │
│                                                                      │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│   │ Analyze  │ → │  Plan    │ → │ Execute  │ → │  Report  │        │
│   │   Code   │   │  Tests   │   │  Tests   │   │          │        │
│   └──────────┘   └──────────┘   └────┬─────┘   └──────────┘        │
│                                      │                              │
│                                      ▼                              │
│                               ┌──────────┐                          │
│                               │Self-Heal │ (on failure)             │
│                               └──────────┘                          │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                    ┌───────────┼───────────┐
                    │           │           │
                    ▼           ▼           ▼
             ┌──────────┐ ┌──────────┐ ┌──────────┐
             │UI Tester │ │API Tester│ │DB Tester │
             └────┬─────┘ └────┬─────┘ └────┬─────┘
                  │            │            │
                  ▼            ▼            ▼
             ┌──────────┐ ┌──────────┐ ┌──────────┐
             │Browser   │ │  httpx   │ │SQLAlchemy│
             │Automation│ │          │ │          │
             └──────────┘ └──────────┘ └──────────┘

Troubleshooting¶

"No testable surfaces found"¶

- Make sure your app has recognizable patterns (routes, API endpoints, etc.)
- Check that the codebase path is correct

Tests timing out¶

- Increase the timeout: TIMEOUT_MS=60000
- Make sure the app is fully loaded before testing starts
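
One way to make sure the app is up before launching the agent is to poll its URL first. This is a sketch using only the Python standard library; `http_probe` and `wait_until_ready` are hypothetical helpers, and treating any status below 500 as "ready" is an assumption you may want to tighten.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as error:  # 4xx/5xx responses
        return error.code < 500
    except (urllib.error.URLError, OSError):  # connection refused, timeout, DNS
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Usage: `wait_until_ready(lambda: http_probe("http://localhost:3000"))`, then start e2e-agent only if it returns True.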

High costs¶

- Use --changed-files for targeted testing
- Lower COST_LIMIT_PER_RUN
- Use Haiku for verification: VERIFICATION_MODEL=claude-haiku-4-5

Bot detection issues¶

- Use the Chrome Extension instead of Playwright
- Or use Computer Use mode

Summary: When to Use What¶

Scenario	Recommendation
CI/CD pipeline	Default (Playwright)
Testing authenticated app	Chrome Extension
Flaky selectors	Hybrid mode
Any website/no selectors	Computer Use
Quick PR checks	--changed-files flag
Full regression	Full run (consider scheduling it overnight)