E2E Testing Agent - User Workflows

What is this?

The E2E Testing Agent is an autonomous AI system that:

1. Analyzes your codebase to understand what needs testing
2. Generates test plans automatically
3. Executes UI, API, and database tests
4. Self-heals broken tests when selectors change
5. Reports results

You don't write tests - the AI does everything.

Quick Start (30 seconds)

# 1. Install
pip install -e .

# 2. Set API key
export ANTHROPIC_API_KEY=sk-ant-...

# 3. Run (your app must be running)
e2e-agent --codebase ./my-app --app-url http://localhost:3000

That's it. The agent will:

- Scan your code
- Generate tests
- Run them
- Give you a report


User Workflows

Workflow 1: Full Automated Testing (Most Common)

Use case: "Test my entire application"

# Your app must be running first!
npm start  # or docker-compose up, etc.

# Run the agent
e2e-agent --codebase /path/to/your/app --app-url http://localhost:3000

What happens:

┌─────────────────────────────────────────────────────────────┐
│ 1. ANALYZE CODE                                              │
│    Agent reads your codebase, finds:                        │
│    - UI pages and components                                │
│    - API endpoints                                          │
│    - Database models                                        │
│    - Existing tests (to avoid duplication)                  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 2. PLAN TESTS                                               │
│    Agent creates prioritized test plan:                     │
│    - Critical paths (login, checkout, etc.)                 │
│    - API endpoint validation                                │
│    - Database integrity checks                              │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 3. EXECUTE TESTS                                            │
│    Agent runs each test:                                    │
│    - Opens browser (Playwright)                             │
│    - Performs actions (click, type, etc.)                   │
│    - Takes screenshots                                      │
│    - Validates assertions                                   │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 4. SELF-HEAL (if tests fail)                                │
│    If a test fails:                                         │
│    - Agent analyzes why                                     │
│    - Fixes selector if it changed                           │
│    - Retries the test                                       │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 5. REPORT                                                   │
│    Agent outputs:                                           │
│    - Summary (passed/failed/skipped)                        │
│    - Detailed results JSON                                  │
│    - Screenshots of failures                                │
│    - Cost breakdown                                         │
└─────────────────────────────────────────────────────────────┘

Output:

./test-results/
├── results.json        # Full test results
├── report.html         # Human-readable report
└── screenshots/        # Failure screenshots
    ├── login-failed.png
    └── checkout-error.png
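
If you want to consume results.json from CI or scripts, here is a minimal sketch, assuming the file mirrors the result dict shown in Workflow 3 below (passed_count, failed_count, test_results):

import json
import sys

# Load the results file the agent writes under ./test-results/
with open("test-results/results.json") as f:
    results = json.load(f)

print(f"Passed: {results['passed_count']}, Failed: {results['failed_count']}")

# Exit non-zero so a CI job fails when any test failed
sys.exit(1 if results["failed_count"] > 0 else 0)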


Workflow 2: PR/CI Integration

Use case: "Test only what changed in this PR"

# In GitHub Actions or CI
e2e-agent \
  --codebase . \
  --app-url $PREVIEW_URL \
  --pr 123 \
  --changed-files src/auth/login.tsx src/api/users.ts

What happens:

- Agent focuses only on files that changed
- Generates targeted tests for affected functionality
- Faster and cheaper than a full run

GitHub Actions Example:

name: AI E2E Tests
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # fetch full history so origin/main exists for the diff

      - name: Start preview
        run: docker-compose up -d

      - name: Get changed files
        id: changes
        run: |
          # Join the newline-separated list with spaces; raw newlines break $GITHUB_OUTPUT
          echo "files=$(git diff --name-only origin/main | tr '\n' ' ')" >> $GITHUB_OUTPUT

      - name: Run AI Tests
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          pip install e2e-testing-agent
          e2e-agent \
            --codebase . \
            --app-url http://localhost:3000 \
            --pr ${{ github.event.number }} \
            --changed-files ${{ steps.changes.outputs.files }}


Workflow 3: Python API (Programmatic)

Use case: "Integrate into my existing test framework"

import asyncio
from e2e_testing_agent import TestingOrchestrator

async def main():
    # Initialize
    orchestrator = TestingOrchestrator(
        codebase_path="/path/to/my/app",
        app_url="http://localhost:3000",
    )

    # Run all tests
    results = await orchestrator.run()

    # Check results
    print(f"Passed: {results['passed_count']}")
    print(f"Failed: {results['failed_count']}")

    # Access detailed results
    for test_result in results['test_results']:
        if test_result['status'] == 'failed':
            print(f"FAILED: {test_result['name']}")
            print(f"  Error: {test_result['error_message']}")

asyncio.run(main())

Workflow 4: Single Test Execution

Use case: "Just run this one specific test"

orchestrator = TestingOrchestrator(
    codebase_path="/path/to/app",
    app_url="http://localhost:3000",
)

# Define a specific test
test_spec = {
    "id": "login-test",
    "name": "User Login Flow",
    "type": "ui",
    "steps": [
        {"action": "goto", "target": "/login"},
        {"action": "fill", "selector": "#email", "value": "[email protected]"},
        {"action": "fill", "selector": "#password", "value": "password123"},
        {"action": "click", "selector": "#submit"},
    ],
    "assertions": [
        {"type": "url_contains", "value": "/dashboard"},
        {"type": "element_visible", "selector": "#welcome-message"},
    ]
}

result = await orchestrator.run_single_test(test_spec)
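
The shape of the returned dict isn't documented here; a reasonable sketch, assuming it uses the same keys as the results['test_results'] entries in Workflow 3:

# Assumption: single-test results carry the same keys as
# results['test_results'] entries (status, name, error_message)
if result["status"] == "failed":
    print(f"FAILED: {result['name']}: {result['error_message']}")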

Browser Automation Options

The agent supports multiple ways to control the browser:

Option 1: Playwright (Default)

# This is automatic - no configuration needed
e2e-agent --codebase ./app --app-url http://localhost:3000

Pros: Fastest, most reliable, works in CI
Cons: Can be detected as a bot by some sites
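
From the Python API, the same backend should be reachable through the create_browser factory used by the other options below; the "playwright" mode string here is an assumption, so check the factory's accepted modes:

from src.tools import create_browser

# "playwright" is assumed by analogy with the documented
# "extension", "computer_use", and "hybrid" modes
browser = await create_browser("playwright")
await browser.goto("http://localhost:3000")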

Option 2: Chrome Extension (Real Browser)

Use when you need:

- Your existing browser session (cookies, auth)
- To avoid bot detection
- To capture console logs

from src.tools import create_browser

# Start with extension (requires Chrome + extension installed)
browser = await create_browser("extension")
await browser.goto("https://example.com")

# Unique feature: capture console logs
logs = await browser.get_console_logs()

Setup:

1. Load the extension/ folder in Chrome as an unpacked extension
2. Keep Chrome open
3. The agent connects via WebSocket

Option 3: Computer Use (Visual AI)

Use when:

- Selectors are unreliable
- UI changes frequently
- You want "human-like" testing

browser = await create_browser("computer_use")

# No selectors! Natural language descriptions
await browser.click("the blue Login button")
await browser.fill("the email input field", "[email protected]")

Pros: Works with any UI, no selectors needed
Cons: Slower (2-5s per action), higher API cost

Option 4: Hybrid (Best of Both)

browser = await create_browser("hybrid")

# Tries Playwright first (fast)
# Falls back to Computer Use if selector fails
await browser.click("#login-btn")  # Playwright tries first
# If #login-btn doesn't exist, Computer Use finds it visually

Configuration

Environment Variables

# Required
ANTHROPIC_API_KEY=sk-ant-...

# Models (optional)
DEFAULT_MODEL=claude-sonnet-4-5         # Main testing model
VERIFICATION_MODEL=claude-haiku-4-5     # Quick checks
DEBUGGING_MODEL=claude-opus-4-5         # Complex analysis

# Cost controls (optional)
COST_LIMIT_PER_RUN=10.00     # Max $ per full run
COST_LIMIT_PER_TEST=1.00     # Max $ per test
MAX_ITERATIONS=50            # Safety limit

# Self-healing (optional)
SELF_HEAL_ENABLED=true
SELF_HEAL_MAX_ATTEMPTS=3

Config File (.env)

# .env
ANTHROPIC_API_KEY=sk-ant-xxx
DEFAULT_MODEL=claude-sonnet-4-5
COST_LIMIT_PER_RUN=15.00
SELF_HEAL_ENABLED=true
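
If you drive the agent through the Python API instead of the CLI, you can load the same .env yourself with python-dotenv; this sketch assumes the orchestrator reads its settings from the environment, as the variable names suggest:

from dotenv import load_dotenv  # pip install python-dotenv

from e2e_testing_agent import TestingOrchestrator

load_dotenv()  # populate os.environ from .env before the orchestrator reads it

orchestrator = TestingOrchestrator(
    codebase_path="/path/to/app",
    app_url="http://localhost:3000",
)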

Cost Estimates

Action                                 Approx Cost
Analyze small codebase (< 50 files)    $0.10-0.30
Analyze large codebase (500+ files)    $0.50-2.00
Generate test plan                     $0.05-0.20
Run 1 UI test (10 steps)               $0.10-0.20
Run 1 API test                         $0.02-0.05
Self-heal 1 failure                    $0.05-0.15
Generate report                        $0.02-0.05

Typical full run: $1-5 depending on app size
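
For example, a mid-sized app might break down roughly as: one analysis (~$0.30) + one test plan (~$0.15) + 15 UI tests ($1.50-3.00) + 10 API tests ($0.20-0.50) + a couple of self-heals (~$0.20) + the report (~$0.05) ≈ $2.40-4.20, inside the range above. (Illustrative arithmetic from the table, not measured data.)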


Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                            USER                                      │
│                  e2e-agent --codebase ./app                         │
└───────────────────────────────┬─────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                      TESTING ORCHESTRATOR                            │
│                        (LangGraph FSM)                              │
│                                                                      │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│   │ Analyze  │ → │  Plan    │ → │ Execute  │ → │  Report  │        │
│   │   Code   │   │  Tests   │   │  Tests   │   │          │        │
│   └──────────┘   └──────────┘   └────┬─────┘   └──────────┘        │
│                                      │                              │
│                                      ▼                              │
│                               ┌──────────┐                          │
│                               │Self-Heal │ (on failure)             │
│                               └──────────┘                          │
└───────────────────────────────┬─────────────────────────────────────┘
                    ┌───────────┼───────────┐
                    │           │           │
                    ▼           ▼           ▼
             ┌──────────┐ ┌──────────┐ ┌──────────┐
             │UI Tester │ │API Tester│ │DB Tester │
             └────┬─────┘ └────┬─────┘ └────┬─────┘
                  │            │            │
                  ▼            ▼            ▼
             ┌──────────┐ ┌──────────┐ ┌──────────┐
             │Browser   │ │  httpx   │ │SQLAlchemy│
             │Automation│ │          │ │          │
             └──────────┘ └──────────┘ └──────────┘

Troubleshooting

"No testable surfaces found"

  • Make sure your app has recognizable patterns (routes, API endpoints, etc.)
  • Check that codebase path is correct

Tests timing out

  • Increase timeout: TIMEOUT_MS=60000
  • Make sure app is fully loaded before testing starts

High costs

  • Use --changed-files for targeted testing
  • Lower COST_LIMIT_PER_RUN
  • Use Haiku for verification: VERIFICATION_MODEL=claude-haiku-4-5

Bot detection issues

  • Use Chrome Extension instead of Playwright
  • Or use Computer Use mode

Summary: When to Use What

Scenario                      Recommendation
CI/CD pipeline                Default (Playwright)
Testing authenticated app     Chrome Extension
Flaky selectors               Hybrid mode
Any website / no selectors    Computer Use
Quick PR checks               --changed-files flag
Full regression               Full run, maybe overnight