Argus Agent Catalog

Version: 1.0.0 Last Updated: 2026-01-27T17:30:00Z Document Status: Production Ready - Verified Against Codebase Source Files: src/agents/*.py (20+ specialized agents, ~450KB total)


Overview

Argus employs 20+ specialized AI agents orchestrated via LangGraph 1.0. Each agent has a specific responsibility in the testing lifecycle, from code analysis to self-healing.

graph TB
    subgraph "Orchestration Layer"
        LG["LangGraph Supervisor<br/>src/orchestrator/supervisor.py"]
    end

    subgraph "Analysis Agents"
        CA["CodeAnalyzerAgent<br/>11KB"]
        TP["TestPlannerAgent<br/>19KB"]
        AD["AutoDiscoveryAgent<br/>24KB"]
        TIA["TestImpactAnalyzer<br/>19KB"]
    end

    subgraph "Execution Agents"
        UI["UITesterAgent<br/>30KB"]
        API["APITesterAgent<br/>14KB"]
        DB["DBTesterAgent<br/>17KB"]
    end

    subgraph "Intelligence Agents"
        SH["SelfHealerAgent<br/>74KB"]
        VA["VisualAIAgent<br/>20KB"]
        RCA["RootCauseAnalyzer<br/>17KB"]
        FD["FlakyDetector<br/>17KB"]
    end

    subgraph "Quality Agents"
        QA["QualityAuditor<br/>24KB"]
        AC["AccessibilityChecker<br/>18KB"]
        SS["SecurityScanner<br/>13KB"]
        PA["PerformanceAnalyzer<br/>9KB"]
    end

    subgraph "Utility Agents"
        RA["RouterAgent<br/>19KB"]
        REP["ReporterAgent<br/>17KB"]
        NLP["NLPTestCreator<br/>18KB"]
        S2T["SessionToTest<br/>21KB"]
    end

    LG --> CA
    LG --> TP
    LG --> UI
    LG --> SH
    LG --> REP

    CA --> TP
    TP --> UI
    TP --> API
    TP --> DB
    UI --> SH
    API --> SH
    SH --> REP

Agent Responsibility Matrix

| Agent | Purpose | Input | Output | Model |
|---|---|---|---|---|
| CodeAnalyzer | Find test surfaces | Codebase + URLs | Testable surfaces | Haiku/GPT-4o |
| TestPlanner | Create test specs | Surfaces | Test specs | Sonnet |
| UITester | Execute UI tests | Test specs | Pass/fail + screenshots | Sonnet (vision) |
| APITester | Execute API tests | Test specs | Response + validation | Haiku |
| DBTester | Validate data | Test specs | Query results | Haiku |
| SelfHealer | Fix failures | Failures + screenshots | Healed specs | Opus/Sonnet |
| Reporter | Generate reports | All results | Markdown/HTML | Sonnet |
| VisualAI | Compare screenshots | Baseline + current | Visual diff | Vision model |
| Router | Select models | Task info | Model choice | Groq Llama |
| FlakyDetector | Detect flakiness | Test runs | Flakiness score | N/A (stats) |
| AutoDiscovery | Discover flows | App URL | Test suggestions | Sonnet |
| QualityAuditor | Audit quality | URLs | A11y + perf | Sonnet |
| AccessibilityChecker | Check a11y | HTML/screenshots | WCAG violations | Sonnet |
| SecurityScanner | Scan security | URLs/code | Vulnerabilities | Opus |
| PerformanceAnalyzer | Analyze perf | Metrics | Performance score | Sonnet |
| TestImpactAnalyzer | Impact analysis | Code changes | Affected tests | GPT-4 |
| RootCauseAnalyzer | Why test failed | Failure context | Root cause | Opus |
| SessionToTest | Convert sessions | Session data | Generated tests | Sonnet |
| NLPTestCreator | Parse English | Plain text | Test specs | Sonnet |

Core Testing Agents

1. BaseAgent (base.py - 13KB)

Purpose: Abstract base class providing core functionality for all agents.

Key Methods:

# src/agents/base.py:45-89
class BaseAgent(ABC):
    def execute(self, state: TestingState) -> TestingState
    def _call_claude(self, messages: list, **kwargs) -> Message
    def _call_model(self, task_type: TaskType, messages: list) -> Message
    def _track_usage(self, response: Message) -> None
    def _parse_json_response(self, content: str) -> dict
    def _check_cost_limit(self) -> bool

Features:
- Multi-model routing (Claude, GPT-4, Gemini, DeepSeek, Llama)
- Automatic retry with exponential backoff
- Token/cost tracking per call
- Structured logging via structlog
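The retry behavior can be sketched as follows. This is a minimal illustration, not the actual `src/agents/base.py` implementation; the decorator name and backoff constants are assumptions.

```python
import random
import time


def retry_with_backoff(max_attempts=4, base_delay=1.0, max_delay=30.0):
    """Retry a flaky call with exponential backoff and jitter (illustrative)."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
                    # Delay doubles per attempt, capped, with jitter to
                    # avoid synchronized retries against a rate-limited API.
                    delay = min(base_delay * 2 ** attempt, max_delay)
                    time.sleep(delay * random.uniform(0.5, 1.0))
        return wrapper
    return decorator
```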


2. CodeAnalyzerAgent (code_analyzer.py - 11KB)

Purpose: Analyzes codebases to identify testable surfaces using AST parsing.

Source: src/agents/code_analyzer.py:1-450

Key Methods:

async def execute(
    self,
    codebase_path: str,
    app_url: str,
    changed_files: list[str] | None = None
) -> AgentResult[CodeAnalysisResult]

Output Schema:

@dataclass
class CodeAnalysisResult:
    summary: str                          # High-level codebase description
    testable_surfaces: list[TestableSurface]  # Identified test targets
    framework_detected: str               # React, Next.js, Django, etc.
    language: str                         # Primary language
    recommendations: list[str]            # Testing recommendations

@dataclass
class TestableSurface:
    type: Literal["ui", "api", "db"]
    name: str
    path: str
    priority: Literal["critical", "high", "medium", "low"]
    description: str
    test_scenarios: list[str]
    metadata: dict

Model Used: TaskType.CODE_ANALYSIS → DeepSeek/GPT-4o (cost-optimized)


3. TestPlannerAgent (test_planner.py - 19KB)

Purpose: Creates detailed, prioritized test specifications from testable surfaces.

Source: src/agents/test_planner.py:1-780

Output Schema:

@dataclass
class TestSpec:
    id: str
    name: str
    type: Literal["ui", "api", "db"]
    priority: Literal["critical", "high", "medium", "low"]
    preconditions: list[str]
    steps: list[TestStep]
    assertions: list[TestAssertion]
    estimated_duration_ms: int

@dataclass
class TestStep:
    action: Literal["goto", "click", "fill", "assert", "wait", "hover", "select"]
    target: str          # CSS selector, URL, or XPath
    value: str           # Input value or expected text
    timeout: int = 30000 # milliseconds
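Under these definitions, the steps of a login spec might look like the sketch below. The `TestStep` fields are copied from above (with a default `value` added for steps that need none), and all URLs and selectors are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class TestStep:
    action: Literal["goto", "click", "fill", "assert", "wait", "hover", "select"]
    target: str           # CSS selector, URL, or XPath
    value: str = ""       # input value or expected text
    timeout: int = 30000  # milliseconds


# Illustrative steps for a login flow; selectors are hypothetical.
login_steps = [
    TestStep(action="goto", target="https://example.com/login"),
    TestStep(action="fill", target="#email", value="user@example.com"),
    TestStep(action="fill", target="#password", value="s3cret"),
    TestStep(action="click", target="button[type=submit]"),
    TestStep(action="assert", target=".welcome-banner", value="Welcome"),
]
```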

System Prompt: Uses STAMP framework (Structure, Testability, Assets, Mutations, Priority)


4. UITesterAgent (ui_tester.py - 30KB)

Purpose: Executes UI tests using Playwright with hybrid DOM/Vision execution.

Source: src/agents/ui_tester.py:1-1200 and ui_tester_v2.py

Execution Modes:

| Mode | Description | Speed | Cost |
|---|---|---|---|
| DOM | Playwright XPath/CSS | Fast | Low |
| Vision | Claude Vision element identification | Slow | High |
| Hybrid | DOM with Vision fallback | Medium | Medium |

Output Schema:

@dataclass
class UITestResult:
    test_id: str
    status: Literal["passed", "failed", "error"]
    step_results: list[StepResult]
    assertion_results: list[AssertionResult]
    execution_mode: Literal["standard", "hybrid", "worker"]
    total_estimated_cost: float
    screenshots: list[bytes]  # Evidence

@dataclass
class StepResult:
    action: str
    success: bool
    duration_ms: int
    error: str | None
    screenshot: bytes | None
    mode_used: Literal["dom", "vision", "hybrid"]
    fallback_triggered: bool
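The hybrid fallback recorded in `mode_used` / `fallback_triggered` can be sketched as below. The runner callables and the result shape are illustrative, not `ui_tester.py`'s actual interface.

```python
def run_step_hybrid(step, dom_runner, vision_runner):
    """Try cheap DOM execution first; pay for Vision only on failure.

    dom_runner / vision_runner are callables returning (success, detail);
    both names are illustrative, not the real ui_tester.py API.
    """
    success, detail = dom_runner(step)
    if success:
        return {"mode_used": "dom", "success": True, "fallback_triggered": False}
    # DOM failed (e.g. selector not found) -> fall back to Vision.
    success, detail = vision_runner(step)
    return {"mode_used": "vision", "success": success, "fallback_triggered": True}
```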


5. APITesterAgent (api_tester.py - 14KB)

Purpose: Executes API tests with schema validation and request chaining.

Source: src/agents/api_tester.py:1-580

Supported Methods: GET, POST, PUT, DELETE, PATCH

Features:
- Request chaining with variable extraction
- JSON Schema validation
- Authentication token management
- Response time assertions
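Request chaining works by capturing values from one response and substituting them into the next request. The `{{name}}` placeholder syntax below is an assumption for illustration, not the documented `api_tester.py` format.

```python
import re


def substitute_vars(template: str, context: dict) -> str:
    """Replace {{name}} placeholders with values captured from earlier
    responses (placeholder syntax is illustrative)."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(context[m.group(1)]), template)


def extract_vars(response_body: dict, extractions: dict) -> dict:
    """extractions maps variable name -> top-level response key to capture."""
    return {name: response_body[key] for name, key in extractions.items()}


# Chained flow: create a user, then fetch it by the returned id.
created = {"id": 42, "email": "user@example.com"}
context = extract_vars(created, {"user_id": "id"})
next_url = substitute_vars("https://api.example.com/users/{{user_id}}", context)
```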

Output Schema:

@dataclass
class APITestResult:
    test_id: str
    status: Literal["passed", "failed", "error"]
    requests: list[APIRequestResult]
    schema_validations: list[SchemaValidationResult]
    total_duration_ms: int

@dataclass
class APIRequestResult:
    method: str
    url: str
    status_code: int
    response_time_ms: int
    body: Any
    success: bool


6. DBTesterAgent (db_tester.py - 17KB)

Purpose: Validates database state and data integrity post-operations.

Source: src/agents/db_tester.py:1-700

Features:
- SQLAlchemy connection management
- Constraint validation
- Relationship integrity checks
- Transaction rollback for cleanup
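The rollback-for-cleanup pattern runs test writes inside a transaction and rolls it back so no test data survives. The agent uses SQLAlchemy; the sketch below shows the same pattern with stdlib `sqlite3` so it is self-contained, and the function name is hypothetical.

```python
import sqlite3


def validate_with_rollback(conn, setup_sql, check_sql):
    """Run setup inside a transaction, validate, then roll back so the
    database is left untouched (illustrative version of the pattern
    db_tester.py applies via SQLAlchemy)."""
    cur = conn.cursor()
    try:
        cur.execute(setup_sql)
        return cur.execute(check_sql).fetchall()
    finally:
        conn.rollback()  # cleanup: no test data survives


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.commit()
rows = validate_with_rollback(
    conn,
    "INSERT INTO users (email) VALUES ('probe@example.com')",
    "SELECT email FROM users",
)
```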

Output Schema:

@dataclass
class DBTestResult:
    test_id: str
    status: Literal["passed", "failed", "error"]
    queries: list[QueryResult]
    validations: list[DataValidationResult]

@dataclass
class QueryResult:
    query: str
    rows: list[dict]
    row_count: int
    execution_time_ms: int


Self-Healing Agent (THE DIFFERENTIATOR)

7. SelfHealerAgent (self_healer.py - 74KB)

Purpose: Analyzes test failures and auto-fixes broken tests. This is Argus's key competitive advantage.

Source: src/agents/self_healer.py:1-3200

Healing Strategies (Priority Order):

flowchart TD
    A[Test Failure] --> B{Cache Lookup<br/>7-day TTL}
    B -->|Hit| C[Apply Cached Fix]
    B -->|Miss| D{Code-Aware Healing<br/>Git History}
    D -->|Found| E[Apply Git-Based Fix]
    D -->|Miss| F{Memory Store<br/>Hybrid Retrieval}
    F -->|Found| G[Apply Learned Fix]
    F -->|Miss| H{Claude LLM<br/>Opus/Sonnet}
    H --> I[Generate New Fix]

    C --> J[Store in Cache]
    E --> J
    G --> J
    I --> J
    J --> K[Return Healed TestSpec]

Healing Modes:

| Mode | Accuracy | Speed | Cost |
|---|---|---|---|
| Cached | 100% (verified) | Instant | Free |
| Code-Aware | 99.9% | Fast | Low |
| Memory Store | 95% | Medium | Low |
| LLM Fallback | 90% | Slow | High |
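The cache lookup keys on a stable signature of the failure. A minimal sketch, assuming the signature hashes the test id, failed selector, and error type; the real `self_healer.py` signature may include more context.

```python
import hashlib


def failure_signature(test_id: str, failed_selector: str, error_type: str) -> str:
    """Build a deterministic cache key from the failure's identifying
    fields (field choice is an assumption for illustration)."""
    raw = f"{test_id}|{failed_selector}|{error_type}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


# signature -> verified fix; expired after 7 days in practice.
healing_cache: dict[str, dict] = {}

sig = failure_signature("login-001", "#submit-btn", "selector_not_found")
healing_cache[sig] = {"new_selector": "button[data-testid=submit]"}
```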

Key Methods:

# src/agents/self_healer.py:156-890
async def execute(
    self,
    test_spec: TestSpec,
    failure_details: FailureDetails,
    screenshot: bytes | None = None
) -> AgentResult[HealingResult]

async def _code_aware_heal(
    self,
    test_spec: TestSpec,
    failure_details: FailureDetails
) -> HealingResult | None

async def _lookup_cached_healing(
    self,
    failure_signature: str
) -> HealingResult | None

async def _lookup_memory_store_healing(
    self,
    failure_details: FailureDetails
) -> list[HealingCandidate]

def _calculate_intelligent_timeout(
    self,
    selector: str,
    historical_data: list[ExecutionMetric]
) -> int
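One plausible reading of `_calculate_intelligent_timeout` is a percentile of historical durations plus headroom, clamped to sane bounds. The heuristic below (95th percentile plus 50%, clamped between 1 s and the default) is an assumption, not the actual implementation.

```python
def intelligent_timeout(historical_ms: list[int], default_ms: int = 30000) -> int:
    """Derive a per-selector timeout from past step durations.

    Illustrative heuristic: p95 of history * 1.5, clamped to
    [1000 ms, default_ms]. No history -> fall back to the default.
    """
    if not historical_ms:
        return default_ms
    ordered = sorted(historical_ms)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return max(1000, min(int(p95 * 1.5), default_ms))
```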

Output Schema:

@dataclass
class HealingResult:
    test_id: str
    diagnosis: FailureDiagnosis
    suggested_fixes: list[FixSuggestion]
    auto_healed: bool
    healed_test_spec: dict | None

@dataclass
class FailureDiagnosis:
    failure_type: Literal[
        "selector_changed",
        "timing_issue",
        "ui_changed",
        "data_changed",
        "real_bug"
    ]
    confidence: float  # 0.0-1.0
    explanation: str
    evidence: list[str]
    code_context: CodeAwareContext | None

@dataclass
class CodeAwareContext:
    commit_sha: str
    commit_message: str
    commit_author: str
    commit_date: datetime
    old_selector: str
    new_selector: str
    file_changed: str
    code_confidence: float

Why 99.9% Accuracy:
1. Git History Analysis: Reads actual commits to find selector changes
2. Code-Aware Context: Knows WHO changed WHAT and WHEN
3. Component Rename Handling: Tracks renamed React/Vue components
4. Intelligent Timing: Learns timeout patterns from historical data


Intelligence Agents

8. VisualAIAgent (visual_ai.py - 20KB)

Purpose: Screenshot comparison and visual regression detection.

Source: src/agents/visual_ai.py:1-820

Comparison Methods:
- Pixel-level diff
- Perceptual hashing
- AI-powered semantic comparison
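Perceptual hashing tolerates small pixel noise that a pixel-level diff would flag. The toy average-hash below shows the principle on a tiny grayscale grid; real implementations (e.g. pHash) resize the image and use a DCT, so this is a sketch, not `visual_ai.py`'s method.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Toy average-hash: one bit per pixel, set if the pixel is brighter
    than the image mean. Small hash distance => visually similar."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```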

Output Schema:

@dataclass
class VisualComparisonResult:
    baseline_path: str
    current_path: str
    match: bool
    match_percentage: float  # 0-100
    differences: list[VisualDifference]
    analysis_cost_usd: float

@dataclass
class VisualDifference:
    type: Literal["layout", "content", "style", "missing", "new", "dynamic"]
    severity: Literal["critical", "major", "minor", "info"]
    description: str
    bounding_box: tuple[int, int, int, int]  # x, y, width, height
    is_regression: bool


9. RootCauseAnalyzerAgent (root_cause_analyzer.py - 17KB)

Purpose: AI-powered analysis of WHY tests fail.

Source: src/agents/root_cause_analyzer.py:1-700

Failure Categories:

class FailureCategory(Enum):
    UI_CHANGE = "ui_change"           # Visual/structural changes
    NETWORK_ERROR = "network_error"   # API/network failures
    TIMING_ISSUE = "timing_issue"     # Race conditions
    DATA_MISMATCH = "data_mismatch"   # Test data issues
    REAL_BUG = "real_bug"             # Actual application defect
    ENVIRONMENT = "environment"       # Infrastructure issues
    TEST_DEFECT = "test_defect"       # Bug in the test itself

Output Schema:

@dataclass
class RootCauseResult:
    category: FailureCategory
    confidence: float
    summary: str
    detailed_analysis: str
    suggested_fix: str
    is_flaky: bool
    auto_healable: bool
    healing_suggestion: dict | None


10. FlakyTestDetectorAgent (flaky_detector.py - 17KB)

Purpose: Statistical flaky test detection and quarantine.

Source: src/agents/flaky_detector.py:1-700

Classification Thresholds:

| Level | Pass Rate | Action |
|---|---|---|
| Stable | > 95% | Normal execution |
| Slightly Flaky | 80-95% | Monitor |
| Moderately Flaky | 50-80% | Add retries |
| Highly Flaky | 30-50% | Investigation |
| Quarantined | < 30% | Remove from CI |
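The thresholds above map directly onto a classification function. A minimal sketch; boundary handling (which side a 95% pass rate falls on) is an assumption.

```python
def classify_flakiness(pass_rate: float) -> str:
    """Map a pass rate in [0.0, 1.0] onto the quarantine thresholds
    (boundary handling is illustrative)."""
    if pass_rate > 0.95:
        return "stable"
    if pass_rate >= 0.80:
        return "slightly_flaky"
    if pass_rate >= 0.50:
        return "moderately_flaky"
    if pass_rate >= 0.30:
        return "highly_flaky"
    return "quarantined"
```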

Output Schema:

@dataclass
class FlakinessReport:
    test_id: str
    flakiness_level: Literal[
        "stable", "slightly_flaky", "moderately_flaky",
        "highly_flaky", "quarantined"
    ]
    flakiness_score: float  # 0.0-1.0
    pass_rate: float
    total_runs: int
    likely_cause: Literal[
        "timing", "network", "data", "resource",
        "animation", "third_party"
    ]
    recommended_action: str
    should_quarantine: bool


Quality Agents

11. QualityAuditorAgent (quality_auditor.py - 24KB)

Purpose: Combined accessibility and performance auditing.

Source: src/agents/quality_auditor.py:1-980

Capabilities:
- WCAG 2.1 accessibility compliance
- Core Web Vitals metrics
- Lighthouse-style scoring


12. AccessibilityCheckerAgent (accessibility_checker.py - 18KB)

Purpose: WCAG 2.1 compliance testing.

Source: src/agents/accessibility_checker.py:1-740

WCAG Principles (POUR):
- Perceivable: Alt text, captions, contrast
- Operable: Keyboard nav, focus management
- Understandable: Labels, error messages
- Robust: Valid HTML, ARIA usage

Output Schema:

@dataclass
class AccessibilityIssue:
    wcag_criterion: str     # "1.1.1", "2.4.6", etc.
    wcag_level: Literal["A", "AA", "AAA"]
    principle: Literal["perceivable", "operable", "understandable", "robust"]
    impact: Literal["critical", "serious", "moderate", "minor"]
    affected_users: list[str]  # ["blind", "motor-impaired", "deaf", etc.]
    element_selector: str
    fix_suggestion: str


13. SecurityScannerAgent (security_scanner.py - 13KB)

Purpose: OWASP Top 10 vulnerability detection.

Source: src/agents/security_scanner.py:1-540

Vulnerability Categories:
- A01:2021 - Broken Access Control
- A02:2021 - Cryptographic Failures
- A03:2021 - Injection (SQL, XSS, Command)
- A05:2021 - Security Misconfiguration
- A06:2021 - Vulnerable Components
- A07:2021 - Authentication Failures

Output Schema:

@dataclass
class Vulnerability:
    category: VulnerabilityCategory
    severity: Literal["critical", "high", "medium", "low"]
    cvss_score: float  # 0-10
    cwe_id: str
    evidence: str
    remediation: str
    references: list[str]
    false_positive_likelihood: float


14. PerformanceAnalyzerAgent (performance_analyzer.py - 9KB)

Purpose: Core Web Vitals and performance metrics.

Source: src/agents/performance_analyzer.py:1-380

Metrics Collected:

| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| LCP | ≤ 2.5s | 2.5-4s | > 4s |
| FID | ≤ 100ms | 100-300ms | > 300ms |
| CLS | ≤ 0.1 | 0.1-0.25 | > 0.25 |
| INP | ≤ 200ms | 200-500ms | > 500ms |
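Each metric's rating follows the same two-threshold rule, so the table reduces to a small lookup. The metric key names below are assumptions for illustration.

```python
# (good, needs_improvement) thresholds per metric, matching the table above;
# values at or below "good" rate "good", above "needs_improvement" rate "poor".
THRESHOLDS = {
    "lcp_s": (2.5, 4.0),
    "fid_ms": (100, 300),
    "cls": (0.1, 0.25),
    "inp_ms": (200, 500),
}


def rate_metric(name: str, value: float) -> str:
    good, needs_improvement = THRESHOLDS[name]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs_improvement"
    return "poor"
```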

Utility Agents

15. RouterAgent (router_agent.py - 19KB)

Purpose: Intelligent multi-model selection for cost optimization.

Source: src/agents/router_agent.py:1-780

Routing Hierarchy (cost-optimized):

| Priority | Model | Cost/1M tokens | Use Case |
|---|---|---|---|
| 1 | Groq Llama 3.1 8B | $0.05 | Routing decisions |
| 2 | Gemini Flash | $0.075 | Fast fallback |
| 3 | GPT-4o-mini | $0.15 | General tasks |
| 4 | Claude Haiku | $0.80 | Quality fallback |
| 5 | Claude Sonnet | $3.00 | Complex tasks |
| 6 | Claude Opus | $15.00 | Expert reasoning |
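Cost-optimized routing walks this ladder from cheapest to priciest, skipping models that are unavailable or below the task's quality floor. The model ids and the rank heuristic below are illustrative, not `router_agent.py`'s API.

```python
# Cost-ordered candidates from the table above (USD per 1M tokens).
MODEL_LADDER = [
    ("groq-llama-3.1-8b", 0.05),
    ("gemini-flash", 0.075),
    ("gpt-4o-mini", 0.15),
    ("claude-haiku", 0.80),
    ("claude-sonnet", 3.00),
    ("claude-opus", 15.00),
]


def pick_model(min_quality_rank: int, unavailable: set[str] = frozenset()) -> str:
    """Return the cheapest available model at or above the required
    quality tier (0 = cheapest); fall back to the top model if none fit."""
    for rank, (name, _cost) in enumerate(MODEL_LADDER):
        if rank >= min_quality_rank and name not in unavailable:
            return name
    return MODEL_LADDER[-1][0]
```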

16. ReporterAgent (reporter.py - 17KB)

Purpose: Multi-format report generation and ticket creation.

Source: src/agents/reporter.py:1-700

Output Formats:
- Markdown reports
- HTML reports with charts
- GitHub Issues
- Slack notifications
- Jira tickets


17. NLPTestCreatorAgent (nlp_test_creator.py - 18KB)

Purpose: Plain English to test conversion.

Source: src/agents/nlp_test_creator.py:1-740

Example:

Input: "User should be able to sign up with email and password"

Output: TestSpec with:
- Navigate to /signup
- Fill email field
- Fill password field
- Click submit button
- Assert success message visible


18. AutoDiscoveryAgent (auto_discovery.py - 24KB)

Purpose: Autonomous app exploration and test generation.

Source: src/agents/auto_discovery.py:1-980

Capabilities:
- Crawls application pages
- Identifies interactive elements
- Maps user flows
- Generates test specifications


19. SessionToTestAgent (session_to_test.py - 21KB)

Purpose: Convert real user sessions into executable tests.

Source: src/agents/session_to_test.py:1-860

Data Sources:
- FullStory / LogRocket / Hotjar recordings
- Real User Monitoring (RUM) data
- Error tracking (Sentry, Datadog)
- Analytics events (Amplitude, Mixpanel)


20. TestImpactAnalyzerAgent (test_impact_analyzer.py - 19KB)

Purpose: Run only affected tests on code changes (10-100x CI speedup).

Source: src/agents/test_impact_analyzer.py:1-780

Output Schema:

@dataclass
class ImpactAnalysis:
    change_id: str
    affected_tests: list[str]
    unaffected_tests: list[str]
    new_tests_suggested: list[dict]
    risk_score: float  # 0-1
    estimated_time_saved: float  # seconds
    coverage_gaps: list[str]


Cost Optimization

Cost Per Test Suite Run

Code Analysis:       ~$0.10 (DeepSeek)
Test Planning:       ~$0.50 (Sonnet)
UI Execution:        ~$2.00 (Haiku × 50 steps)
API Execution:       ~$0.50 (Haiku)
Self-Healing (hit):  ~$0.00 (cache)
Self-Healing (miss): ~$1.50 (Sonnet)
Reporting:           ~$0.50 (Sonnet)
─────────────────────────────────────
TOTAL PER RUN:       ~$5.00-6.50

Multi-Model Savings

| Scenario | Single-Model Cost | Multi-Model Cost | Savings |
|---|---|---|---|
| 100 test runs | $2,500 | $650 | 74% |
| Code analysis | $300 | $10 | 97% |
| Self-healing (cached) | $150 | $0 | 100% |

Agent File Size Summary

self_healer.py ........... 74 KB (Largest - 4 healing modes)
prompts.py ............... 64 KB (System prompts for all agents)
ui_tester.py ............. 30 KB (Hybrid DOM+Vision)
auto_discovery.py ........ 24 KB (App crawling)
quality_auditor.py ....... 24 KB (A11y + Performance)
session_to_test.py ....... 21 KB (Session replay)
visual_ai.py ............. 20 KB (Screenshot comparison)
test_planner.py .......... 19 KB (Test generation)
router_agent.py .......... 19 KB (Multi-model routing)
test_impact_analyzer.py .. 19 KB (Dependency analysis)
nlp_test_creator.py ...... 18 KB (NLP parsing)
accessibility_checker.py . 18 KB (WCAG 2.1)
flaky_detector.py ........ 17 KB (Statistical analysis)
root_cause_analyzer.py ... 17 KB (Failure classification)
db_tester.py ............. 17 KB (SQL validation)
reporter.py .............. 17 KB (Multi-format reports)
api_tester.py ............ 14 KB (HTTP + schema)
security_scanner.py ...... 13 KB (OWASP scanning)
base.py .................. 13 KB (Multi-model client)
code_analyzer.py ......... 11 KB (AST parsing)
performance_analyzer.py .. 9 KB  (Core Web Vitals)
─────────────────────────────────────────────────
TOTAL: ~450 KB of agent code

Last Updated: January 2026