Argus Agent Catalog¶
Version: 1.0.0 · Last Updated: 2026-01-27T17:30:00Z · Document Status: Production Ready - Verified Against Codebase
Source Files: src/agents/*.py (20+ specialized agents, ~450 KB total)
Overview¶
Argus employs 20+ specialized AI agents orchestrated via LangGraph 1.0. Each agent has a specific responsibility in the testing lifecycle, from code analysis to self-healing.
graph TB
subgraph "Orchestration Layer"
LG["LangGraph Supervisor<br/>src/orchestrator/supervisor.py"]
end
subgraph "Analysis Agents"
CA["CodeAnalyzerAgent<br/>11KB"]
TP["TestPlannerAgent<br/>19KB"]
AD["AutoDiscoveryAgent<br/>24KB"]
TIA["TestImpactAnalyzer<br/>19KB"]
end
subgraph "Execution Agents"
UI["UITesterAgent<br/>30KB"]
API["APITesterAgent<br/>14KB"]
DB["DBTesterAgent<br/>17KB"]
end
subgraph "Intelligence Agents"
SH["SelfHealerAgent<br/>74KB"]
VA["VisualAIAgent<br/>20KB"]
RCA["RootCauseAnalyzer<br/>17KB"]
FD["FlakyDetector<br/>17KB"]
end
subgraph "Quality Agents"
QA["QualityAuditor<br/>24KB"]
AC["AccessibilityChecker<br/>18KB"]
SS["SecurityScanner<br/>13KB"]
PA["PerformanceAnalyzer<br/>9KB"]
end
subgraph "Utility Agents"
RA["RouterAgent<br/>19KB"]
REP["ReporterAgent<br/>17KB"]
NLP["NLPTestCreator<br/>18KB"]
S2T["SessionToTest<br/>21KB"]
end
LG --> CA
LG --> TP
LG --> UI
LG --> SH
LG --> REP
CA --> TP
TP --> UI
TP --> API
TP --> DB
UI --> SH
API --> SH
SH --> REP
Agent Responsibility Matrix¶
| Agent | Purpose | Input | Output | Model |
|---|---|---|---|---|
| CodeAnalyzer | Find test surfaces | Codebase + URLs | Testable surfaces | Haiku/GPT-4o |
| TestPlanner | Create test specs | Surfaces | Test specs | Sonnet |
| UITester | Execute UI tests | Test specs | Pass/fail + screenshots | Sonnet (vision) |
| APITester | Execute API tests | Test specs | Response + validation | Haiku |
| DBTester | Validate data | Test specs | Query results | Haiku |
| SelfHealer | Fix failures | Failures + screenshots | Healed specs | Opus/Sonnet |
| Reporter | Generate reports | All results | Markdown/HTML | Sonnet |
| VisualAI | Compare screenshots | Baseline + current | Visual diff | Vision model |
| Router | Select models | Task info | Model choice | Groq Llama |
| FlakyDetector | Detect flakiness | Test runs | Flakiness score | N/A (stats) |
| AutoDiscovery | Discover flows | App URL | Test suggestions | Sonnet |
| QualityAuditor | Audit quality | URLs | A11y + perf | Sonnet |
| AccessibilityChecker | Check a11y | HTML/screenshots | WCAG violations | Sonnet |
| SecurityScanner | Scan security | URLs/code | Vulnerabilities | Opus |
| PerformanceAnalyzer | Analyze perf | Metrics | Performance score | Sonnet |
| TestImpactAnalyzer | Impact analysis | Code changes | Affected tests | GPT-4 |
| RootCauseAnalyzer | Why test failed | Failure context | Root cause | Opus |
| SessionToTest | Convert sessions | Session data | Generated tests | Sonnet |
| NLPTestCreator | Parse English | Plain text | Test specs | Sonnet |
Core Testing Agents¶
1. BaseAgent (base.py - 13KB)¶
Purpose: Abstract base class providing core functionality for all agents.
Key Methods:
# src/agents/base.py:45-89
class BaseAgent(ABC):
    def execute(self, state: TestingState) -> TestingState: ...
    def _call_claude(self, messages: list, **kwargs) -> Message: ...
    def _call_model(self, task_type: TaskType, messages: list) -> Message: ...
    def _track_usage(self, response: Message) -> None: ...
    def _parse_json_response(self, content: str) -> dict: ...
    def _check_cost_limit(self) -> bool: ...
Features:

- Multi-model routing (Claude, GPT-4, Gemini, DeepSeek, Llama)
- Automatic retry with exponential backoff
- Token/cost tracking per call
- Structured logging via structlog
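A minimal sketch of the retry-with-backoff and cost-tracking behavior (the function names and price table here are illustrative, not the actual base.py API):

import asyncio
import random

# Illustrative per-1M-token prices; the real table lives in the router/config.
_PRICES = {"claude-haiku": 0.80, "claude-sonnet": 3.00}

async def call_with_retry(call, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry an async model call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Backoff doubles each attempt: 1s, 2s, 4s... jitter avoids thundering herds.
            await asyncio.sleep(base_delay * 2 ** attempt + random.random())

def track_usage(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD for one call (flat per-1M-token rate, for illustration)."""
    return (input_tokens + output_tokens) / 1_000_000 * _PRICES[model]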
2. CodeAnalyzerAgent (code_analyzer.py - 11KB)¶
Purpose: Analyzes codebases to identify testable surfaces using AST parsing.
Source: src/agents/code_analyzer.py:1-450
Key Methods:
async def execute(
    self,
    codebase_path: str,
    app_url: str,
    changed_files: list[str] | None = None,
) -> AgentResult[CodeAnalysisResult]: ...
Output Schema:
@dataclass
class CodeAnalysisResult:
summary: str # High-level codebase description
testable_surfaces: list[TestableSurface] # Identified test targets
framework_detected: str # React, Next.js, Django, etc.
language: str # Primary language
recommendations: list[str] # Testing recommendations
@dataclass
class TestableSurface:
type: Literal["ui", "api", "db"]
name: str
path: str
priority: Literal["critical", "high", "medium", "low"]
description: str
test_scenarios: list[str]
metadata: dict
Model Used: TaskType.CODE_ANALYSIS → DeepSeek/GPT-4o (cost-optimized)
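A hedged usage sketch (the zero-argument constructor and the `result.data` accessor are assumptions; only `execute` and its signature appear above):

import asyncio

async def main() -> None:
    # Hypothetical wiring; in practice the supervisor constructs agents.
    agent = CodeAnalyzerAgent()
    result = await agent.execute(
        codebase_path="./my-app",
        app_url="http://localhost:3000",
        changed_files=["src/components/LoginForm.tsx"],
    )
    for surface in result.data.testable_surfaces:
        print(surface.priority, surface.type, surface.name)

asyncio.run(main())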
3. TestPlannerAgent (test_planner.py - 19KB)¶
Purpose: Creates detailed, prioritized test specifications from testable surfaces.
Source: src/agents/test_planner.py:1-780
Output Schema:
@dataclass
class TestSpec:
id: str
name: str
type: Literal["ui", "api", "db"]
priority: Literal["critical", "high", "medium", "low"]
preconditions: list[str]
steps: list[TestStep]
assertions: list[TestAssertion]
estimated_duration_ms: int
@dataclass
class TestStep:
action: Literal["goto", "click", "fill", "assert", "wait", "hover", "select"]
target: str # CSS selector, URL, or XPath
value: str # Input value or expected text
timeout: int = 30000 # milliseconds
System Prompt: Uses STAMP framework (Structure, Testability, Assets, Mutations, Priority)
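For concreteness, a hand-written spec built from the dataclasses above (field values are illustrative):

login_spec = TestSpec(
    id="ui-login-001",
    name="User can log in with valid credentials",
    type="ui",
    priority="critical",
    preconditions=["Test account exists", "App is reachable"],
    steps=[
        TestStep(action="goto", target="/login", value=""),
        TestStep(action="fill", target="#email", value="user@example.com"),
        TestStep(action="fill", target="#password", value="hunter2"),
        TestStep(action="click", target="button[type=submit]", value=""),
    ],
    assertions=[],  # TestAssertion instances would verify the dashboard renders
    estimated_duration_ms=8000,
)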
4. UITesterAgent (ui_tester.py - 30KB)¶
Purpose: Executes UI tests using Playwright with hybrid DOM/Vision execution.
Source: src/agents/ui_tester.py:1-1200 and ui_tester_v2.py
Execution Modes:
| Mode | Description | Speed | Cost |
|---|---|---|---|
| DOM | Playwright XPath/CSS | Fast | Low |
| Vision | Claude Vision element identification | Slow | High |
| Hybrid | DOM with Vision fallback | Medium | Medium |
Output Schema:
@dataclass
class UITestResult:
test_id: str
status: Literal["passed", "failed", "error"]
step_results: list[StepResult]
assertion_results: list[AssertionResult]
execution_mode: Literal["standard", "hybrid", "worker"]
total_estimated_cost: float
screenshots: list[bytes] # Evidence
@dataclass
class StepResult:
action: str
success: bool
duration_ms: int
error: str | None
screenshot: bytes | None
mode_used: Literal["dom", "vision", "hybrid"]
fallback_triggered: bool
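A simplified sketch of the hybrid fallback loop: the Playwright calls are the real API, while `locate_with_vision` stands in for the Claude Vision path:

import time
from playwright.async_api import Page

async def run_step_hybrid(page: Page, step: TestStep) -> StepResult:
    """Try the cheap DOM path first; fall back to vision only on failure."""
    start = time.monotonic()
    try:
        await page.click(step.target, timeout=step.timeout)
        return StepResult(action=step.action, success=True,
                          duration_ms=int((time.monotonic() - start) * 1000),
                          error=None, screenshot=None,
                          mode_used="dom", fallback_triggered=False)
    except Exception:
        # DOM selector failed: screenshot the page and ask the vision model
        # for coordinates (locate_with_vision is a hypothetical helper).
        shot = await page.screenshot()
        x, y = await locate_with_vision(shot, step.target)
        await page.mouse.click(x, y)
        return StepResult(action=step.action, success=True,
                          duration_ms=int((time.monotonic() - start) * 1000),
                          error=None, screenshot=shot,
                          mode_used="vision", fallback_triggered=True)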
5. APITesterAgent (api_tester.py - 14KB)¶
Purpose: Executes API tests with schema validation and request chaining.
Source: src/agents/api_tester.py:1-580
Supported Methods: GET, POST, PUT, DELETE, PATCH
Features:

- Request chaining with variable extraction
- JSON Schema validation
- Authentication token management
- Response time assertions
Output Schema:
@dataclass
class APITestResult:
test_id: str
status: Literal["passed", "failed", "error"]
requests: list[APIRequestResult]
schema_validations: list[SchemaValidationResult]
total_duration_ms: int
@dataclass
class APIRequestResult:
method: str
url: str
status_code: int
response_time_ms: int
body: Any
success: bool
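Request chaining with variable extraction might look like this sketch (using httpx; the `{{var}}` templating and `extract` key are assumptions about the spec format):

import httpx

async def run_chain(requests: list[dict]) -> dict[str, str]:
    """Execute requests in order, carrying extracted variables forward."""
    variables: dict[str, str] = {}
    async with httpx.AsyncClient() as client:
        for req in requests:
            url = req["url"]
            # Substitute previously extracted variables into the URL template.
            for name, value in variables.items():
                url = url.replace("{{" + name + "}}", value)
            resp = await client.request(req["method"], url, json=req.get("body"))
            # e.g. {"extract": {"user_id": "id"}} pulls resp.json()["id"]
            for var, field in req.get("extract", {}).items():
                variables[var] = str(resp.json()[field])
    return variables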
6. DBTesterAgent (db_tester.py - 17KB)¶
Purpose: Validates database state and data integrity post-operations.
Source: src/agents/db_tester.py:1-700
Features:

- SQLAlchemy connection management
- Constraint validation
- Relationship integrity checks
- Transaction rollback for cleanup
Output Schema:
@dataclass
class DBTestResult:
test_id: str
status: Literal["passed", "failed", "error"]
queries: list[QueryResult]
validations: list[DataValidationResult]
@dataclass
class QueryResult:
query: str
rows: list[dict]
row_count: int
execution_time_ms: int
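The transaction-rollback cleanup described above can be sketched with standard SQLAlchemy (the connection URL is illustrative):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://test:test@localhost/app_test")

def run_validation(query: str) -> list[dict]:
    """Run a validation query inside a transaction that is always rolled
    back, so the test leaves no trace in the database."""
    with engine.connect() as conn:
        trans = conn.begin()
        try:
            return [dict(row._mapping) for row in conn.execute(text(query))]
        finally:
            trans.rollback()  # never commit: test data stays ephemeral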
Self-Healing Agent (THE DIFFERENTIATOR)¶
7. SelfHealerAgent (self_healer.py - 74KB)¶
Purpose: Analyzes test failures and auto-fixes broken tests. This is Argus's key competitive advantage.
Source: src/agents/self_healer.py:1-3200
Healing Strategies (Priority Order):
flowchart TD
A[Test Failure] --> B{Cache Lookup<br/>7-day TTL}
B -->|Hit| C[Apply Cached Fix]
B -->|Miss| D{Code-Aware Healing<br/>Git History}
D -->|Found| E[Apply Git-Based Fix]
D -->|Miss| F{Memory Store<br/>Hybrid Retrieval}
F -->|Found| G[Apply Learned Fix]
F -->|Miss| H{Claude LLM<br/>Opus/Sonnet}
H --> I[Generate New Fix]
C --> J[Store in Cache]
E --> J
G --> J
I --> J
J --> K[Return Healed TestSpec]
Healing Modes:
| Mode | Accuracy | Speed | Cost |
|---|---|---|---|
| Cached | 100% (verified) | Instant | Free |
| Code-Aware | 99.9% | Fast | Low |
| Memory Store | 95% | Medium | Low |
| LLM Fallback | 90% | Slow | High |
Key Methods:
# src/agents/self_healer.py:156-890
async def execute(
    self,
    test_spec: TestSpec,
    failure_details: FailureDetails,
    screenshot: bytes | None = None,
) -> AgentResult[HealingResult]: ...

async def _code_aware_heal(
    self,
    test_spec: TestSpec,
    failure_details: FailureDetails,
) -> HealingResult | None: ...

async def _lookup_cached_healing(
    self,
    failure_signature: str,
) -> HealingResult | None: ...

async def _lookup_memory_store_healing(
    self,
    failure_details: FailureDetails,
) -> list[HealingCandidate]: ...

def _calculate_intelligent_timeout(
    self,
    selector: str,
    historical_data: list[ExecutionMetric],
) -> int: ...
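A sketch of how `_calculate_intelligent_timeout` could derive a timeout from historical step durations (the percentile choice and headroom factor are assumptions):

import statistics

def calculate_intelligent_timeout(durations_ms: list[int],
                                  default_ms: int = 30000) -> int:
    """Pick a timeout from history: p95 of past durations plus 50% headroom,
    clamped so a noisy sample never produces an absurd value."""
    if len(durations_ms) < 5:
        return default_ms  # not enough signal; keep the default
    p95 = statistics.quantiles(durations_ms, n=20)[18]  # 95th percentile cut
    return int(min(max(p95 * 1.5, 1000), default_ms))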
Output Schema:
@dataclass
class HealingResult:
test_id: str
diagnosis: FailureDiagnosis
suggested_fixes: list[FixSuggestion]
auto_healed: bool
healed_test_spec: dict | None
@dataclass
class FailureDiagnosis:
failure_type: Literal[
"selector_changed",
"timing_issue",
"ui_changed",
"data_changed",
"real_bug"
]
confidence: float # 0.0-1.0
explanation: str
evidence: list[str]
code_context: CodeAwareContext | None
@dataclass
class CodeAwareContext:
commit_sha: str
commit_message: str
commit_author: str
commit_date: datetime
old_selector: str
new_selector: str
file_changed: str
code_confidence: float
Why 99.9% Accuracy:

1. Git History Analysis: Reads actual commits to find selector changes
2. Code-Aware Context: Knows WHO changed WHAT and WHEN
3. Component Rename Handling: Tracks renamed React/Vue components
4. Intelligent Timing: Learns timeout patterns from historical data
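A rough sketch of the git-history lookup behind code-aware healing, using git's pickaxe search (`-S`); the parsing and repo layout here are simplified assumptions:

import subprocess

def find_selector_change(repo: str, old_selector: str) -> str | None:
    """Return 'sha subject' of the most recent commit whose diff added or
    removed the old selector, via git's pickaxe search."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-S", old_selector,
         "--format=%H %s", "-n", "1"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return out or None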
Intelligence Agents¶
8. VisualAIAgent (visual_ai.py - 20KB)¶
Purpose: Screenshot comparison and visual regression detection.
Source: src/agents/visual_ai.py:1-820
Comparison Methods:

- Pixel-level diff
- Perceptual hashing
- AI-powered semantic comparison
Output Schema:
@dataclass
class VisualComparisonResult:
baseline_path: str
current_path: str
match: bool
match_percentage: float # 0-100
differences: list[VisualDifference]
analysis_cost_usd: float
@dataclass
class VisualDifference:
type: Literal["layout", "content", "style", "missing", "new", "dynamic"]
severity: Literal["critical", "major", "minor", "info"]
description: str
bounding_box: tuple[int, int, int, int] # x, y, width, height
is_regression: bool
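The pixel-level diff and perceptual-hash stages can be sketched with Pillow and the imagehash package (the cutoff and match thresholds are illustrative):

from PIL import Image, ImageChops
import imagehash

def screenshots_match(baseline: str, current: str,
                      hash_cutoff: int = 5) -> tuple[bool, float]:
    """Cheap two-stage check: perceptual hash first, pixel diff second."""
    img_a, img_b = Image.open(baseline), Image.open(current)
    # Stage 1: perceptual hash tolerates compression noise and tiny shifts.
    if imagehash.phash(img_a) - imagehash.phash(img_b) > hash_cutoff:
        return False, 0.0
    # Stage 2: exact pixel diff yields a match percentage.
    diff = ImageChops.difference(img_a.convert("RGB"), img_b.convert("RGB"))
    changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    pct = 100.0 * (1 - changed / (diff.width * diff.height))
    return pct > 99.0, pct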
9. RootCauseAnalyzerAgent (root_cause_analyzer.py - 17KB)¶
Purpose: AI-powered analysis of WHY tests fail.
Source: src/agents/root_cause_analyzer.py:1-700
Failure Categories:
class FailureCategory(Enum):
UI_CHANGE = "ui_change" # Visual/structural changes
NETWORK_ERROR = "network_error" # API/network failures
TIMING_ISSUE = "timing_issue" # Race conditions
DATA_MISMATCH = "data_mismatch" # Test data issues
REAL_BUG = "real_bug" # Actual application defect
ENVIRONMENT = "environment" # Infrastructure issues
TEST_DEFECT = "test_defect" # Bug in the test itself
Output Schema:
@dataclass
class RootCauseResult:
category: FailureCategory
confidence: float
summary: str
detailed_analysis: str
suggested_fix: str
is_flaky: bool
auto_healable: bool
healing_suggestion: dict | None
10. FlakyTestDetectorAgent (flaky_detector.py - 17KB)¶
Purpose: Statistical flaky test detection and quarantine.
Source: src/agents/flaky_detector.py:1-700
Classification Thresholds:
| Level | Pass Rate | Action |
|---|---|---|
| Stable | > 95% | Normal execution |
| Slightly Flaky | 80-95% | Monitor |
| Moderately Flaky | 50-80% | Add retries |
| Highly Flaky | 30-50% | Investigation |
| Quarantined | < 30% | Remove from CI |
Output Schema:
@dataclass
class FlakinessReport:
test_id: str
flakiness_level: Literal[
"stable", "slightly_flaky", "moderately_flaky",
"highly_flaky", "quarantined"
]
flakiness_score: float # 0.0-1.0
pass_rate: float
total_runs: int
likely_cause: Literal[
"timing", "network", "data", "resource",
"animation", "third_party"
]
recommended_action: str
should_quarantine: bool
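The thresholds in the classification table map directly to a classifier like this sketch:

def classify_flakiness(pass_rate: float) -> str:
    """Map a pass rate (0.0-1.0) to the quarantine levels in the table above."""
    if pass_rate > 0.95:
        return "stable"
    if pass_rate > 0.80:
        return "slightly_flaky"
    if pass_rate > 0.50:
        return "moderately_flaky"
    if pass_rate >= 0.30:
        return "highly_flaky"
    return "quarantined"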
Quality Agents¶
11. QualityAuditorAgent (quality_auditor.py - 24KB)¶
Purpose: Combined accessibility and performance auditing.
Source: src/agents/quality_auditor.py:1-980
Capabilities:

- WCAG 2.1 accessibility compliance
- Core Web Vitals metrics
- Lighthouse-style scoring
12. AccessibilityCheckerAgent (accessibility_checker.py - 18KB)¶
Purpose: WCAG 2.1 compliance testing.
Source: src/agents/accessibility_checker.py:1-740
WCAG Principles (POUR):

- Perceivable: Alt text, captions, contrast
- Operable: Keyboard nav, focus management
- Understandable: Labels, error messages
- Robust: Valid HTML, ARIA usage
Output Schema:
@dataclass
class AccessibilityIssue:
wcag_criterion: str # "1.1.1", "2.4.6", etc.
wcag_level: Literal["A", "AA", "AAA"]
principle: Literal["perceivable", "operable", "understandable", "robust"]
impact: Literal["critical", "serious", "moderate", "minor"]
affected_users: list[str] # ["blind", "motor-impaired", "deaf", etc.]
element_selector: str
fix_suggestion: str
13. SecurityScannerAgent (security_scanner.py - 13KB)¶
Purpose: OWASP Top 10 vulnerability detection.
Source: src/agents/security_scanner.py:1-540
Vulnerability Categories:

- A01:2021 - Broken Access Control
- A02:2021 - Cryptographic Failures
- A03:2021 - Injection (SQL, XSS, Command)
- A05:2021 - Security Misconfiguration
- A06:2021 - Vulnerable Components
- A07:2021 - Authentication Failures
Output Schema:
@dataclass
class Vulnerability:
category: VulnerabilityCategory
severity: Literal["critical", "high", "medium", "low"]
cvss_score: float # 0-10
cwe_id: str
evidence: str
remediation: str
references: list[str]
false_positive_likelihood: float
14. PerformanceAnalyzerAgent (performance_analyzer.py - 9KB)¶
Purpose: Core Web Vitals and performance metrics.
Source: src/agents/performance_analyzer.py:1-380
Metrics Collected:
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| LCP | ≤ 2.5s | 2.5-4s | > 4s |
| FID | ≤ 100ms | 100-300ms | > 300ms |
| CLS | ≤ 0.1 | 0.1-0.25 | > 0.25 |
| INP | ≤ 200ms | 200-500ms | > 500ms |
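A small sketch scoring a metric against the bands above (units follow the table: LCP in seconds, FID/INP in milliseconds, CLS unitless):

CWV_BANDS = {
    # metric: (good_max, needs_improvement_max)
    "LCP": (2.5, 4.0),
    "FID": (100, 300),
    "CLS": (0.1, 0.25),
    "INP": (200, 500),
}

def rate_metric(name: str, value: float) -> str:
    """Classify one Core Web Vitals reading per the thresholds above."""
    good, ni = CWV_BANDS[name]
    if value <= good:
        return "good"
    return "needs improvement" if value <= ni else "poor"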
Utility Agents¶
15. RouterAgent (router_agent.py - 19KB)¶
Purpose: Intelligent multi-model selection for cost optimization.
Source: src/agents/router_agent.py:1-780
Routing Hierarchy (cost-optimized):
| Priority | Model | Cost/1M tokens | Use Case |
|---|---|---|---|
| 1 | Groq Llama 3.1 8B | $0.05 | Routing decisions |
| 2 | Gemini Flash | $0.075 | Fast fallback |
| 3 | GPT-4o-mini | $0.15 | General tasks |
| 4 | Claude Haiku | $0.80 | Quality fallback |
| 5 | Claude Sonnet | $3.00 | Complex tasks |
| 6 | Claude Opus | $15.00 | Expert reasoning |
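Conceptually, routing walks the tier list and picks the cheapest model capable of the task. A sketch (the tier data comes from the table above; the capability floor map is an assumption):

# (priority, model, $/1M tokens) straight from the routing table.
TIERS = [
    (1, "groq-llama-3.1-8b", 0.05),
    (2, "gemini-flash", 0.075),
    (3, "gpt-4o-mini", 0.15),
    (4, "claude-haiku", 0.80),
    (5, "claude-sonnet", 3.00),
    (6, "claude-opus", 15.00),
]

# Minimum tier each task class needs -- illustrative, not the real mapping.
MIN_TIER = {"routing": 1, "general": 3, "complex": 5, "expert": 6}

def pick_model(task: str) -> str:
    """Return the cheapest model whose tier meets the task's floor."""
    floor = MIN_TIER.get(task, 3)
    return next(model for tier, model, _ in TIERS if tier >= floor)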
16. ReporterAgent (reporter.py - 17KB)¶
Purpose: Multi-format report generation and ticket creation.
Source: src/agents/reporter.py:1-700
Output Formats:

- Markdown reports
- HTML reports with charts
- GitHub Issues
- Slack notifications
- Jira tickets
17. NLPTestCreatorAgent (nlp_test_creator.py - 18KB)¶
Purpose: Plain English to test conversion.
Source: src/agents/nlp_test_creator.py:1-740
Example:
Input: "User should be able to sign up with email and password"
Output: TestSpec with:
- Navigate to /signup
- Fill email field
- Fill password field
- Click submit button
- Assert success message visible
18. AutoDiscoveryAgent (auto_discovery.py - 24KB)¶
Purpose: Autonomous app exploration and test generation.
Source: src/agents/auto_discovery.py:1-980
Capabilities:

- Crawls application pages
- Identifies interactive elements
- Maps user flows
- Generates test specifications
19. SessionToTestAgent (session_to_test.py - 21KB)¶
Purpose: Convert real user sessions into executable tests.
Source: src/agents/session_to_test.py:1-860
Data Sources:

- FullStory / LogRocket / Hotjar recordings
- Real User Monitoring (RUM) data
- Error tracking (Sentry, Datadog)
- Analytics events (Amplitude, Mixpanel)
20. TestImpactAnalyzerAgent (test_impact_analyzer.py - 19KB)¶
Purpose: Run only affected tests on code changes (10-100x CI speedup).
Source: src/agents/test_impact_analyzer.py:1-780
Output Schema:
@dataclass
class ImpactAnalysis:
change_id: str
affected_tests: list[str]
unaffected_tests: list[str]
new_tests_suggested: list[dict]
risk_score: float # 0-1
estimated_time_saved: float # seconds
coverage_gaps: list[str]
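The core of impact selection is a reverse dependency map from source files to tests; a sketch (the map would be built elsewhere from coverage or import graphs):

def select_affected_tests(changed_files: list[str],
                          deps: dict[str, set[str]]) -> set[str]:
    """Union the tests that exercise each changed source file."""
    affected: set[str] = set()
    for path in changed_files:
        affected |= deps.get(path, set())
    return affected

# Example: only two of the suite's tests touch the changed file.
deps = {"src/login.py": {"test_login", "test_session"}}
print(select_affected_tests(["src/login.py"], deps))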
Cost Optimization¶
Cost Per Test Suite Run¶
Code Analysis: ~$0.10 (DeepSeek)
Test Planning: ~$0.50 (Sonnet)
UI Execution: ~$2.00 (Haiku × 50 steps)
API Execution: ~$0.50 (Haiku)
Self-Healing (hit): ~$0.00 (cache)
Self-Healing (miss): ~$1.50 (Sonnet)
Reporting: ~$0.50 (Sonnet)
─────────────────────────────────────
TOTAL PER RUN: ~$3.60 (healing cached) to ~$5.10 (healing miss)
Multi-Model Savings¶
| Scenario | Single-Model Cost | Multi-Model Cost | Savings |
|---|---|---|---|
| 100 test runs | $2,500 | $650 | 74% |
| Code analysis | $300 | $10 | 97% |
| Self-healing (cached) | $150 | $0 | 100% |
Agent File Size Summary¶
self_healer.py ........... 74 KB (Largest - 4 healing modes)
prompts.py ............... 64 KB (System prompts for all agents)
ui_tester.py ............. 30 KB (Hybrid DOM+Vision)
auto_discovery.py ........ 24 KB (App crawling)
quality_auditor.py ....... 24 KB (A11y + Performance)
session_to_test.py ....... 21 KB (Session replay)
visual_ai.py ............. 20 KB (Screenshot comparison)
test_planner.py .......... 19 KB (Test generation)
router_agent.py .......... 19 KB (Multi-model routing)
test_impact_analyzer.py .. 19 KB (Dependency analysis)
nlp_test_creator.py ...... 18 KB (NLP parsing)
accessibility_checker.py . 18 KB (WCAG 2.1)
flaky_detector.py ........ 17 KB (Statistical analysis)
root_cause_analyzer.py ... 17 KB (Failure classification)
db_tester.py ............. 17 KB (SQL validation)
reporter.py .............. 17 KB (Multi-format reports)
api_tester.py ............ 14 KB (HTTP + schema)
security_scanner.py ...... 13 KB (OWASP scanning)
base.py .................. 13 KB (Multi-model client)
code_analyzer.py ......... 11 KB (AST parsing)
performance_analyzer.py .. 9 KB (Core Web Vitals)
─────────────────────────────────────────────────
TOTAL: ~450 KB of agent code
Last Updated: January 2026