Argus E2E Testing Agent - Complete Architecture Documentation¶
Version: 2.9.0
Last Updated: 2026-01-27T17:00:00Z
Document Status: Production Ready - Verified Against Codebase Audit
Classification: Technical Architecture
Verification Method: Automated Deep Analysis via 8 Research Agents
Git Commit: d4a963a
Table of Contents¶
- Executive Summary
- System Architecture Overview
- LangGraph Orchestration Architecture
- Multi-Agent Architecture
- Database Architecture
- API Architecture
- Data Flow Architecture
- Real-Time Streaming Architecture
- Human-in-the-Loop Architecture
- Time Travel & Debugging Architecture
- Memory Store Architecture
- Security Architecture
- Deployment Architecture
- Integration Architecture
- Technology Stack
- Cost Management
- Version History
- Architecture Decision Records
1. Executive Summary¶
Argus is an autonomous E2E full-stack testing platform powered by Claude AI. The system leverages LangGraph 1.0 for durable orchestration, enabling:
- Autonomous Test Generation - AI analyzes codebases and generates comprehensive test plans
- Hybrid UI Testing - Combines Claude Computer Use API with Playwright for reliable execution
- Self-Healing Tests - AI automatically fixes broken selectors and assertions
- Multi-Agent Coordination - Supervisor pattern orchestrates five core agents (Code Analyzer, UI Tester, API Tester, Self-Healer, Reporter), with 20+ specialized agents in total (see Key Metrics)
- Durable Execution - PostgreSQL-backed checkpointing survives crashes
- Time Travel Debugging - Replay, fork, and compare execution states
- Real-Time Streaming - SSE-based live execution feedback
- Human-in-the-Loop - Configurable breakpoints with approval workflows
Key Metrics (v2.9.0) - Verified¶
| Metric | Value | Verification Source |
|---|---|---|
| Total API Endpoints | 80+ (40 route modules) | API Endpoints Agent (acf875c) |
| Database Tables | 20+ production tables | Database Schema Agent (a39a774) |
| Specialized Agents | 20+ (see Agent Architecture) | Agent Architecture Agent (a23e33d) |
| LangGraph Features | 7 (PostgresSaver, Memory, Streaming, HITL, Supervisor, Time Travel, Chat) | LangGraph Agent (a1ca712) |
| Event Types | 8 (CODEBASE_INGESTED through DLQ) | Event System Agent (acfe5cb) |
| Knowledge Layer | Cognee ECL + FalkorDB + pgvector | Cognee Agent (a3c1fb6) |
| Dashboard Pages | 30+ pages (Next.js 15 + React 19) | Dashboard Agent (a298ac8) |
| K8s Components | Redpanda, FalkorDB, Valkey, Flink, Cognee Worker | K8s Infrastructure Agent (acb311c) |
| Supported Browsers | Chrome, Firefox, Safari, Edge | - |
| AI Model Providers | 9 (OpenRouter, Anthropic, OpenAI, Google, Groq, DeepSeek, Cerebras, Together, Local) | - |
| Model Routing | 45+ models with intelligent task-based selection | - |
| Security Middleware | 7 layers (CORS, Headers, Audit, Rate Limit, Auth, Request Size, Core) | - |
2. System Architecture Overview¶
2.1 High-Level System Diagram¶
graph TB
subgraph "Client Layer"
WEB[Web Dashboard<br/>Next.js 14]
CLI[CLI Tool<br/>Python]
API_CLIENT[API Clients<br/>REST/SSE]
MCP[MCP Clients<br/>VS Code/Claude]
end
subgraph "API Gateway Layer"
FASTAPI[FastAPI Server<br/>Uvicorn ASGI]
AUTH[Authentication<br/>Supabase Auth]
SSE[SSE Streaming<br/>sse-starlette]
end
subgraph "Orchestration Layer"
LANGGRAPH[LangGraph 1.0<br/>State Machine]
SUPERVISOR[Supervisor Agent<br/>Coordinator]
CHECKPOINT[PostgresSaver<br/>Checkpointing]
MEMORY[Memory Store<br/>pgvector]
end
subgraph "Agent Layer"
CODE_ANALYZER[Code Analyzer<br/>Agent]
UI_TESTER[UI Tester<br/>Agent]
API_TESTER[API Tester<br/>Agent]
SELF_HEALER[Self-Healer<br/>Agent]
REPORTER[Reporter<br/>Agent]
end
subgraph "Execution Layer"
PLAYWRIGHT[Playwright<br/>Browser Automation]
COMPUTER_USE[Claude Computer Use<br/>Vision AI]
HTTP_CLIENT[HTTPX<br/>API Testing]
end
subgraph "Data Layer"
POSTGRES[(PostgreSQL<br/>Supabase)]
PGVECTOR[pgvector<br/>Semantic Search]
S3[S3/R2<br/>Screenshots]
end
subgraph "AI Layer"
CLAUDE[Claude API<br/>Anthropic]
EMBEDDINGS[OpenAI Embeddings<br/>text-embedding-3-small]
end
WEB --> FASTAPI
CLI --> FASTAPI
API_CLIENT --> FASTAPI
MCP --> FASTAPI
FASTAPI --> AUTH
FASTAPI --> SSE
FASTAPI --> LANGGRAPH
LANGGRAPH --> SUPERVISOR
LANGGRAPH --> CHECKPOINT
LANGGRAPH --> MEMORY
CHECKPOINT --> POSTGRES
MEMORY --> PGVECTOR
SUPERVISOR --> CODE_ANALYZER
SUPERVISOR --> UI_TESTER
SUPERVISOR --> API_TESTER
SUPERVISOR --> SELF_HEALER
SUPERVISOR --> REPORTER
UI_TESTER --> PLAYWRIGHT
UI_TESTER --> COMPUTER_USE
API_TESTER --> HTTP_CLIENT
CODE_ANALYZER --> CLAUDE
UI_TESTER --> CLAUDE
SELF_HEALER --> CLAUDE
SELF_HEALER --> MEMORY
REPORTER --> CLAUDE
PLAYWRIGHT --> S3
COMPUTER_USE --> S3
2.2 Component Summary Table¶
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Web Dashboard | Next.js | 14.x | User interface |
| API Server | FastAPI | 0.115+ | REST API, SSE streaming |
| Orchestrator | LangGraph | 1.0.5+ | State machine, workflow |
| Checkpointer | PostgresSaver | 2.0+ | Durable execution |
| Memory Store | pgvector | 0.5+ | Semantic search |
| Browser Automation | Playwright | 1.48+ | DOM interaction |
| Vision Testing | Claude Computer Use | - | Screenshot analysis |
| Database | PostgreSQL (Supabase) | 15+ | Persistence |
| Object Storage | S3/R2 | - | Screenshots, artifacts |
3. LangGraph Orchestration Architecture¶
3.1 State Machine Diagram¶
stateDiagram-v2
[*] --> analyze_code: Start Test Run
analyze_code --> plan_tests: Surfaces Found
analyze_code --> report: No Surfaces / Error
plan_tests --> execute_test: Tests Planned
plan_tests --> report: No Tests / Error
execute_test --> execute_test: More Tests
execute_test --> self_heal: Test Failed + Healing Enabled
execute_test --> report: All Tests Complete
self_heal --> execute_test: Healed Successfully
self_heal --> report: Max Retries / Cannot Heal
report --> [*]: Generate Report
note right of analyze_code
Checkpoint saved
after each node
end note
note right of execute_test
Interrupt Point:
require_test_plan_approval
end note
note right of self_heal
Interrupt Point:
require_healing_approval
end note
3.2 LangGraph Configuration Architecture¶
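In code, the compile-time and runtime configuration shown in this subsection reduces to a handful of calls. The following is a minimal sketch rather than the production wiring: it assumes the node functions (analyze_code, plan_tests, execute_test, self_heal, report), the TestingState schema from 3.3, and DATABASE_URL, run_id, and initial_state are defined elsewhere; route_after_analysis is named above, while the other route_after_* helpers are assumed here. The configuration diagram follows the sketch.

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.postgres import PostgresSaver

# Build the graph from the nodes and conditional edges described above.
builder = StateGraph(TestingState)
for name, fn in [
    ("analyze_code", analyze_code),
    ("plan_tests", plan_tests),
    ("execute_test", execute_test),
    ("self_heal", self_heal),
    ("report", report),
]:
    builder.add_node(name, fn)

builder.add_edge(START, "analyze_code")
builder.add_conditional_edges("analyze_code", route_after_analysis)   # -> plan_tests | report
builder.add_conditional_edges("plan_tests", route_after_planning)     # -> execute_test | report
builder.add_conditional_edges("execute_test", route_after_execution)  # -> execute_test | self_heal | report
builder.add_conditional_edges("self_heal", route_after_healing)       # -> execute_test | report
builder.add_edge("report", END)

# Compile with durable checkpointing and human-in-the-loop interrupt points.
with PostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    graph = builder.compile(
        checkpointer=checkpointer,
        interrupt_before=["execute_test", "self_heal"],  # approval gates (Section 9)
    )
    config = {"configurable": {"thread_id": run_id}, "recursion_limit": 50}
    final_state = graph.invoke(initial_state, config)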
graph TB
subgraph "Graph Definition"
BUILDER[StateGraph Builder]
NODES[Node Functions<br/>analyze, plan, execute, heal, report]
EDGES[Conditional Edges<br/>route_after_analysis, etc.]
STATE[TestingState Schema]
end
subgraph "Compilation Config"
CHECKPOINTER[PostgresSaver<br/>Durable State]
INTERRUPTS[interrupt_before<br/>Human Approval]
STREAM[Stream Mode<br/>values, events, custom]
end
subgraph "Runtime Config"
THREAD[thread_id<br/>Execution Context]
CONFIG[RunnableConfig<br/>recursion_limit, tags]
CALLBACKS[Event Handlers<br/>on_node_start, etc.]
end
BUILDER --> NODES
NODES --> EDGES
EDGES --> STATE
STATE --> CHECKPOINTER
CHECKPOINTER --> INTERRUPTS
INTERRUPTS --> STREAM
STREAM --> THREAD
THREAD --> CONFIG
CONFIG --> CALLBACKS
3.3 State Schema Definition¶
classDiagram
class TestingState {
+str run_id
+str codebase_path
+str app_url
+int pr_number
+list~str~ changed_files
+list~BaseMessage~ messages
+str codebase_summary
+list~dict~ testable_surfaces
+list~TestSpec~ test_plan
+int current_test_index
+list~TestResult~ test_results
+list~FailureAnalysis~ failures
+list~str~ screenshots
+list~str~ healing_queue
+int healing_attempts
+list~dict~ healed_tests
+int iteration
+int max_iterations
+int total_tokens
+float total_cost
+int passed_count
+int failed_count
+int skipped_count
+str error
+bool should_continue
+str started_at
+str completed_at
}
class TestSpec {
+str id
+TestType type
+str name
+Priority priority
+list~dict~ steps
+list~dict~ assertions
+dict preconditions
+dict postconditions
+int timeout_seconds
+list~str~ tags
+to_dict() dict
}
class TestResult {
+str test_id
+TestStatus status
+float duration_seconds
+int assertions_passed
+int assertions_failed
+str error_message
+str stack_trace
+list~str~ screenshots
+dict metadata
+to_dict() dict
}
class FailureAnalysis {
+str test_id
+str failure_type
+str root_cause
+float confidence
+list~str~ suggested_fixes
+dict context
+to_dict() dict
}
class TestType {
<<enumeration>>
UI
API
E2E
INTEGRATION
UNIT
}
class TestStatus {
<<enumeration>>
PENDING
RUNNING
PASSED
FAILED
SKIPPED
HEALED
}
class Priority {
<<enumeration>>
CRITICAL
HIGH
MEDIUM
LOW
}
TestingState --> TestSpec : test_plan
TestingState --> TestResult : test_results
TestingState --> FailureAnalysis : failures
TestSpec --> TestType
TestSpec --> Priority
TestResult --> TestStatus
3.4 Checkpoint Persistence Flow¶
sequenceDiagram
participant G as Graph Execution
participant PS as PostgresSaver
participant DB as PostgreSQL
participant R as Recovery
Note over G,DB: Normal Execution
G->>G: Execute Node
G->>PS: Save Checkpoint
PS->>DB: INSERT checkpoint blob
PS->>DB: INSERT write records
DB-->>PS: Committed
PS-->>G: Checkpoint ID
Note over G,DB: Crash Recovery
R->>PS: get_tuple(thread_id)
PS->>DB: SELECT checkpoint, writes
DB-->>PS: Checkpoint Data
PS->>PS: Deserialize State
PS-->>R: Restored State
R->>G: Resume Execution
4. Multi-Agent Architecture¶
4.1 Supervisor Pattern Diagram¶
graph TB
subgraph "Supervisor Agent"
SUPERVISOR[Supervisor<br/>claude-sonnet-4-5]
ROUTER[Dynamic Router<br/>Next Agent Selection]
AGGREGATOR[State Aggregator<br/>Merge Results]
end
subgraph "Specialized Agents"
CA[Code Analyzer<br/>claude-sonnet-4-5<br/>Parse & Discover]
UI[UI Tester<br/>claude-sonnet-4-5<br/>Browser Tests]
API[API Tester<br/>claude-haiku-3-5<br/>HTTP Tests]
SH[Self-Healer<br/>claude-opus-4-5<br/>Fix Failures]
RP[Reporter<br/>claude-haiku-3-5<br/>Generate Reports]
end
subgraph "Shared Resources"
STATE[(Shared State<br/>TestingState)]
TOOLS[Tool Registry<br/>Playwright, HTTP, etc.]
MEMORY[Memory Store<br/>Failure Patterns]
end
SUPERVISOR --> ROUTER
ROUTER -->|analyze| CA
ROUTER -->|test_ui| UI
ROUTER -->|test_api| API
ROUTER -->|heal| SH
ROUTER -->|report| RP
CA --> AGGREGATOR
UI --> AGGREGATOR
API --> AGGREGATOR
SH --> AGGREGATOR
RP --> AGGREGATOR
AGGREGATOR --> SUPERVISOR
CA -.-> STATE
UI -.-> STATE
API -.-> STATE
SH -.-> STATE
RP -.-> STATE
UI -.-> TOOLS
API -.-> TOOLS
SH -.-> MEMORY
4.2 Agent Capabilities Matrix¶
| Agent | Model | Tools | Outputs |
|---|---|---|---|
| Code Analyzer | Sonnet 4.5 | read_file, glob, grep, parse_ast | testable_surfaces, codebase_summary |
| UI Tester | Sonnet 4.5 | playwright_*, computer_use | test_results, screenshots |
| API Tester | Haiku 3.5 | http_get, http_post, validate_schema | api_results, response_times |
| Self-Healer | Opus 4.5 | analyze_failure, query_memory, generate_fix | healed_tests, confidence |
| Reporter | Haiku 3.5 | generate_report, send_notification | report_url, notifications |
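To make the matrix concrete, the sketch below mirrors it as a plain-Python registry together with a simplified stand-in for the supervisor's dynamic router. The names (AGENT_REGISTRY, route_next_agent) are illustrative; the real supervisor delegates routing decisions to Claude rather than to fixed rules.

# Hypothetical registry mirroring the capabilities matrix above.
AGENT_REGISTRY = {
    "code_analyzer": {"model": "claude-sonnet-4-5", "tools": ["read_file", "glob", "grep", "parse_ast"]},
    "ui_tester":     {"model": "claude-sonnet-4-5", "tools": ["playwright_*", "computer_use"]},
    "api_tester":    {"model": "claude-haiku-3-5",  "tools": ["http_get", "http_post", "validate_schema"]},
    "self_healer":   {"model": "claude-opus-4-5",   "tools": ["analyze_failure", "query_memory", "generate_fix"]},
    "reporter":      {"model": "claude-haiku-3-5",  "tools": ["generate_report", "send_notification"]},
}

def route_next_agent(state: dict) -> str:
    """Simplified, rule-based stand-in for the supervisor's dynamic router."""
    if not state.get("testable_surfaces"):
        return "code_analyzer"
    if state.get("healing_queue"):
        return "self_healer"
    if state.get("current_test_index", 0) < len(state.get("test_plan", [])):
        return "ui_tester"  # the real router also dispatches API tests to api_tester
    return "reporter"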
4.3 Agent Communication Sequence¶
sequenceDiagram
participant S as Supervisor
participant CA as Code Analyzer
participant UI as UI Tester
participant SH as Self-Healer
participant RP as Reporter
participant MEM as Memory Store
S->>CA: analyze_codebase(path)
CA-->>S: {testable_surfaces, summary}
S->>S: plan_tests(surfaces)
loop For each test
S->>UI: execute_test(spec)
alt Test Passed
UI-->>S: {status: passed}
else Test Failed
UI-->>S: {status: failed, error}
S->>MEM: search_similar_failures(error)
MEM-->>S: {past_fixes}
S->>SH: heal_test(failure, past_fixes)
SH-->>S: {healed_spec, confidence: 0.85}
S->>UI: execute_test(healed_spec)
UI-->>S: {status: passed}
S->>MEM: store_fix(original, healed)
end
end
S->>RP: generate_report(results)
RP-->>S: {report_url, slack_sent: true}
5. Database Architecture¶
5.1 Entity Relationship Diagram¶
erDiagram
organizations ||--o{ projects : contains
organizations ||--o{ organization_members : has
organizations ||--o{ notification_channels : configures
organizations ||--o{ api_keys : owns
users ||--o{ organization_members : belongs_to
users ||--o{ test_runs : triggers
projects ||--o{ tests : contains
projects ||--o{ test_runs : executes
projects ||--o{ test_schedules : schedules
projects ||--o{ parameterized_tests : defines
test_runs ||--o{ test_results : produces
test_runs ||--o{ test_steps : logs
test_runs ||--o{ screenshots : captures
tests ||--o{ test_results : has
test_schedules ||--o{ schedule_runs : triggers
parameterized_tests ||--o{ parameter_sets : has
parameterized_tests ||--o{ parameterized_results : produces
notification_channels ||--o{ notification_rules : defines
notification_channels ||--o{ notification_logs : logs
langgraph_checkpoints ||--o{ langgraph_writes : tracks
organizations {
uuid id PK
string name
string slug UK
string plan
numeric ai_budget_daily_usd
jsonb settings
timestamp created_at
}
projects {
uuid id PK
uuid organization_id FK
string name
string app_url
jsonb config
timestamp created_at
}
test_runs {
uuid id PK
uuid project_id FK
string thread_id UK
string status
string trigger
int total_tests
int passed_tests
int failed_tests
timestamp started_at
timestamp completed_at
}
test_results {
uuid id PK
uuid test_run_id FK
uuid test_id FK
string status
float duration_seconds
text error_message
jsonb metadata
}
langgraph_checkpoints {
string thread_id PK
string checkpoint_ns PK
string checkpoint_id PK
string parent_checkpoint_id
bytea checkpoint
jsonb metadata
timestamp created_at
}
langgraph_writes {
string thread_id PK
string checkpoint_ns PK
string checkpoint_id PK
int task_id PK
int idx PK
string channel
string type
bytea blob
}
langgraph_memory_store {
uuid id PK
string namespace
string key UK
jsonb value
vector_1536 embedding
timestamp created_at
timestamp updated_at
}
5.2 LangGraph Tables Schema¶
graph TB
subgraph "Checkpoint Tables"
CP[langgraph_checkpoints<br/>Primary checkpoint storage]
WR[langgraph_writes<br/>Incremental writes]
end
subgraph "Memory Tables"
MS[langgraph_memory_store<br/>Long-term memory with vectors]
end
subgraph "Checkpoint Fields"
CP_F[thread_id: TEXT<br/>checkpoint_ns: TEXT<br/>checkpoint_id: TEXT<br/>parent_checkpoint_id: TEXT<br/>checkpoint: BYTEA<br/>metadata: JSONB<br/>created_at: TIMESTAMPTZ]
end
subgraph "Write Fields"
WR_F[thread_id: TEXT<br/>checkpoint_ns: TEXT<br/>checkpoint_id: TEXT<br/>task_id: INT<br/>idx: INT<br/>channel: TEXT<br/>type: TEXT<br/>blob: BYTEA]
end
subgraph "Memory Fields"
MS_F[id: UUID<br/>namespace: TEXT<br/>key: TEXT<br/>value: JSONB<br/>embedding: VECTOR(1536)<br/>created_at: TIMESTAMPTZ<br/>updated_at: TIMESTAMPTZ]
end
CP --> CP_F
WR --> WR_F
MS --> MS_F
CP -.->|FK| WR
5.3 Semantic Search Functions¶
-- Search similar failure patterns
CREATE FUNCTION search_similar_failures(
query_embedding vector(1536),
match_count int DEFAULT 5,
similarity_threshold float DEFAULT 0.7
) RETURNS TABLE (
id uuid,
namespace text,
key text,
value jsonb,
similarity float
) AS $$
SELECT
id, namespace, key, value,
1 - (embedding <=> query_embedding) as similarity
FROM langgraph_memory_store
WHERE namespace = 'failures'
AND 1 - (embedding <=> query_embedding) > similarity_threshold
ORDER BY embedding <=> query_embedding
LIMIT match_count;
$$ LANGUAGE sql;
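From the application side, this function might be called roughly as follows. The sketch assumes an asyncpg connection pool and the OpenAI embeddings client; the helper name find_similar_failures and the pool argument are illustrative.

from openai import AsyncOpenAI

openai_client = AsyncOpenAI()  # uses OPENAI_API_KEY from the environment

async def find_similar_failures(pool, error_text: str, match_count: int = 5) -> list[dict]:
    # Embed the failure text with the same model used at write time.
    resp = await openai_client.embeddings.create(
        model="text-embedding-3-small", input=error_text
    )
    embedding = resp.data[0].embedding  # 1536-dimensional vector
    # pgvector accepts a '[x1,x2,...]' literal cast to the vector type.
    vector_literal = "[" + ",".join(str(x) for x in embedding) + "]"
    rows = await pool.fetch(
        "SELECT * FROM search_similar_failures($1::vector, $2)",
        vector_literal, match_count,
    )
    return [dict(r) for r in rows]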
6. API Architecture¶
6.1 API Endpoint Map¶
graph TB
subgraph "Test Execution API"
E1[POST /api/v1/test/run<br/>Start test execution]
E2[GET /api/v1/test/status/{id}<br/>Get execution status]
E3[POST /api/v1/test/cancel/{id}<br/>Cancel execution]
end
subgraph "Streaming API"
S1[POST /api/v1/stream/test<br/>Stream test execution SSE]
S2[POST /api/v1/stream/chat<br/>Stream chat response SSE]
S3[GET /api/v1/stream/status/{id}<br/>Get stream status]
S4[POST /api/v1/stream/resume/{id}<br/>Resume paused stream]
S5[DELETE /api/v1/stream/cancel/{id}<br/>Cancel stream]
end
subgraph "Time Travel API"
T1[GET /api/v1/time-travel/history/{id}<br/>Get checkpoint history]
T2[GET /api/v1/time-travel/state/{id}/{cp}<br/>Get state at checkpoint]
T3[POST /api/v1/time-travel/replay<br/>Replay from checkpoint]
T4[POST /api/v1/time-travel/fork<br/>Fork execution]
T5[GET /api/v1/time-travel/compare<br/>Compare two states]
end
subgraph "Approval API"
A1[GET /api/v1/approvals/pending<br/>List pending approvals]
A2[POST /api/v1/approvals/{id}/approve<br/>Approve and resume]
A3[POST /api/v1/approvals/{id}/reject<br/>Reject and skip]
A4[POST /api/v1/approvals/{id}/modify<br/>Modify and resume]
end
subgraph "Project API"
P1[GET /api/v1/projects<br/>List projects]
P2[POST /api/v1/projects<br/>Create project]
P3[GET /api/v1/projects/{id}<br/>Get project]
P4[PATCH /api/v1/projects/{id}<br/>Update project]
end
subgraph "Chat API"
C1[POST /api/v1/chat<br/>Send chat message]
C2[GET /api/v1/chat/history/{id}<br/>Get chat history]
end
6.2 Request/Response Flow¶
sequenceDiagram
participant C as Client
participant GW as API Gateway
participant AUTH as Auth Middleware
participant H as Handler
participant O as Orchestrator
participant DB as PostgreSQL
C->>GW: POST /api/v1/stream/test
GW->>AUTH: Validate JWT Token
AUTH-->>GW: User Context
GW->>H: Route to Handler
H->>O: Create Thread
O->>DB: Initialize Checkpoint
DB-->>O: thread_id
H-->>C: SSE Connection Opened
rect rgb(240, 248, 255)
Note over H,DB: Streaming Loop
loop astream_events()
O->>O: Execute Node
O->>DB: Save Checkpoint
O-->>H: Yield Event
H-->>C: event: state_update
H-->>C: event: log
H-->>C: event: screenshot
end
end
O->>DB: Final Checkpoint
H-->>C: event: complete
H-->>C: Connection Closed
6.3 SSE Event Types¶
| Event Type | Description | Payload |
|---|---|---|
| state_update | Full state snapshot | {state: TestingState} |
| node_start | Agent activated | {node: string, timestamp: string} |
| node_end | Agent completed | {node: string, duration_ms: number} |
| log | Log entry | {level: string, message: string} |
| screenshot | Screenshot captured | {base64: string, step: number} |
| interrupt | Awaiting approval | {node: string, reason: string} |
| complete | Execution finished | {summary: object} |
| error | Fatal error | {error: string, stack?: string} |
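These events are produced by the streaming endpoint shown in 6.2. Below is a minimal sketch of such an endpoint using sse-starlette and the compiled graph from Section 3; authentication, keep-alives, and the finer-grained event types in the table above are omitted.

import json
from fastapi import APIRouter
from sse_starlette.sse import EventSourceResponse

router = APIRouter()

@router.post("/api/v1/stream/test")
async def stream_test(body: dict):
    config = {"configurable": {"thread_id": body["thread_id"]}}

    async def event_generator():
        # stream_mode="values" yields a full state snapshot after every node.
        async for state in graph.astream(body["initial_state"], config, stream_mode="values"):
            yield {"event": "state_update", "data": json.dumps(state, default=str)}
        yield {"event": "complete", "data": json.dumps({"thread_id": body["thread_id"]})}

    return EventSourceResponse(event_generator())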
7. Data Flow Architecture¶
7.1 Test Execution Data Flow¶
flowchart TB
subgraph "Input"
REQ[Test Request<br/>app_url, options]
CODE[Codebase<br/>Source Files]
CONFIG[Configuration<br/>Settings]
end
subgraph "Analysis Phase"
PARSE[Parse Codebase<br/>AST Analysis]
IDENTIFY[Identify Surfaces<br/>Routes, Forms, APIs]
PLAN[Generate Plan<br/>Prioritized Tests]
end
subgraph "Execution Phase"
INIT[Initialize Browser<br/>Playwright Session]
EXEC[Execute Steps<br/>Click, Type, Assert]
CAPTURE[Capture Evidence<br/>Screenshots, DOM]
ASSERT[Run Assertions<br/>Validate Results]
end
subgraph "Healing Phase"
ANALYZE[Analyze Failure<br/>Root Cause]
MATCH[Query Memory<br/>Similar Failures]
FIX[Generate Fix<br/>New Selector]
RETRY[Retry Test<br/>With Fix]
end
subgraph "Output"
RESULTS[Test Results<br/>Pass/Fail/Skip]
REPORT[Report<br/>HTML/JSON]
NOTIFY[Notifications<br/>Slack/Email]
ARTIFACTS[Artifacts<br/>Screenshots/Logs]
end
REQ --> PARSE
CODE --> PARSE
CONFIG --> PARSE
PARSE --> IDENTIFY
IDENTIFY --> PLAN
PLAN --> INIT
INIT --> EXEC
EXEC --> CAPTURE
CAPTURE --> ASSERT
ASSERT -->|Pass| RESULTS
ASSERT -->|Fail| ANALYZE
ANALYZE --> MATCH
MATCH --> FIX
FIX --> RETRY
RETRY -->|Pass| RESULTS
RETRY -->|Fail 3x| RESULTS
RESULTS --> REPORT
REPORT --> NOTIFY
CAPTURE --> ARTIFACTS
7.2 Checkpoint Data Flow¶
flowchart LR
subgraph "State Changes"
S1[State v1<br/>Initial]
S2[State v2<br/>After Analysis]
S3[State v3<br/>During Execution]
S4[State v4<br/>After Healing]
end
subgraph "Checkpointing"
CP1[CP-001<br/>parent: null]
CP2[CP-002<br/>parent: CP-001]
CP3[CP-003<br/>parent: CP-002]
CP4[CP-004<br/>parent: CP-003]
end
subgraph "PostgreSQL"
DB[(checkpoints table)]
WR[(writes table)]
end
subgraph "Time Travel"
HIST[Get History<br/>All checkpoints]
REPLAY[Replay<br/>From CP-002]
FORK[Fork<br/>New thread]
end
S1 -->|serialize| CP1
S2 -->|serialize| CP2
S3 -->|serialize| CP3
S4 -->|serialize| CP4
CP1 --> DB
CP2 --> DB
CP3 --> DB
CP4 --> DB
CP1 --> WR
CP2 --> WR
CP3 --> WR
CP4 --> WR
DB --> HIST
CP2 --> REPLAY
CP2 --> FORK
8. Real-Time Streaming Architecture¶
8.1 SSE Streaming Flow¶
sequenceDiagram
participant C as Client
participant SSE as SSE Endpoint
participant O as Orchestrator
participant A as Agent
participant S as State Store
C->>SSE: POST /stream/test
Note over SSE: Accept: text/event-stream
SSE->>O: Start Graph Execution
O->>S: Create Initial State
rect rgb(240, 248, 255)
Note over SSE,S: Streaming Loop (astream_events)
loop For each event
O->>A: Invoke Agent Node
A->>A: Process Task
A->>S: Update State
S-->>O: State Updated
O-->>SSE: Yield custom_event
SSE-->>C: data: {"event": "state_update", ...}
A->>A: Log Action
O-->>SSE: Yield log_event
SSE-->>C: data: {"event": "log", ...}
A->>A: Take Screenshot
O-->>SSE: Yield media_event
SSE-->>C: data: {"event": "screenshot", ...}
end
end
O->>S: Final State
O-->>SSE: Complete
SSE-->>C: data: {"event": "complete", ...}
SSE-->>C: Connection Closed
8.2 Stream Event Categories¶
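On the consuming side, a client typically dispatches on the categories charted below. The following is a small illustrative consumer using httpx, not the shipped dashboard or CLI client.

import json
import httpx

CONTROL_EVENTS = {"interrupt", "resume", "complete", "error"}
MEDIA_EVENTS = {"screenshot", "video_frame", "dom_snapshot"}

async def consume_stream(url: str, payload: dict, token: str) -> None:
    headers = {"Authorization": f"Bearer {token}", "Accept": "text/event-stream"}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", url, json=payload, headers=headers) as resp:
            event = "message"
            async for line in resp.aiter_lines():
                if line.startswith("event:"):
                    event = line.split(":", 1)[1].strip()
                elif line.startswith("data:"):
                    data = json.loads(line.split(":", 1)[1].strip())
                    if event in CONTROL_EVENTS:
                        print("control:", event, data)
                    elif event in MEDIA_EVENTS:
                        print("media:", event)
                    elif event.startswith("log"):
                        print("log:", data)
                    else:  # state events: state_update, node_start, node_end, tool_call
                        print("state:", event)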
graph TB
subgraph "Event Types"
STATE[State Events]
LOG[Log Events]
MEDIA[Media Events]
CONTROL[Control Events]
end
subgraph "State Events"
STATE_UPDATE[state_update<br/>Full state snapshot]
NODE_START[node_start<br/>Agent activation]
NODE_END[node_end<br/>Agent completion]
TOOL_CALL[tool_call<br/>Tool invocation]
end
subgraph "Log Events"
LOG_DEBUG[log:debug]
LOG_INFO[log:info]
LOG_WARN[log:warn]
LOG_ERROR[log:error]
end
subgraph "Media Events"
SCREENSHOT[screenshot<br/>Base64 PNG]
VIDEO[video_frame<br/>Recording]
DOM[dom_snapshot<br/>HTML capture]
end
subgraph "Control Events"
INTERRUPT[interrupt<br/>Awaiting approval]
RESUME[resume<br/>Continuing]
COMPLETE[complete<br/>Finished]
ERROR[error<br/>Fatal error]
end
STATE --> STATE_UPDATE
STATE --> NODE_START
STATE --> NODE_END
STATE --> TOOL_CALL
LOG --> LOG_DEBUG
LOG --> LOG_INFO
LOG --> LOG_WARN
LOG --> LOG_ERROR
MEDIA --> SCREENSHOT
MEDIA --> VIDEO
MEDIA --> DOM
CONTROL --> INTERRUPT
CONTROL --> RESUME
CONTROL --> COMPLETE
CONTROL --> ERROR
9. Human-in-the-Loop Architecture¶
9.1 Approval Workflow State Machine¶
stateDiagram-v2
[*] --> Running: Start Execution
Running --> Interrupted: Hit interrupt_before
Interrupted --> AwaitingApproval: Checkpoint Saved
AwaitingApproval --> Approved: POST /approve
AwaitingApproval --> Rejected: POST /reject
AwaitingApproval --> Modified: POST /modify
Approved --> Running: astream(None)
Modified --> Running: astream(modified_state)
Rejected --> Skipped: Skip Node
Skipped --> Running: Continue
Running --> Completed: All Nodes Done
Completed --> [*]
9.2 Interrupt Points Configuration¶
graph TB
subgraph "Settings"
S1[require_test_plan_approval<br/>Default: false]
S2[require_healing_approval<br/>Default: false]
S3[approval_timeout_seconds<br/>Default: 3600]
end
subgraph "Interrupt Nodes"
N1[execute_test<br/>Before running tests]
N2[self_heal<br/>Before applying fixes]
end
subgraph "Approval Actions"
A1[Approve<br/>Continue as planned]
A2[Modify<br/>Change state first]
A3[Reject<br/>Skip this step]
end
S1 -->|true| N1
S2 -->|true| N2
N1 --> A1
N1 --> A2
N1 --> A3
N2 --> A1
N2 --> A2
N2 --> A3
9.3 Approval API Sequence¶
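In code, the approve branch of the sequence below amounts to resuming the interrupted thread from its checkpoint. A hedged sketch, assuming the compiled graph from Section 3 is importable:

from fastapi import APIRouter

router = APIRouter()

@router.post("/api/v1/approvals/{thread_id}/approve")
async def approve(thread_id: str):
    config = {"configurable": {"thread_id": thread_id}}
    # Invoking with None resumes from the saved checkpoint instead of starting
    # a new run; a /modify handler would call graph.update_state(config, patch)
    # before resuming, and /reject would update the state to skip the node.
    state = await graph.ainvoke(None, config)
    return {"thread_id": thread_id, "status": "resumed",
            "passed": state.get("passed_count"), "failed": state.get("failed_count")}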
sequenceDiagram
participant U as User
participant D as Dashboard
participant API as API Server
participant O as Orchestrator
participant DB as PostgreSQL
Note over O: Execution hits interrupt_before[self_heal]
O->>DB: Save checkpoint (interrupted)
O-->>D: SSE event: interrupt
D->>U: Show Approval Modal
U->>D: Review proposed fix
U->>D: Click Approve
D->>API: POST /approvals/{thread_id}/approve
API->>DB: Load checkpoint
API->>O: Resume with Command.RESUME
O->>O: Continue from checkpoint
O-->>D: SSE events resume
D->>U: Show progress
10. Time Travel & Debugging Architecture¶
10.1 Time Travel Operations¶
graph TB
subgraph "Operations"
HISTORY[GET /history/{thread_id}<br/>List all checkpoints]
STATE[GET /state/{thread_id}/{checkpoint_id}<br/>View state at point]
REPLAY[POST /replay<br/>Re-execute from checkpoint]
FORK[POST /fork<br/>Create new thread branch]
COMPARE[GET /compare<br/>Diff two states]
end
subgraph "Checkpoint Chain"
CP1[CP-001<br/>analyze_code]
CP2[CP-002<br/>plan_tests]
CP3[CP-003<br/>execute_test 1]
CP4[CP-004<br/>execute_test 2]
CP5[CP-005<br/>self_heal]
CP6[CP-006<br/>report]
end
subgraph "Fork Example"
FORK_POINT[Fork from CP-003]
NEW_THREAD[New thread-456]
ALT_CP4[Alt CP-004]
ALT_CP5[Alt CP-005]
end
CP1 --> CP2 --> CP3 --> CP4 --> CP5 --> CP6
HISTORY -.-> CP1
HISTORY -.-> CP2
HISTORY -.-> CP3
STATE -.-> CP3
REPLAY --> CP3
CP3 --> FORK_POINT
FORK_POINT --> NEW_THREAD
NEW_THREAD --> ALT_CP4
ALT_CP4 --> ALT_CP5
COMPARE -.-> CP4
COMPARE -.-> ALT_CP4
10.2 State Comparison Logic¶
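The diff flow charted below can be computed directly from two checkpoints. A minimal sketch, assuming the compiled graph (with its PostgresSaver) from Section 3:

def get_state_at(thread_id: str, checkpoint_id: str) -> dict:
    config = {"configurable": {"thread_id": thread_id, "checkpoint_id": checkpoint_id}}
    return dict(graph.get_state(config).values)  # StateSnapshot -> plain dict

def diff_states(a: dict, b: dict) -> dict:
    keys = set(a) | set(b)
    return {
        "added":     {k: b[k] for k in keys if k not in a},
        "removed":   {k: a[k] for k in keys if k not in b},
        "changed":   {k: (a[k], b[k]) for k in keys if k in a and k in b and a[k] != b[k]},
        "unchanged": sorted(k for k in keys if k in a and k in b and a[k] == b[k]),
    }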
flowchart LR
subgraph "Thread A"
A_CP[Checkpoint A<br/>passed: 5, failed: 2]
end
subgraph "Thread B (Forked)"
B_CP[Checkpoint B<br/>passed: 6, failed: 1]
end
subgraph "Diff Engine"
COMPARE[Compare States]
RESULT[Diff Result]
end
subgraph "Diff Output"
ADDED[Added:<br/>healed_tests: 1]
CHANGED[Changed:<br/>passed: 5→6<br/>failed: 2→1]
SAME[Unchanged:<br/>iteration, app_url]
end
A_CP --> COMPARE
B_CP --> COMPARE
COMPARE --> RESULT
RESULT --> ADDED
RESULT --> CHANGED
RESULT --> SAME
11. Memory Store Architecture¶
11.1 Long-Term Memory with pgvector¶
graph TB
subgraph "Memory Operations"
STORE[store()<br/>Save to memory]
GET[get()<br/>Retrieve by key]
SEARCH[search_similar()<br/>Vector similarity]
UPDATE[update()<br/>Modify existing]
end
subgraph "Memory Namespaces"
FAILURES[failures<br/>Error patterns]
FIXES[fixes<br/>Successful repairs]
SELECTORS[selectors<br/>Element history]
CONTEXT[context<br/>Test context]
end
subgraph "Embedding Pipeline"
TEXT[Content Text]
EMBED[OpenAI Embeddings<br/>text-embedding-3-small]
VECTOR[Vector 1536d]
end
subgraph "Storage"
PG[(PostgreSQL)]
PGVEC[pgvector<br/>HNSW Index]
end
STORE --> TEXT
TEXT --> EMBED
EMBED --> VECTOR
VECTOR --> PGVEC
PGVEC --> PG
SEARCH --> EMBED
EMBED --> PGVEC
FAILURES --> STORE
FIXES --> STORE
SELECTORS --> STORE
CONTEXT --> STORE
11.2 Semantic Search for Self-Healing¶
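In code, the sequence below reduces to a search-then-store round trip against the memory store. A hedged sketch: memory_store is an assumed async wrapper exposing the operations listed in 11.1, and the suggested_selector field is illustrative.

async def heal_selector(failure_message: str, original_selector: str) -> str | None:
    hits = await memory_store.search_similar(
        failure_message, namespace="failures", match_count=5
    )
    if not hits:
        return None
    best = max(hits, key=lambda h: h["similarity"])
    # Derive a replacement selector from the stored pattern (simplified: the
    # real Self-Healer asks Claude to generate the fix from the pattern).
    healed_selector = best["value"].get("suggested_selector", original_selector)
    await memory_store.store(
        namespace="fixes",
        key=f"fix:{original_selector}",
        value={"original": original_selector, "healed": healed_selector},
    )
    return healed_selector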
sequenceDiagram
participant SH as Self-Healer Agent
participant MS as Memory Store
participant EMB as OpenAI Embeddings
participant DB as PostgreSQL
Note over SH: Test failed: "Element #submit-btn not found"
SH->>MS: search_similar("Element #submit-btn not found", namespace="failures")
MS->>EMB: Create embedding for query
EMB-->>MS: Vector [1536 dimensions]
MS->>DB: SELECT * FROM langgraph_memory_store<br/>WHERE namespace = 'failures'<br/>ORDER BY embedding <=> query_vector<br/>LIMIT 5
DB-->>MS: 3 similar failures found
MS-->>SH: [<br/> {pattern: "button renamed", fix: "use data-testid"},<br/> {pattern: "dynamic ID", fix: "use aria-label"},<br/> {pattern: "lazy load", fix: "wait for visible"}<br/>]
Note over SH: Apply most confident fix
SH->>SH: Generate new selector using pattern
Note over SH: Test passes with new selector
SH->>MS: store("fix", {original: "#submit-btn", healed: "[data-testid='submit']"})
MS->>EMB: Create embedding
EMB-->>MS: Vector
MS->>DB: INSERT INTO langgraph_memory_store
DB-->>MS: Stored
SH-->>SH: Return healed test
12. Security Architecture¶
12.1 Security Layers Diagram¶
graph TB
subgraph "Perimeter"
WAF[Cloudflare WAF]
DDOS[DDoS Protection]
TLS[TLS 1.3]
end
subgraph "Authentication"
SUPABASE_AUTH[Supabase Auth]
JWT[JWT Tokens]
API_KEYS[API Keys]
end
subgraph "Authorization"
RBAC[Role-Based Access<br/>admin, member, viewer]
RLS[Row-Level Security<br/>Supabase Policies]
SCOPES[API Key Scopes<br/>read, write, admin]
end
subgraph "Data Security"
ENCRYPT[Encryption at Rest<br/>AES-256]
REDACT[Secret Redaction<br/>Automatic]
AUDIT[Audit Logging<br/>All Actions]
end
subgraph "Runtime Security"
SANDBOX[Docker Sandbox<br/>Isolated Execution]
TIMEOUT[Execution Timeout<br/>Max 10 minutes]
COST[Cost Limits<br/>Per-org budgets]
end
WAF --> SUPABASE_AUTH
DDOS --> SUPABASE_AUTH
TLS --> SUPABASE_AUTH
SUPABASE_AUTH --> JWT
SUPABASE_AUTH --> API_KEYS
JWT --> RBAC
API_KEYS --> SCOPES
RBAC --> RLS
RLS --> ENCRYPT
ENCRYPT --> REDACT
REDACT --> AUDIT
AUDIT --> SANDBOX
SANDBOX --> TIMEOUT
TIMEOUT --> COST
12.2 Authentication Flow¶
sequenceDiagram
participant U as User
participant D as Dashboard
participant AUTH as Supabase Auth
participant API as API Server
participant DB as Database
U->>D: Login (email/password)
D->>AUTH: signInWithPassword()
AUTH->>AUTH: Validate credentials
AUTH-->>D: {access_token, refresh_token}
D->>D: Store tokens
U->>D: Start Test Run
D->>API: POST /stream/test<br/>Authorization: Bearer {token}
API->>AUTH: Verify JWT
AUTH-->>API: {user_id, org_id, role}
API->>DB: Query with RLS
Note over DB: WHERE org_id = auth.org_id()
DB-->>API: Filtered data
API-->>D: SSE Stream
12.3 Row-Level Security Policies¶
-- Organizations: users see only their orgs
CREATE POLICY "org_member_access" ON organizations
FOR ALL USING (
id IN (
SELECT organization_id
FROM organization_members
WHERE user_id = auth.uid()
)
);
-- Test runs: scoped to org's projects
CREATE POLICY "test_run_org_access" ON test_runs
FOR ALL USING (
project_id IN (
SELECT p.id FROM projects p
JOIN organization_members om
ON om.organization_id = p.organization_id
WHERE om.user_id = auth.uid()
)
);
-- Checkpoints: only accessible via thread_id with RLS
CREATE POLICY "checkpoint_access" ON langgraph_checkpoints
FOR ALL USING (
thread_id IN (
SELECT tr.thread_id FROM test_runs tr
JOIN projects p ON p.id = tr.project_id
JOIN organization_members om
ON om.organization_id = p.organization_id
WHERE om.user_id = auth.uid()
)
);
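On the application side, the JWT-verification step from 12.2 can be expressed as a FastAPI dependency. A simplified sketch using PyJWT with the Supabase JWT secret; SUPABASE_JWT_SECRET is an assumed variable, and the production middleware also handles API keys, Clerk tokens, and service accounts.

import os
import jwt  # PyJWT
from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()

def current_user(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    try:
        claims = jwt.decode(
            creds.credentials,
            os.environ["SUPABASE_JWT_SECRET"],  # assumed environment variable
            algorithms=["HS256"],
            audience="authenticated",
        )
    except jwt.PyJWTError as exc:
        raise HTTPException(status_code=401, detail="Invalid token") from exc
    return {"user_id": claims["sub"], "role": claims.get("role")}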
13. Deployment Architecture¶
13.1 Production Deployment Diagram¶
graph TB
subgraph "Edge Layer"
CF[Cloudflare CDN<br/>Global Edge]
end
subgraph "Load Balancing"
LB[Application Load Balancer<br/>Health Checks]
end
subgraph "Application Tier"
API1[API Server 1<br/>FastAPI]
API2[API Server 2<br/>FastAPI]
API3[API Server N<br/>FastAPI]
end
subgraph "Worker Tier"
W1[Worker 1<br/>Browser + Execution]
W2[Worker 2<br/>Browser + Execution]
W3[Worker N<br/>Browser + Execution]
end
subgraph "Data Tier"
PG[(PostgreSQL<br/>Supabase)]
PGVEC[pgvector<br/>Extension]
S3[S3/R2<br/>Artifacts]
end
subgraph "External APIs"
ANTHROPIC[Anthropic API<br/>Claude]
OPENAI[OpenAI API<br/>Embeddings]
end
CF --> LB
LB --> API1
LB --> API2
LB --> API3
API1 --> PG
API2 --> PG
API3 --> PG
W1 --> PG
W2 --> PG
W3 --> PG
W1 --> S3
W2 --> S3
W3 --> S3
PG --> PGVEC
API1 --> ANTHROPIC
W1 --> ANTHROPIC
W1 --> OPENAI
13.2 Container Architecture¶
graph TB
subgraph "Docker Compose / Kubernetes"
subgraph "API Service"
API[api<br/>FastAPI + Uvicorn]
API_ENV[Environment:<br/>DATABASE_URL<br/>ANTHROPIC_API_KEY]
end
subgraph "Worker Service"
WORKER[worker<br/>Test Executor]
BROWSER[playwright<br/>Chromium]
end
subgraph "Dashboard Service"
DASH[dashboard<br/>Next.js]
end
subgraph "Database"
PG[postgres:15<br/>+ pgvector]
end
end
API --> PG
WORKER --> PG
WORKER --> BROWSER
DASH --> API
API --> API_ENV
13.3 Scaling Configuration¶
| Component | Scaling Strategy | Trigger |
|---|---|---|
| API Servers | Horizontal | CPU > 70%, Latency > 500ms |
| Workers | Horizontal | Queue depth > 10 |
| Database | Read replicas | Read queries > 1000/s |
| Memory Store | Partition by namespace | Table size > 10GB |
14. Integration Architecture¶
14.1 External Integrations Map¶
graph TB
subgraph "CI/CD"
GH_ACTIONS[GitHub Actions]
GITLAB_CI[GitLab CI]
JENKINS[Jenkins]
end
subgraph "Communication"
SLACK[Slack]
TEAMS[MS Teams]
EMAIL[Email]
end
subgraph "Issue Tracking"
JIRA[Jira]
LINEAR[Linear]
GH_ISSUES[GitHub Issues]
end
subgraph "Observability"
SENTRY[Sentry]
DATADOG[Datadog]
end
subgraph "Argus Core"
WEBHOOK[Webhook Handler]
NOTIFIER[Notification Service]
TICKET[Ticket Creator]
end
GH_ACTIONS -->|Webhook| WEBHOOK
GITLAB_CI -->|Webhook| WEBHOOK
JENKINS -->|Webhook| WEBHOOK
SENTRY -->|Webhook| WEBHOOK
DATADOG -->|Webhook| WEBHOOK
NOTIFIER --> SLACK
NOTIFIER --> TEAMS
NOTIFIER --> EMAIL
TICKET --> JIRA
TICKET --> LINEAR
TICKET --> GH_ISSUES
14.2 MCP Server Integration¶
graph TB
subgraph "MCP Clients"
VSCODE[VS Code<br/>Extension]
CLAUDE_DESKTOP[Claude Desktop]
CURSOR[Cursor IDE]
end
subgraph "Argus MCP Server"
MCP[argus-mcp-server<br/>Cloudflare Worker]
TOOLS[Tools]
RESOURCES[Resources]
end
subgraph "Available MCP Tools"
T1[run_test<br/>Execute tests]
T2[get_results<br/>Fetch results]
T3[analyze_code<br/>Discover surfaces]
T4[heal_test<br/>Fix failures]
T5[get_history<br/>Time travel]
end
VSCODE --> MCP
CLAUDE_DESKTOP --> MCP
CURSOR --> MCP
MCP --> TOOLS
MCP --> RESOURCES
TOOLS --> T1
TOOLS --> T2
TOOLS --> T3
TOOLS --> T4
TOOLS --> T5
15. Technology Stack¶
15.1 Complete Stack Table¶
| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Frontend | Next.js | 14.x | Dashboard UI |
| Frontend | React | 18.x | Component library |
| Frontend | Tailwind CSS | 3.x | Styling |
| Backend | Python | 3.11+ | Core runtime |
| Backend | FastAPI | 0.115+ | REST API |
| Backend | Uvicorn | 0.32+ | ASGI server |
| Orchestration | LangGraph | 1.0.5+ | State machine |
| Orchestration | langgraph-checkpoint-postgres | 2.0+ | Checkpointing |
| AI | Anthropic SDK | 0.75+ | Claude API |
| AI | OpenAI SDK | 1.0+ | Embeddings |
| Browser | Playwright | 1.48+ | Automation |
| Database | PostgreSQL | 15+ | Primary DB |
| Database | pgvector | 0.5+ | Vector search |
| Database | Supabase | - | Managed Postgres |
| Streaming | sse-starlette | 2.0+ | SSE support |
| Validation | Pydantic | 2.9+ | Data validation |
15.2 Python Dependencies¶
[dependencies]
# Core AI
anthropic = ">=0.75.0,<1.0.0"
langgraph = ">=1.0.5,<2.0.0"
langchain-anthropic = ">=1.3.0,<2.0.0"
langchain-core = ">=1.2.5,<2.0.0"
# LangGraph Checkpointing
langgraph-checkpoint = ">=2.0.0"
langgraph-checkpoint-postgres = ">=2.0.0"
psycopg = {extras = ["binary"], version = ">=3.1.0"}
# Web Automation
playwright = ">=1.48.0"
httpx = ">=0.27.0"
# API Server
fastapi = ">=0.115.0"
uvicorn = ">=0.32.0"
sse-starlette = ">=2.0.0"
# Database
supabase = ">=2.0.0"
asyncpg = ">=0.29.0"
# Embeddings
openai = ">=1.0.0"
# Validation
pydantic = ">=2.9.0"
pydantic-settings = ">=2.5.0"
16. Cost Management¶
16.1 AI Model Routing Strategy¶
graph TB
subgraph "Task Classification"
TASK[Incoming Task]
CLASSIFY[Classify Complexity]
end
subgraph "Model Selection"
TRIVIAL[Trivial<br/>→ Haiku 3.5]
SIMPLE[Simple<br/>→ Haiku 3.5]
MODERATE[Moderate<br/>→ Sonnet 4.5]
COMPLEX[Complex<br/>→ Sonnet 4.5]
EXPERT[Expert<br/>→ Opus 4.5]
end
subgraph "Cost per 1K tokens"
C1[$0.001]
C2[$0.001]
C3[$0.003]
C4[$0.003]
C5[$0.015]
end
TASK --> CLASSIFY
CLASSIFY --> TRIVIAL --> C1
CLASSIFY --> SIMPLE --> C2
CLASSIFY --> MODERATE --> C3
CLASSIFY --> COMPLEX --> C4
CLASSIFY --> EXPERT --> C5
16.2 Cost Tracking Schema¶
| Table | Purpose |
|---|---|
| ai_usage | Per-request token and cost tracking |
| ai_usage_daily | Daily aggregation by org |
| organizations.ai_budget_daily_usd | Daily spending limit |
| organizations.ai_spend_today_usd | Current day spend |
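Putting 16.1 and 16.2 together, the following is a hedged sketch of how tier-based routing and daily-budget enforcement might interact; the tier map follows 16.1, the column names follow the schema above, and the db handle (an asyncpg-style connection) is assumed.

TIER_MODELS = {
    "trivial":  "claude-haiku-3-5",
    "simple":   "claude-haiku-3-5",
    "moderate": "claude-sonnet-4-5",
    "complex":  "claude-sonnet-4-5",
    "expert":   "claude-opus-4-5",
}

async def select_model(db, org_id: str, tier: str) -> str:
    # Refuse the request if the organization's daily AI budget is spent.
    row = await db.fetchrow(
        "SELECT ai_budget_daily_usd, ai_spend_today_usd FROM organizations WHERE id = $1",
        org_id,
    )
    if row["ai_spend_today_usd"] >= row["ai_budget_daily_usd"]:
        raise RuntimeError("Daily AI budget exhausted for this organization")
    return TIER_MODELS.get(tier, "claude-sonnet-4-5")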
16.3 Estimated Monthly Costs¶
| Component | Provider | Cost Range |
|---|---|---|
| API Server | Railway/Render | $50-150 |
| Database | Supabase Pro | $25-50 |
| AI (Claude) | Anthropic | $500-2000 |
| AI (Embeddings) | OpenAI | $20-50 |
| Total | - | $595-2250 |
17. Version History¶
17.1 Version Timeline¶
gantt
title Argus Version History
dateFormat YYYY-MM-DD
section v1.0.0
Initial Release :done, v1, 2025-12-01, 2025-12-15
section v1.1.0
Self-Healing Agent :done, v11, 2025-12-15, 2025-12-20
section v1.2.0
Dashboard UI :done, v12, 2025-12-20, 2025-12-28
section v1.3.0
Scheduling & Notifications:done, v13, 2025-12-28, 2026-01-05
section v2.0.0
LangGraph 1.0 Features :done, v20, 2026-01-05, 2026-01-09
17.2 Changelog¶
| Version | Date | Git Commit | Changes |
|---|---|---|---|
| 2.0.0 | 2026-01-09 | e22fef1 | LangGraph 1.0 feature suite: PostgresSaver, Memory Store, SSE Streaming, Human-in-the-loop, Multi-agent Supervisor, Time Travel API, Chat Graph |
| 1.3.0 | 2026-01-05 | bd93905 | Test scheduling, notification channels, parameterized tests |
| 1.2.0 | 2025-12-28 | - | Dashboard with real-time updates, test visualization |
| 1.1.0 | 2025-12-20 | f8f2bf5 | Self-healing agent, retry logic, error categorization |
| 1.0.0 | 2025-12-15 | - | Initial release: Code analyzer, UI tester, API tester, Reporter |
18. Architecture Decision Records¶
ADR Summary Table¶
| ADR | Decision | Date | Status | Rationale |
|---|---|---|---|---|
| ADR-001 | Use LangGraph for orchestration | 2025-11-15 | Accepted | Built-in checkpointing, streaming, interrupts |
| ADR-002 | PostgreSQL for checkpointing | 2025-11-20 | Accepted | Durable, queryable, time travel support |
| ADR-003 | Hybrid Playwright + Computer Use | 2025-11-25 | Accepted | Speed of Playwright + intelligence of vision AI |
| ADR-004 | Supabase for auth and database | 2025-12-01 | Accepted | RLS, realtime, managed Postgres |
| ADR-005 | pgvector for semantic search | 2026-01-05 | Accepted | Native Postgres, no external vector DB |
| ADR-006 | SSE over WebSocket for streaming | 2026-01-07 | Accepted | Simpler, HTTP-based, better proxy support |
| ADR-007 | Multi-agent Supervisor pattern | 2026-01-08 | Accepted | Centralized coordination, easier debugging |
| ADR-008 | OpenAI for embeddings | 2026-01-09 | Accepted | Better quality than alternatives, reasonable cost |
ADR-006: SSE over WebSocket¶
Context: Need real-time streaming for test execution feedback.
Decision: Use Server-Sent Events (SSE) instead of WebSocket.
Rationale:
- SSE works over standard HTTP (better proxy/firewall support)
- Simpler implementation (no connection upgrade)
- Automatic reconnection built-in
- Sufficient for server-to-client streaming (our use case)
- sse-starlette provides excellent FastAPI integration
Consequences:
- Client-to-server messages require separate HTTP requests
- Unidirectional only (acceptable for our use case)
Appendix A: Environment Variables Reference¶
# Core AI
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
# Database
DATABASE_URL=postgresql://...
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_ANON_KEY=eyJ...
SUPABASE_SERVICE_KEY=eyJ...
# LangGraph
LANGGRAPH_CHECKPOINT_DB=postgresql://...
# Execution Settings
DEFAULT_MODEL=claude-sonnet-4-5
MAX_ITERATIONS=50
COST_LIMIT_PER_RUN=10.00
SELF_HEAL_ENABLED=true
SELF_HEAL_MAX_RETRIES=3
# Streaming
SSE_KEEPALIVE_INTERVAL=15
STREAM_TIMEOUT=3600
# Human-in-the-Loop
REQUIRE_HEALING_APPROVAL=false
REQUIRE_TEST_PLAN_APPROVAL=false
APPROVAL_TIMEOUT_SECONDS=3600
# Integrations
SLACK_WEBHOOK_URL=https://hooks.slack.com/...
GITHUB_TOKEN=ghp_...
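These variables map naturally onto a pydantic-settings model (pydantic-settings is already a dependency, see 15.2). A partial sketch covering the execution, streaming, and HITL settings; the class and field names are illustrative, with defaults mirroring this appendix.

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Required secrets / connection strings
    anthropic_api_key: str
    openai_api_key: str
    database_url: str
    # Execution settings
    default_model: str = "claude-sonnet-4-5"
    max_iterations: int = 50
    cost_limit_per_run: float = 10.00
    self_heal_enabled: bool = True
    self_heal_max_retries: int = 3
    # Streaming
    sse_keepalive_interval: int = 15
    stream_timeout: int = 3600
    # Human-in-the-loop
    require_healing_approval: bool = False
    require_test_plan_approval: bool = False
    approval_timeout_seconds: int = 3600

settings = Settings()  # reads the environment; name matching is case-insensitive by default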
Appendix B: API Quick Reference¶
Streaming Endpoints¶
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/stream/test | Stream test execution (SSE) |
| POST | /api/v1/stream/chat | Stream chat response (SSE) |
| GET | /api/v1/stream/status/{thread_id} | Get stream status |
| POST | /api/v1/stream/resume/{thread_id} | Resume paused stream |
Time Travel Endpoints¶
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/time-travel/history/{thread_id} | Get checkpoint history |
| GET | /api/v1/time-travel/state/{thread_id}/{checkpoint_id} | Get state at checkpoint |
| POST | /api/v1/time-travel/replay | Replay from checkpoint |
| POST | /api/v1/time-travel/fork | Fork execution |
Approval Endpoints¶
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/approvals/pending | List pending approvals |
| POST | /api/v1/approvals/{thread_id}/approve | Approve and resume |
| POST | /api/v1/approvals/{thread_id}/reject | Reject and skip |
Appendix C: Recent Changes (January 2026)¶
v2.8.0 Security & Infrastructure Updates¶
Security Improvements:
- Fixed hardcoded JWT secret fallback vulnerability
- Replaced permissive RLS policies (USING(true)) with proper organization-scoped policies
- Added authentication to all recording endpoints
- Fixed development mode auth bypass
- Added SSRF protection to URL validation
- Fixed CORS wildcard in Cloudflare Worker
- Added authentication to /storage/* endpoint
- Enabled HTTPS for browser pool URL
- Fixed race conditions in pattern_service.py and healing.py
- Added missing foreign key constraints
- Fixed frontend-backend type mismatches
- Changed default passwords in Kubernetes configs
- Added comprehensive security headers (CSP, HSTS, X-Frame-Options)
Performance Improvements:
- Fixed N+1 queries in project listing (batch query via get_project_test_counts() RPC)
- Fixed N+1 queries in organization listing (batch query via get_org_member_counts() RPC)
- Fixed N+1 in bulk test operations
- Added recording upload size limits (50MB max)
- Added request size middleware (100MB max)
- Implemented transaction boundaries for invitation acceptance and org creation
Model Router Updates:
- Added OpenRouter as primary provider (single API for 300+ models)
- Integrated DeepSeek V3.2 (90% cost reduction vs Claude for similar quality)
- Integrated DeepSeek R1 (10% cost of o1 for reasoning tasks)
- Fixed model key consistency (llama-small instead of llama-3.1-8b)
- Added ModelProvider.OPENROUTER handling in _get_router_client()
Infrastructure Updates:
- Browser Pool: Updated resource specifications (750m CPU, 1.5Gi memory per pod)
- KEDA: Added production scalers for Chrome/Firefox/Edge
- Session Cleanup: Added CronJob for stuck session cleanup
- AI-Controlled Sessions: Added estimate_session_config() for intelligent timeouts
Architecture Audit Summary (January 17, 2026)¶
Core Architecture:
- 7 LangGraph state fields with intelligent reducers
- PostgresSaver for durable execution with automatic checkpointing
- pgvector-powered memory store for semantic failure search
- Supervisor pattern with 23 specialized agents
API Layer:
- 40 route modules with 80+ endpoints
- 7-layer middleware stack (CORS → Headers → Audit → Rate Limit → Auth → Request Size → Core)
- Multi-method authentication (API Key, JWT, Clerk, Service Account)
- Fine-grained RBAC with scope-based permission validation
Data Layer:
- 40+ tables organized into functional domains
- RLS policies with organization membership checks
- pgvector HNSW indexes for O(log n) semantic search
- Comprehensive foreign key relationships with CASCADE/SET NULL behaviors
AI/ML Integration:
- Multi-model routing with 45+ models across 9 providers
- Task-based model selection (TRIVIAL → EXPERT tiers)
- Budget enforcement per organization (daily/monthly limits)
- 60-80% cost savings via intelligent routing
Document generated: 2026-01-17T05:30:00Z
Architecture Version: 2.8.0
Git Commit: 918c51a
Argus E2E Testing Agent