Version: 1.0.0 Last Updated: January 2026 Status: Production Ready
Table of Contents
- Executive Summary
- System Architecture Overview
- Technology Stack
- Layer-by-Layer Breakdown
- Data Flow Diagrams
- Database Schema
- API Reference
- Agent System
- External Integrations
- Security Architecture
- Cost Management
- Deployment Architecture
1. Executive Summary
Argus is an autonomous end-to-end testing platform that combines: - AI-Powered Test Generation from production errors - Self-Healing Test Execution with automatic selector repair - Multi-Browser Cross-Device Testing via edge computing - Quality Intelligence correlating errors with test coverage gaps
Key Capabilities
| Capability | Description |
| Autonomous Testing | AI generates, executes, and maintains tests |
| Production Error → Test | Converts Sentry/Datadog errors to regression tests |
| Self-Healing | Automatically fixes broken selectors |
| Multi-Model AI | Routes tasks to optimal model (60-80% cost savings) |
| Edge Browser Automation | Global low-latency browser execution |
| Quality Intelligence | Risk scoring, coverage gaps, predictive insights |
2. System Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────────┐
│ EXTERNAL SOURCES │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Sentry │ │ Datadog │ │FullStory│ │LogRocket│ │ GitHub │ │ Slack │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
└───────┼──────────┼──────────┼──────────┼──────────┼──────────┼─────────────────┘
│ │ │ │ │ │
└──────────┴──────────┴──────────┼──────────┴──────────┘
▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CLOUDFLARE EDGE LAYER │
│ (argus-api Worker) │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Browser Automation Endpoints │ │
│ │ POST /act - Execute single action (click, type, scroll) │ │
│ │ POST /extract - Extract structured data from page │ │
│ │ POST /observe - Discover available actions/elements │ │
│ │ POST /agent - Run autonomous multi-step workflows │ │
│ │ POST /test - Cross-browser multi-device test execution │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Browser │ │ KV Cache │ │ R2 Storage │ │ Vectorize │ │
│ │ Rendering │ │ (sessions, │ │ (screenshots │ │ (semantic │ │
│ │ (Chromium) │ │ dedup) │ │ artifacts) │ │ search) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Queues │ │ Hyperdrive │ │ Workers │ │ Durable │ │
│ │ (async │ │ (DB pool) │ │ AI │ │ Objects │ │
│ │ events) │ │ │ │ (Llama) │ │ (WebSocket) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
└────────────────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ RAILWAY BRAIN SERVICE │
│ (Python FastAPI + LangGraph) │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ API Layer (src/api/) │ │
│ │ ├── server.py - Main FastAPI application │ │
│ │ ├── webhooks.py - Platform webhook handlers │ │
│ │ └── quality.py - Quality Intelligence API │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Orchestration Layer (src/orchestrator/) │ │
│ │ ├── graph.py - LangGraph state machine │ │
│ │ ├── state.py - Shared state definitions │ │
│ │ └── nodes.py - Graph node implementations │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Agent System (src/agents/) - 20+ Specialized Agents │ │
│ │ ├── code_analyzer.py - Static analysis, test surface discovery │ │
│ │ ├── test_planner.py - Prioritized test plan generation │ │
│ │ ├── ui_tester.py - Browser-based UI test execution │ │
│ │ ├── api_tester.py - REST API testing with schema validation │ │
│ │ ├── self_healer.py - Auto-fix broken tests and selectors │ │
│ │ ├── quality_auditor.py - Quality metrics and coverage analysis │ │
│ │ └── ... (15+ more agents) │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Core Intelligence (src/core/) │ │
│ │ ├── model_router.py - Multi-model AI routing (cost optimization) │ │
│ │ ├── cognitive_engine.py - Multi-step reasoning │ │
│ │ ├── correlator.py - Error pattern correlation │ │
│ │ ├── normalizer.py - Error message normalization │ │
│ │ ├── coverage.py - Test coverage calculation │ │
│ │ └── risk.py - Risk assessment scoring │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Services Layer (src/services/) │ │
│ │ ├── supabase_client.py - Database operations │ │
│ │ ├── cache.py - Cloudflare KV caching │ │
│ │ ├── vectorize.py - Semantic search (Cloudflare Vectorize) │ │
│ │ └── ai_cost_tracker.py - Token/cost tracking │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ SUPABASE (PostgreSQL) │
│ Source of Truth │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ organizations │ │production_events│ │ generated_tests│ │ risk_scores │ │
│ │ (multi-tenant) │ │ (error data) │ │ (AI-generated) │ │ (assessments) │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ └────────────────┘ │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ ci_events │ │coverage_reports│ │ ai_usage │ │healing_patterns│ │
│ │ (CI/CD data) │ │ (test coverage)│ │ (cost tracking)│ │ (self-healing) │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ └────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
3. Technology Stack
3.1 Complete Stack Overview
| Layer | Technology | Purpose |
| Edge | Cloudflare Workers (TypeScript) | Browser automation, low-latency API |
| Backend | Python 3.11 + FastAPI | AI orchestration, business logic |
| Orchestration | LangGraph | Multi-agent state machine |
| Database | Supabase (PostgreSQL) | Persistent storage, RLS |
| Cache | Cloudflare KV | Session, dedup, API cache |
| Vector DB | Cloudflare Vectorize | Semantic error matching |
| Object Storage | Cloudflare R2 | Screenshots, artifacts |
| Queue | Cloudflare Queues | Async event processing |
| AI Primary | Claude (Anthropic) | Sonnet 4.5, Haiku 4.5, Opus 4.5 |
| AI Fallback | OpenAI, Groq, Together | GPT-4o, Llama 3.1 |
| Browser | Cloudflare Browser Rendering | Chromium automation |
| Browser (Paid) | TestingBot | Cross-browser + real devices |
3.2 Python Dependencies
# Core AI
anthropic>=0.75.0 # Claude API client
langgraph>=1.0.5 # Multi-agent orchestration
langchain-anthropic>=1.3.0
langchain-core>=1.2.5
# Web Automation
playwright>=1.48.0 # Browser automation
selenium>=4.25.0 # Fallback browser driver
httpx>=0.27.0 # Async HTTP client
# Data Validation
pydantic>=2.9.0 # Schema validation
pydantic-settings>=2.5.0 # Config management
# Database
sqlalchemy>=2.0.0 # ORM
asyncpg>=0.29.0 # Async PostgreSQL
# API Server
fastapi>=0.115.0 # REST API framework
uvicorn>=0.32.0 # ASGI server
# Utilities
structlog>=24.4.0 # Structured logging
tiktoken>=0.8.0 # Token counting
pillow>=10.4.0 # Image processing
3.3 Cloudflare Services Used
| Service | Binding | Purpose |
| Browser Rendering | BROWSER | Chromium browser instances |
| Workers AI | AI | Llama fallback, embeddings |
| KV | CACHE | Key-value caching |
| R2 | ARTIFACTS | Screenshot/artifact storage |
| Vectorize | VECTOR_INDEX | Vector similarity search |
| Queues | EVENT_QUEUE, DLQ | Async processing |
| Hyperdrive | DB | PostgreSQL connection pooling |
| Durable Objects | REALTIME | WebSocket state |
4. Layer-by-Layer Breakdown
4.1 Edge Layer (Cloudflare Worker)
Location: /cloudflare-worker/src/index.ts
cloudflare-worker/
├── src/
│ ├── index.ts # Main worker entry point
│ ├── utils.ts # Cache/storage helpers
│ └── realtime.ts # WebSocket Durable Object
├── wrangler.toml # Cloudflare configuration
└── package.json
Capabilities: - Browser automation (Playwright on Cloudflare) - Self-healing selectors with fallback generation - AI-powered element discovery - Multi-device/browser testing via TestingBot - Real-time WebSocket updates
Supported Device Presets:
DEVICE_PRESETS = {
// Desktop
"desktop": { width: 1920, height: 1080 },
"desktop-hd": { width: 2560, height: 1440 },
"desktop-mac": { width: 1920, height: 1080 },
// Tablets
"tablet": { width: 768, height: 1024 },
"tablet-landscape": { width: 1024, height: 768 },
// Mobile
"mobile": { width: 375, height: 812 },
"mobile-android": { width: 412, height: 915 },
// Real Devices (TestingBot)
"iphone-15", "iphone-14", "pixel-8", "samsung-s24"
}
4.2 Backend Layer (Python/Railway)
Location: /src/
src/
├── main.py # CLI entry point
├── config.py # Configuration management (150+ settings)
│
├── api/ # FastAPI REST API
│ ├── server.py # Main application, all endpoints
│ ├── webhooks.py # Sentry/Datadog/etc webhook handlers
│ └── quality.py # Quality Intelligence API
│
├── orchestrator/ # LangGraph State Machine
│ ├── graph.py # Graph definition and routing
│ ├── state.py # TestingState TypedDict
│ └── nodes.py # Node implementations
│
├── agents/ # Specialized AI Agents (20+)
│ ├── base.py # BaseAgent abstract class
│ ├── code_analyzer.py # Codebase analysis
│ ├── test_planner.py # Test plan generation
│ ├── ui_tester.py # Browser test execution
│ ├── api_tester.py # API testing
│ ├── self_healer.py # Auto-fix broken tests
│ ├── quality_auditor.py # Quality metrics
│ ├── root_cause_analyzer.py
│ ├── flaky_detector.py
│ ├── nlp_test_creator.py
│ ├── visual_ai.py
│ ├── accessibility_checker.py
│ ├── security_scanner.py
│ ├── performance_analyzer.py
│ └── ...
│
├── core/ # Intelligence Modules
│ ├── model_router.py # Multi-model AI routing
│ ├── cognitive_engine.py # Multi-step reasoning
│ ├── correlator.py # Error correlation
│ ├── normalizer.py # Error normalization
│ ├── coverage.py # Coverage calculation
│ └── risk.py # Risk scoring
│
├── services/ # External Service Clients
│ ├── supabase_client.py # Database operations
│ ├── cache.py # Cloudflare KV client
│ ├── vectorize.py # Semantic search
│ └── ai_cost_tracker.py # Cost management
│
├── browser/ # Browser Automation
│ ├── e2e_client.py # High-level browser client
│ └── stagehand_client.py # Stagehand integration
│
├── security/ # Security & Compliance
│ ├── sanitizer.py # Secret redaction
│ ├── audit.py # Audit logging
│ └── classifier.py # Data classification
│
├── integrations/ # Third-Party Integrations
│ ├── github_integration.py
│ ├── slack_integration.py
│ └── observability_hub.py
│
└── mcp/ # MCP Servers
├── langgraph_mcp.py
├── playwright_mcp.py
└── quality_mcp.py
4.3 Database Layer (Supabase)
Location: /supabase/migrations/
Core Tables:
| Table | Purpose |
organizations | Multi-tenant organization management |
organization_members | User-organization relationships |
projects | Test projects per organization |
production_events | Errors from Sentry/Datadog/etc |
ci_events | CI/CD pipeline events |
coverage_reports | Test coverage data |
generated_tests | AI-generated test code |
risk_scores | Component risk assessments |
healing_patterns | Successful selector fixes |
ai_usage | Per-request AI cost tracking |
ai_usage_daily | Aggregated daily costs |
api_keys | Programmatic API access |
webhook_logs | Incoming webhook audit trail |
5. Data Flow Diagrams
5.1 Test Execution Flow
┌─────────────────────────────────────────────────────────────────────────────────┐
│ TEST EXECUTION WORKFLOW │
└─────────────────────────────────────────────────────────────────────────────────┘
User Request LangGraph Orchestrator
│ │
▼ │
┌─────────┐ │
│ /run │──────────────────────────────┤
│ tests │ │
└─────────┘ │
▼
┌─────────────────┐
│ Create Initial │
│ State │
└────────┬────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Code Analyzer │ │ Test Planner │ │ Executor │
│ │ │ │ │ (UI/API/DB) │
│ • Parse code │ │ • Prioritize │ │ │
│ • Find surfaces│ │ • Create plan │ │ • Run tests │
│ • Extract deps │ │ • Risk-based │ │ • Screenshots │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└────────────────────┼────────────────────┘
│
▼
┌─────────────────┐
│ Test Failed? │
└────────┬────────┘
│
┌──────────────┴──────────────┐
│ YES NO │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Self-Healer │ │ Reporter │
│ │ │ │
│ • Analyze fail │ │ • HTML report │
│ • Fix selector │ │ • GitHub PR │
│ • Retry test │ │ • Slack notify │
└────────┬────────┘ └────────┬────────┘
│ │
└──────────────┬──────────────┘
│
▼
┌─────────────────┐
│ Return Results │
└─────────────────┘
5.2 Quality Intelligence Flow
┌─────────────────────────────────────────────────────────────────────────────────┐
│ QUALITY INTELLIGENCE WORKFLOW │
└─────────────────────────────────────────────────────────────────────────────────┘
Production Error Argus Brain Actions
│ │ │
▼ │ │
┌─────────┐ │ │
│ Sentry │──────┐ │ │
│ Webhook │ │ │ │
└─────────┘ │ │ │
┌─────────┐ │ │ │
│ Datadog │──────┤ │ │
│ Webhook │ │ │ │
└─────────┘ │ │ │
┌─────────┐ │ │ │
│FullStory│──────┼───────────────────▶ ┌─────────────────┐ │
│ Webhook │ │ │ Normalize & │ │
└─────────┘ │ │ Dedupe │ │
┌─────────┐ │ └────────┬────────┘ │
│ Other │──────┘ │ │
│ Sources │ ▼ │
└─────────┘ ┌─────────────────┐ │
│ Store Event + │ │
│ Index Vector │ │
└────────┬────────┘ │
│ │
┌────────────────────────────┼────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Semantic Search │ │ Risk Scoring │ │ Test Generation │
│ │ │ │ │ │
│ Find similar │ │ • Error freq │ │ • Claude AI │
│ past errors │ │ • Severity │ │ • Playwright │
│ (Vectorize) │ │ • User impact │ │ • Auto PR │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└────────────────────────────┼────────────────────────┘
│
▼
┌─────────────────┐
│ Dashboard │
│ + Alerts │
└─────────────────┘
5.3 Multi-Model AI Routing
┌─────────────────────────────────────────────────────────────────────────────────┐
│ AI MODEL ROUTING STRATEGY │
└─────────────────────────────────────────────────────────────────────────────────┘
Incoming Task
│
▼
┌─────────────────┐
│ Classify Task │
│ Type/Complexity │
└────────┬────────┘
│
┌────────┴────────────────────────────────────────────────────┐
│ │
│ Task Complexity │
│ │
│ TRIVIAL ─────────▶ Llama 3.1 8B (Groq) ~$0.0001/1K │
│ (element classify) │
│ │
│ SIMPLE ──────────▶ GPT-4o-mini ~$0.0015/1K │
│ (action extract) OR Claude Haiku │
│ │
│ MODERATE ────────▶ Claude Haiku 4.5 ~$0.004/1K │
│ (test generation) │
│ │
│ COMPLEX ─────────▶ Claude Sonnet 4.5 ~$0.018/1K │
│ (root cause) │
│ │
│ EXPERT ──────────▶ Claude Opus 4.5 ~$0.05/1K │
│ (architecture) │
│ │
└──────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Budget Check │───▶ Over budget? Queue for later
└────────┬────────┘
│
▼
┌─────────────────┐
│ Execute + Track│
│ ai_usage table │
└─────────────────┘
COST SAVINGS: 60-80% vs using Claude Sonnet for everything
6. Database Schema
6.1 Core Tables
-- Organizations (Multi-Tenancy)
CREATE TABLE organizations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
slug TEXT UNIQUE NOT NULL,
plan TEXT DEFAULT 'free', -- free, pro, enterprise
ai_budget_daily_usd NUMERIC DEFAULT 10.00,
ai_budget_monthly_usd NUMERIC DEFAULT 100.00,
ai_spend_today_usd NUMERIC DEFAULT 0,
ai_spend_this_month_usd NUMERIC DEFAULT 0,
features JSONB DEFAULT '{}', -- max_projects, self_healing, etc.
stripe_customer_id TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Production Events (from Sentry/Datadog/etc)
CREATE TABLE production_events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID REFERENCES projects(id),
source TEXT NOT NULL, -- sentry, datadog, fullstory, etc.
external_id TEXT,
event_type TEXT, -- error, exception, performance
severity TEXT, -- fatal, error, warning, info
title TEXT NOT NULL,
message TEXT,
stack_trace TEXT,
fingerprint TEXT, -- for deduplication
url TEXT,
component TEXT,
occurrence_count INTEGER DEFAULT 1,
affected_users INTEGER DEFAULT 1,
status TEXT DEFAULT 'new',
ai_analysis JSONB,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Generated Tests (AI-created)
CREATE TABLE generated_tests (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID REFERENCES projects(id),
production_event_id UUID REFERENCES production_events(id),
name TEXT NOT NULL,
test_type TEXT DEFAULT 'e2e',
framework TEXT DEFAULT 'playwright',
test_code TEXT NOT NULL,
test_file_path TEXT,
confidence_score NUMERIC,
status TEXT DEFAULT 'pending', -- pending, approved, rejected
review_notes TEXT,
github_pr_url TEXT,
github_pr_number INTEGER,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Risk Scores (Component Risk Assessment)
CREATE TABLE risk_scores (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID REFERENCES projects(id),
entity_type TEXT NOT NULL, -- page, component, flow, endpoint
entity_identifier TEXT NOT NULL,
overall_risk_score INTEGER, -- 0-100
factors JSONB, -- breakdown of risk factors
error_count INTEGER,
affected_users INTEGER,
trend TEXT DEFAULT 'stable', -- improving, stable, degrading
calculated_at TIMESTAMPTZ DEFAULT now(),
UNIQUE(project_id, entity_type, entity_identifier)
);
-- AI Usage Tracking
CREATE TABLE ai_usage (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID REFERENCES organizations(id),
request_id TEXT UNIQUE NOT NULL,
model TEXT NOT NULL,
provider TEXT NOT NULL,
task_type TEXT,
input_tokens INTEGER NOT NULL,
output_tokens INTEGER NOT NULL,
cost_usd NUMERIC NOT NULL,
latency_ms INTEGER,
cache_hit BOOLEAN DEFAULT false,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Healing Patterns (Self-Healing Knowledge Base)
CREATE TABLE healing_patterns (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
fingerprint TEXT UNIQUE NOT NULL,
original_selector TEXT NOT NULL,
healed_selector TEXT NOT NULL,
error_type TEXT NOT NULL,
success_count INTEGER DEFAULT 1,
failure_count INTEGER DEFAULT 0,
confidence NUMERIC GENERATED ALWAYS AS (
success_count::numeric / GREATEST(success_count + failure_count, 1)
) STORED,
project_id UUID REFERENCES projects(id),
created_at TIMESTAMPTZ DEFAULT now()
);
6.2 Row-Level Security (RLS)
-- Users can only see their organization's data
ALTER TABLE production_events ENABLE ROW LEVEL SECURITY;
CREATE POLICY "org_isolation" ON production_events
FOR ALL USING (
project_id IN (
SELECT p.id FROM projects p
JOIN organization_members om ON om.organization_id = p.organization_id
WHERE om.user_id = auth.uid()
)
);
7. API Reference
7.1 Brain Service Endpoints (Railway)
| Endpoint | Method | Description |
/health | GET | Health check |
/api/v1/tests/run | POST | Start test execution job |
/api/v1/jobs/{id} | GET | Get job status/results |
/api/v1/tests/create | POST | Create test from natural language |
/api/v1/visual/compare | POST | Visual regression comparison |
/api/v1/discover | POST | Auto-discover test scenarios |
/api/v1/webhooks/sentry | POST | Sentry error webhook |
/api/v1/webhooks/datadog | POST | Datadog event webhook |
/api/v1/webhooks/github-actions | POST | GitHub Actions webhook |
/api/v1/quality/generate-test | POST | Generate test from error |
/api/v1/quality/calculate-risk | POST | Calculate risk scores |
/api/v1/quality/similar-errors | GET | Semantic error search |
/api/v1/quality/backfill-index | POST | Index historical errors |
/api/semantic-search | POST | Find similar error patterns |
7.2 Edge Worker Endpoints (Cloudflare)
| Endpoint | Method | Description |
/health | GET | Worker health check |
/act | POST | Execute browser action |
/extract | POST | Extract data from page |
/observe | POST | Discover page elements |
/agent | POST | Multi-step autonomous workflow |
/test | POST | Cross-browser test execution |
8. Agent System
8.1 Agent Hierarchy
BaseAgent (Abstract)
│
├── CodeAnalyzerAgent # Static code analysis
├── TestPlannerAgent # Test prioritization
│
├── ExecutionAgents
│ ├── UITesterAgent # Browser testing
│ ├── APITesterAgent # REST API testing
│ └── DBTesterAgent # Database validation
│
├── IntelligenceAgents
│ ├── SelfHealerAgent # Auto-fix tests
│ ├── RootCauseAnalyzer # Failure analysis
│ ├── QualityAuditorAgent # Quality metrics
│ └── FlakyDetectorAgent # Flakiness detection
│
├── GenerationAgents
│ ├── NLPTestCreator # Natural language → test
│ ├── SessionToTestAgent # Recording → test
│ └── AutoDiscoveryAgent # Crawl → tests
│
└── SpecializedAgents
├── VisualAIAgent # Visual regression
├── AccessibilityAgent # WCAG compliance
├── SecurityScanner # Vulnerability scan
└── PerformanceAnalyzer # Performance metrics
8.2 LangGraph State Machine
# State Definition
class TestingState(TypedDict):
messages: Annotated[Sequence[BaseMessage], add_messages]
codebase_path: str
app_url: str
testable_surfaces: list[dict]
test_plan: list[dict]
test_priorities: dict[str, int]
current_test_index: int
test_results: list[dict]
failures: list[dict]
healing_queue: list[dict]
healing_attempts: int
healed_tests: list[dict]
total_input_tokens: int
total_output_tokens: int
total_cost: float
iteration: int
security_summary: dict
next_agent: str
should_continue: bool
# Graph Flow
analyze → plan → execute → (heal if failed) → report
9. External Integrations
| Platform | Webhook Endpoint | Events Captured |
| Sentry | /api/v1/webhooks/sentry | Errors, exceptions, issues |
| Datadog | /api/v1/webhooks/datadog | Alerts, errors, metrics |
| FullStory | /api/v1/webhooks/fullstory | Rage clicks, dead clicks |
| LogRocket | /api/v1/webhooks/logrocket | Frontend errors |
| NewRelic | /api/v1/webhooks/newrelic | APM alerts |
| Bugsnag | /api/v1/webhooks/bugsnag | Error tracking |
| Rollbar | /api/v1/webhooks/rollbar | Error tracking |
9.2 CI/CD Integration
| Platform | Integration Type |
| GitHub Actions | Webhook + PR comments + check runs |
| GitLab CI | Webhook support |
| CircleCI | Webhook support |
9.3 AI Providers
| Provider | Models | Use Case |
| Anthropic | Claude Opus/Sonnet/Haiku 4.5 | Primary AI |
| OpenAI | GPT-4o, GPT-4o-mini | Fallback |
| Groq | Llama 3.1 8B/70B | Fast inference |
| Together | Llama, Mixtral | Cost optimization |
| Google | Gemini 1.5 Flash/Pro | Alternative |
| Workers AI | Llama, BGE embeddings | Edge AI |
10. Security Architecture
10.1 Security Features
| Feature | Implementation |
| Secret Redaction | Automatic in sanitizer.py |
| RLS | Supabase row-level security |
| API Keys | Hashed storage, scoped permissions |
| Audit Trail | All actions logged with context |
| JWT Auth | Supabase Auth integration |
| Webhook Signatures | HMAC verification per platform |
| Budget Controls | Daily/monthly AI spend limits |
10.2 Data Classification
# Patterns automatically redacted:
- API keys (sk-*, ghp_*, etc.)
- Passwords in URLs
- Bearer tokens
- AWS credentials
- Database connection strings
- Private keys (PEM format)
11. Cost Management
11.1 AI Cost Tracking
Per Request:
ai_usage table → input_tokens, output_tokens, cost_usd
Daily Aggregation:
ai_usage_daily → total cost per org per day
Budget Enforcement:
organizations.ai_budget_daily_usd
organizations.ai_spend_today_usd
→ Requests rejected when budget exceeded
11.2 Estimated Monthly Costs (at scale)
| Component | Provider | Cost/Month |
| Brain Service | Railway | $150-300 |
| Database | Supabase Scale | $150 |
| Edge Workers | Cloudflare Pro | $25 |
| KV + R2 + Vectorize | Cloudflare | $50 |
| AI (Claude) | Anthropic | $1,500-3,000 |
| Total | | ~$2,000-3,500 |
12. Deployment Architecture
12.1 Production Setup
┌─────────────────────────────────────────────────────────────────────────────────┐
│ PRODUCTION DEPLOYMENT │
└─────────────────────────────────────────────────────────────────────────────────┘
Internet Traffic
│
▼
┌─────────────────┐
│ Cloudflare │ ◄─── Global CDN + DDoS protection
│ Edge Network │
└────────┬────────┘
│
┌────────┴────────────────────────────────────┐
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ argus-api │ │ argus-brain │
│ (CF Worker) │ │ (Railway) │
│ │ │ │
│ Browser auto │◄───────────────────│ LangGraph │
│ at edge │ /agent calls │ Orchestration │
│ │ │ │
│ Global: ~50ms │ │ Region: US-East │
└────────┬────────┘ └────────┬────────┘
│ │
│ │
└──────────────┬───────────────────────┘
│
▼
┌─────────────────┐
│ Supabase │
│ (PostgreSQL) │
│ │
│ AWS us-east-1 │
│ Pooler enabled │
└─────────────────┘
12.2 Environment Variables
# Brain Service (Railway)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_SERVICE_KEY=eyJ...
# Cloudflare
CLOUDFLARE_API_TOKEN=xxx
CLOUDFLARE_ACCOUNT_ID=xxx
CLOUDFLARE_KV_NAMESPACE_ID=xxx
CLOUDFLARE_VECTORIZE_INDEX=argus-patterns
# Optional
GITHUB_TOKEN=ghp_...
SLACK_WEBHOOK_URL=https://hooks.slack.com/...
TESTINGBOT_KEY=xxx
TESTINGBOT_SECRET=xxx
Summary
Argus is a production-ready autonomous testing platform featuring:
- 3-Tier Architecture: Edge (Cloudflare) → Brain (Railway) → Database (Supabase)
- 20+ AI Agents: Specialized for different testing and analysis tasks
- Multi-Model AI: 60-80% cost savings through intelligent routing
- Self-Healing: Automatic test maintenance and selector repair
- Quality Intelligence: Production error → test coverage gap analysis
- Enterprise Features: Multi-tenancy, RBAC, audit trails, budget controls
Current Status: Production-ready with semantic search, auto-indexing, and multi-model support deployed.