Hybrid Architecture (Part 3): AI Failure — Degradation and Rollback

AI Fails

AI agents aren't infallible. In production, we see four main failure modes:

Timeout: Agent spends 15+ minutes on a complex page, can't complete
Loop: Agent enters a retry loop — "click → check → wrong → re-click → still wrong"
Hallucination: Agent reports success when nothing happened (Nanobrowser Validator pattern)
Cost explosion: Single task consumes 3x expected token budget

AI failure isn't "if" — it's "when." Degradation strategies must be designed upfront, not improvised during incidents.

Degradation Detection

class AIDegradationDetector:
    """Flags an agent run that has exceeded any of its budgets."""

    def __init__(self):
        self.max_steps = 30           # agent actions per task
        self.max_time_seconds = 600   # wall-clock budget (10 minutes)
        self.max_tokens = 50000       # token budget per task
        self.max_retries = 5          # consecutive retries before it counts as a loop

    def check_health(self, state):
        """Return the names of all tripped limits; an empty list means healthy."""
        alerts = []
        if state["step_count"] > self.max_steps:
            alerts.append("step_limit_exceeded")
        if state["elapsed_seconds"] > self.max_time_seconds:
            alerts.append("time_limit_exceeded")
        if state["token_consumed"] > self.max_tokens:
            alerts.append("token_budget_exceeded")
        if state["consecutive_retries"] > self.max_retries:
            alerts.append("retry_loop_detected")
        return alerts
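
check_health reads four keys from a state dict, but how that dict gets built is not shown. One way to produce it is sketched below. RunState and its method names are hypothetical, and the rule that a successful step resets the retry streak is an assumption rather than something taken from the production system.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunState:
    """Hypothetical per-task tracker that yields the dict check_health() reads."""
    started_at: float = field(default_factory=time.monotonic)
    step_count: int = 0
    token_consumed: int = 0
    consecutive_retries: int = 0

    def record_step(self, tokens: int, succeeded: bool) -> None:
        """Account for one agent step and the tokens it used."""
        self.step_count += 1
        self.token_consumed += tokens
        # Assumption: a success ends the retry streak, a failure extends it.
        self.consecutive_retries = 0 if succeeded else self.consecutive_retries + 1

    def snapshot(self) -> dict:
        """Return a dict with the same keys the detector looks up."""
        return {
            "step_count": self.step_count,
            "elapsed_seconds": time.monotonic() - self.started_at,
            "token_consumed": self.token_consumed,
            "consecutive_retries": self.consecutive_retries,
        }
```

Calling detector.check_health(run_state.snapshot()) after every step keeps detection close to the failure instead of waiting for the task to end.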

Degradation Levels

Level 1: Retry (with variation)
  Different prompt, add visual input, simplify task description
 
Level 2: Simplify task
  Split current step into smaller sub-tasks, execute individually
 
Level 3: Switch model
  Fast/cheap → stronger model (Gemini Flash → Claude Sonnet)
 
Level 4: Escalate to human
  Mark as "needs human handling," enter wait queue
 
Level 5: Skip / default value
  Non-critical paths: skip with reasonable default
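
The five levels above can be walked as a simple ladder. The sketch below is one possible encoding, not the production logic: the Level enum and next_level function are hypothetical names, and the branch after Level 3 (critical tasks go to a human, non-critical tasks skip with a default) is an assumption about how Levels 4 and 5 relate.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    RETRY = 1
    SIMPLIFY = 2
    SWITCH_MODEL = 3
    ESCALATE_TO_HUMAN = 4
    SKIP_WITH_DEFAULT = 5


def next_level(current: Optional[Level], critical: bool) -> Level:
    """Pick the degradation level to try after a failure at `current`."""
    if current is None:
        # First failure: start at the cheapest remedy.
        return Level.RETRY
    if current < Level.SWITCH_MODEL:
        # Levels 1 and 2 simply move one rung up.
        return Level(current + 1)
    # Assumption: once automated remedies are exhausted, critical tasks
    # wait for a human and non-critical tasks fall back to a default.
    return Level.ESCALATE_TO_HUMAN if critical else Level.SKIP_WITH_DEFAULT
```

Levels 4 and 5 are terminal in this sketch: calling next_level again from either returns the same outcome for that task type.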

Operation Reversibility

The hardest part of AI degradation isn't "operation failed" — it's "operation executed halfway and is irreversible."

Example: agent submitted a payment form, then discovered the address was wrong.

Irreversible operations need pre-execution confirmation:

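class PreconditionError(Exception):
    """The pre-execution confirmation was refused."""


class VerificationError(Exception):
    """The operation ran but its result failed verification."""


class IrreversibleOperationGuard:
    # capture_system_state, attempt_rollback, and verify_operation are
    # site-specific hooks that a concrete guard must implement.

    async def execute(self, operation, confirm_fn):
        # Confirm before anything irreversible happens.
        if not await confirm_fn():
            raise PreconditionError("Pre-execution check failed")

        before_state = await self.capture_system_state()
        try:
            result = await operation.execute()
        except Exception:
            # Best effort only: some operations cannot be undone.
            await self.attempt_rollback(before_state)
            raise

        if not await self.verify_operation(result):
            await self.attempt_rollback(before_state)
            raise VerificationError("Post-execution verification failed")
        return result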

Operation Type	Rollback Possible	Protection
Form submit	Depends on site	Screenshot before, simulate first
Delete	Hard	Require human confirmation
Payment	No	External confirmation required
Data export	Yes (can delete)	Audit log
Config change	Yes (with backup)	Backup original value
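
The table above can also live in code so the guard can look up how to treat an operation before running it. This is a minimal sketch: Policy, POLICIES, policy_for, and the operation type keys are hypothetical names, and treating unknown operation types like payments (the strictest row) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    rollback: str             # "Rollback Possible" column
    protection: str           # "Protection" column
    needs_confirmation: bool  # True when someone must approve before execution


POLICIES = {
    "form_submit": Policy("depends on site", "screenshot before, simulate first", False),
    "delete": Policy("hard", "require human confirmation", True),
    "payment": Policy("no", "external confirmation required", True),
    "data_export": Policy("yes (can delete)", "audit log", False),
    "config_change": Policy("yes (with backup)", "backup original value", False),
}


def policy_for(op_type: str) -> Policy:
    """Look up the policy for an operation type."""
    # Assumption: anything unrecognized is treated as strictly as a payment.
    return POLICIES.get(op_type, POLICIES["payment"])
```

Defaulting to the strictest policy means a new operation type is blocked until someone classifies it, which is safer than letting it run unguarded.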

Gradual Rollout

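import random


class GradualRollout:
    def __init__(self, script_engine, ai_agent, traffic_split=0.1):
        self.script_engine = script_engine
        self.ai_agent = ai_agent
        self.traffic_split = traffic_split  # fraction of tasks that also run the AI path

    async def route(self, task):
        # The caller always gets the script result; the AI runs in shadow mode.
        script_result = await self.script_engine.run(task)
        if random.random() < self.traffic_split:
            # Both paths execute the task, so its side effects must be
            # repeatable or sandboxed.
            ai_result = await self.ai_agent.run(task)
            # record_comparison is assumed to persist the pair for later analysis.
            await self.record_comparison(task, script_result, ai_result)
        return script_result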

Auto-Rollback Conditions

Success rate drops > 5% vs baseline
Average duration > 2x baseline
Cost per task exceeds the script-only alternative
New error types appearing
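
These four conditions can be checked mechanically against a baseline. The sketch below is illustrative: Metrics and rollback_reasons are hypothetical names, the 5% threshold is read as percentage points rather than a relative drop, and alternative_cost is assumed to be the per-task cost of the path being compared against.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Metrics:
    success_rate: float    # fraction between 0 and 1
    avg_duration_s: float  # average task duration in seconds
    cost_per_task: float   # same currency unit as alternative_cost
    error_types: Set[str]  # distinct error labels seen in the window


def rollback_reasons(baseline: Metrics, current: Metrics,
                     alternative_cost: float) -> List[str]:
    """Return every rollback condition that currently holds."""
    reasons = []
    # Assumption: "drops > 5%" means more than 5 percentage points.
    if baseline.success_rate - current.success_rate > 0.05:
        reasons.append("success_rate_drop")
    if current.avg_duration_s > 2 * baseline.avg_duration_s:
        reasons.append("duration_regression")
    if current.cost_per_task > alternative_cost:
        reasons.append("cost_exceeds_alternative")
    # Set difference: error labels never seen in the baseline window.
    if current.error_types - baseline.error_types:
        reasons.append("new_error_types")
    return reasons
```

Returning the list of reasons, rather than a single boolean, lets the rollback log say which condition fired.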

Summary

Three principles for AI browser automation degradation:

Detect first — objective failure metrics (steps, time, tokens, retries)
Leveled degradation — retry → simplify → switch model → escalate
Operation reversibility — confirm before irreversible, rollback after reversible

The core idea of the hybrid architecture series (D1-D3): don't put all your eggs in one basket. Scripts and AI each have strengths and weaknesses. Combined through a bridge layer and degradation mechanisms, their overall reliability far exceeds that of either approach alone.
