Playwright + AI Agent Hybrid Architecture (Part 3): When AI Fails — Degradation, Rollout, and Rollback

AI agents aren't infallible. When AI can't complete a task, the system needs degradation signals, gradual rollout, operational reversibility, and automatic rollback.

16Yun Engineering Team · Apr 18, 2026 · 2 min read

How AI Agents Fail

AI agents aren't infallible. In our production experience, the main failure modes are:

  1. Timeout: the agent spends 15+ minutes on a complex page and still can't complete it
  2. Loop: the agent enters a retry loop ("click → check → wrong → re-click → still wrong")
  3. Hallucination: the agent reports success when nothing actually happened (the problem the Nanobrowser Validator pattern addresses)
  4. Cost explosion: a single task consumes 3x its expected token budget

AI failure isn't "if" — it's "when." Degradation strategies must be designed upfront, not improvised during incidents.

Degradation Detection

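class AIDegradationDetector:
    """Flags an agent run as unhealthy using objective, cheap-to-collect metrics."""

    def __init__(self, max_steps=30, max_time_seconds=600,
                 max_tokens=50000, max_retries=5):
        self.max_steps = max_steps                # agent actions per task
        self.max_time_seconds = max_time_seconds  # wall-clock budget per task
        self.max_tokens = max_tokens              # LLM token budget per task
        self.max_retries = max_retries            # consecutive failed attempts

    def check_health(self, state):
        """Return alert names for every exceeded limit; [] means healthy."""
        alerts = []
        if state["step_count"] > self.max_steps:
            alerts.append("step_limit_exceeded")
        if state["elapsed_seconds"] > self.max_time_seconds:
            alerts.append("time_limit_exceeded")
        if state["token_consumed"] > self.max_tokens:
            alerts.append("token_budget_exceeded")
        if state["consecutive_retries"] > self.max_retries:
            alerts.append("retry_loop_detected")
        return alerts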
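
The `consecutive_retries` check only catches loops the agent itself reports as retries. The "click → check → wrong → re-click" loop described earlier can also be spotted from the action history alone. A minimal sketch, assuming each agent step can be reduced to an (action, target) pair; `ActionLoopDetector`, `window`, and `max_repeats` are hypothetical names with illustrative, untuned values:

```python
from collections import Counter, deque


class ActionLoopDetector:
    """Hypothetical helper: flags a loop when the same (action, target) pair
    keeps recurring inside a sliding window of recent agent actions.

    window and max_repeats are illustrative defaults, not tuned values.
    """

    def __init__(self, window=6, max_repeats=3):
        self.recent = deque(maxlen=window)  # only the last `window` actions are kept
        self.max_repeats = max_repeats

    def record(self, action, target):
        """Record one action; return True if it now looks like a loop."""
        self.recent.append((action, target))
        counts = Counter(self.recent)
        return counts[(action, target)] >= self.max_repeats


detector = ActionLoopDetector()
steps = [("click", "#submit"), ("check", "form"), ("click", "#submit"),
         ("check", "form"), ("click", "#submit")]
flags = [detector.record(a, t) for a, t in steps]
print(flags)  # [False, False, False, False, True]
```

Counting within a bounded window, rather than over the whole run, keeps long but legitimate sessions (which naturally repeat common actions) from being flagged.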

Degradation Levels

Level 1: Retry (with variation)
  Rephrase the prompt, add visual input, or simplify the task description
 
Level 2: Simplify task
  Split the current step into smaller sub-tasks and execute them individually
 
Level 3: Switch model
  Fast/cheap → stronger model (Gemini Flash → Claude Sonnet)
 
Level 4: Escalate to human
  Mark the task as "needs human handling" and put it in a wait queue
 
Level 5: Skip / default value
  Non-critical paths only: skip the step and fall back to a reasonable default
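
The levels form an ordered ladder: each one is tried only after the previous one fails. A minimal sketch of that control flow, assuming each level is wrapped as an async handler that raises on failure; `run_with_degradation` and the four stub handlers are hypothetical stand-ins for real retry, simplification, model-switching, and escalation logic:

```python
import asyncio


async def run_with_degradation(task, levels):
    """Try each degradation level in order until one succeeds.

    `levels` is a list of (name, handler) pairs; each handler is an async
    callable that returns a result or raises on failure. Returns the name
    of the level that succeeded together with its result.
    """
    errors = []
    for name, handler in levels:
        try:
            return name, await handler(task)
        except Exception as exc:  # any failure moves us one level down
            errors.append((name, exc))
    raise RuntimeError(f"All degradation levels failed: {errors}")


# Hypothetical stub handlers; real ones would call the agent, a stronger model, etc.
async def retry_with_variation(task):
    raise TimeoutError("still timing out")

async def simplify_task(task):
    raise ValueError("sub-task 2 failed")

async def switch_model(task):
    return f"done: {task}"

async def escalate_to_human(task):
    return f"queued for human: {task}"


levels = [
    ("retry", retry_with_variation),
    ("simplify", simplify_task),
    ("switch_model", switch_model),
    ("escalate", escalate_to_human),
]
print(asyncio.run(run_with_degradation("fill checkout form", levels)))
# ('switch_model', 'done: fill checkout form')
```

Level 5 (skip with a default) fits the same shape: for tasks marked non-critical, append one more handler that simply returns the default value.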

Operation Reversibility

The hardest part of AI degradation isn't "operation failed" — it's "operation executed halfway and is irreversible."

For example, an agent submits a payment form and only then discovers that the address was wrong.

Irreversible operations need pre-execution confirmation:

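class PreconditionError(Exception): pass
class VerificationError(Exception): pass

class IrreversibleOperationGuard:
    # capture_system_state, attempt_rollback and verify_operation are
    # site-specific hooks that a subclass must implement.
    async def execute(self, operation, confirm_fn):
        # 1. Confirm before touching anything that cannot be undone
        if not await confirm_fn():
            raise PreconditionError("Pre-execution check failed")

        # 2. Snapshot state so a best-effort rollback has something to restore
        before_state = await self.capture_system_state()
        try:
            result = await operation.execute()
        except Exception:
            await self.attempt_rollback(before_state)
            raise

        # 3. Verify the effect really happened (catches hallucinated success)
        if not await self.verify_operation(result):
            await self.attempt_rollback(before_state)
            raise VerificationError("Post-execution verification failed")
        return result

    async def capture_system_state(self): raise NotImplementedError
    async def attempt_rollback(self, before_state): raise NotImplementedError
    async def verify_operation(self, result): raise NotImplementedError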
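
The guard leaves `confirm_fn` open. For operations such as payment that need external confirmation, one option is to wait for an explicit human decision and treat silence as a refusal. A minimal sketch, assuming approvals arrive on an `asyncio.Queue` fed by some review tool; `make_human_confirm` and the timeout value are hypothetical:

```python
import asyncio


def make_human_confirm(approvals: asyncio.Queue, timeout_s: float = 300.0):
    """Hypothetical factory for a confirm_fn that waits for a human decision.

    Anything other than an explicit True (including a timeout) counts as
    "do not proceed", so silence never authorizes an irreversible action.
    """
    async def confirm_fn():
        try:
            decision = await asyncio.wait_for(approvals.get(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return False
        return decision is True
    return confirm_fn


async def demo():
    approvals = asyncio.Queue()
    confirm = make_human_confirm(approvals, timeout_s=0.1)
    no_answer = await confirm()   # nobody responds, so it times out
    await approvals.put(True)     # a reviewer approves
    approved = await confirm()
    return no_answer, approved


print(asyncio.run(demo()))  # (False, True)
```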

| Operation Type | Rollback Possible | Protection |
|---|---|---|
| Form submit | Depends on site | Screenshot before; simulate first |
| Delete | Hard | Require human confirmation |
| Payment | No | External confirmation required |
| Data export | Yes (can delete) | Audit log |
| Config change | Yes (with backup) | Back up original value |
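
Of the operation types above, "Config change" is the simplest to make reversible: back up the original value, apply the change, and restore it on failure. A minimal sketch, assuming an in-memory dict stands in for the real configuration store; `reversible_config_change` is a hypothetical helper, not part of any library:

```python
from contextlib import contextmanager


@contextmanager
def reversible_config_change(config, key, new_value):
    """Hypothetical helper: apply config[key] = new_value, restoring the
    original on any error. A real deployment would back up to durable
    storage rather than a local variable.
    """
    missing = object()                   # sentinel: key did not exist before
    original = config.get(key, missing)  # back up the original value
    config[key] = new_value
    try:
        yield config
    except Exception:
        # Roll back to exactly the previous state, then re-raise
        if original is missing:
            del config[key]
        else:
            config[key] = original
        raise


settings = {"retry_limit": 3}
try:
    with reversible_config_change(settings, "retry_limit", 10):
        raise RuntimeError("verification failed")  # simulate a failed check
except RuntimeError:
    pass
print(settings)  # {'retry_limit': 3}
```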

Gradual Rollout

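import random

class GradualRollout:
    # Shadow mode: a sampled fraction of tasks also runs on the AI agent for
    # comparison, but the script result is always what gets returned.
    def __init__(self, script_engine, ai_agent, traffic_split=0.1):
        self.script_engine = script_engine
        self.ai_agent = ai_agent
        self.traffic_split = traffic_split  # fraction of tasks shadowed by AI

    async def route(self, task):
        script_result = await self.script_engine.run(task)
        if random.random() < self.traffic_split:
            try:
                ai_result = await self.ai_agent.run(task)
            except Exception as exc:  # an AI failure must not fail the task
                ai_result = exc
            await self.record_comparison(task, script_result, ai_result)
        return script_result

    async def record_comparison(self, task, script_result, ai_result):
        pass  # hook: persist the pair to your metrics store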
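
As written, `traffic_split` is a fixed fraction. One way to make the rollout actually gradual is to raise it in stages while the comparisons look good and drop it back to zero the moment an auto-rollback condition fires. A minimal sketch; `RolloutRamp` and its stage values are hypothetical, not tuned recommendations:

```python
class RolloutRamp:
    """Hypothetical controller: steps the AI traffic fraction up through fixed
    stages while metrics stay healthy, and resets it to 0 on rollback.
    """

    STAGES = [0.0, 0.01, 0.05, 0.25, 0.5, 1.0]  # illustrative values only

    def __init__(self):
        self.index = 0

    @property
    def traffic_split(self):
        return self.STAGES[self.index]

    def advance(self):
        """Move to the next stage (stays at the last stage once reached)."""
        self.index = min(self.index + 1, len(self.STAGES) - 1)
        return self.traffic_split

    def rollback(self):
        """Send all traffic back to the script path immediately."""
        self.index = 0
        return self.traffic_split


ramp = RolloutRamp()
print([ramp.advance() for _ in range(3)])  # [0.01, 0.05, 0.25]
print(ramp.rollback())                     # 0.0
```

Ramping up one stage at a time but rolling back all the way in one step is deliberate: recovery should be faster than exposure.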

Auto-Rollback Conditions

  1. Success rate drops more than 5% below baseline
  2. Average duration exceeds 2x baseline
  3. Cost per task exceeds that of the alternative path
  4. New error types appear that were not seen in the baseline

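
These conditions are mechanical enough to evaluate in code. A minimal sketch, assuming per-path metrics are already aggregated over a comparison window; `RolloutMetrics` and `rollback_reasons` are hypothetical names, and "drops more than 5%" is read here as 5 percentage points rather than a relative drop:

```python
from dataclasses import dataclass, field


@dataclass
class RolloutMetrics:
    success_rate: float    # 0.0 to 1.0
    avg_duration_s: float  # mean seconds per task
    cost_per_task: float   # same currency unit for every path
    error_types: set = field(default_factory=set)


def rollback_reasons(ai, baseline, alternative_cost):
    """Return the auto-rollback conditions that fired ([] means keep going).

    Assumption: "drops > 5%" is interpreted as 5 percentage points.
    """
    reasons = []
    if baseline.success_rate - ai.success_rate > 0.05:
        reasons.append("success_rate_drop")
    if ai.avg_duration_s > 2 * baseline.avg_duration_s:
        reasons.append("duration_over_2x")
    if ai.cost_per_task > alternative_cost:
        reasons.append("cost_exceeds_alternative")
    if ai.error_types - baseline.error_types:
        reasons.append("new_error_types")
    return reasons


baseline = RolloutMetrics(0.97, 12.0, 0.02, {"TimeoutError"})
ai = RolloutMetrics(0.90, 30.0, 0.015, {"TimeoutError", "SelectorMismatch"})
print(rollback_reasons(ai, baseline, alternative_cost=0.02))
# ['success_rate_drop', 'duration_over_2x', 'new_error_types']
```

Returning the list of triggered reasons, rather than a single boolean, makes the rollback alert self-explanatory in incident logs.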
Summary

Three principles for AI browser automation degradation:

  1. Detect first — objective failure metrics (steps, time, tokens, retries)
  2. Leveled degradation — retry → simplify → switch model → escalate
  3. Operation reversibility: confirm before irreversible operations; roll back reversible ones when they fail

The core idea of the hybrid architecture series (D1-D3): don't put all your eggs in one basket. Scripts and AI each have strengths and weaknesses; combined through the bridge layer and degradation mechanisms, they deliver far higher overall reliability than either approach alone.
