Playwright + AI Agent Hybrid Architecture (Part 3): When AI Fails — Degradation, Rollout, and Rollback
AI agents aren't infallible. When AI can't complete a task, the system needs degradation signals, gradual rollout, operational reversibility, and automatic rollback.
When AI Fails
In our production experience, the main failure modes are:
- Timeout: the agent spends 15+ minutes on a complex page and cannot complete the task
- Loop: the agent enters a retry loop ("click → check → wrong → re-click → still wrong")
- Hallucination: the agent reports success when nothing happened (Nanobrowser Validator pattern)
- Cost explosion: a single task consumes 3x the expected token budget
AI failure isn't "if" — it's "when." Degradation strategies must be designed upfront, not improvised during incidents.
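Of the four failure modes, a false success report is the one that step, time, token, and retry counters cannot reveal, because the run looks healthy. A minimal sketch of cross-checking the agent's claim against an independent observation follows. `AgentReport`, `confirmed_success`, and the `probe` callable are hypothetical names introduced for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentReport:
    """What the agent claims happened (hypothetical shape)."""
    success: bool
    summary: str


def confirmed_success(report: AgentReport, probe: Callable[[], bool]) -> bool:
    """Accept a success claim only if an independent probe agrees.

    `probe` is any check that does not rely on the agent's own account,
    for example re-reading the page for a confirmation message.
    """
    if not report.success:
        return False
    return probe()


# Example: the agent says the form was submitted, but the page text
# still shows a validation error instead of a confirmation.
page_text = "Please fill in all required fields."
report = AgentReport(success=True, summary="Form submitted")
ok = confirmed_success(report, lambda: "Order confirmed" in page_text)
```

Here `ok` is `False`, so the run would be treated as failed even though the agent reported success.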
Degradation Detection
```python
class AIDegradationDetector:
    """Flags a run that has exceeded any of its objective limits."""

    def __init__(self):
        self.max_steps = 30
        self.max_time_seconds = 600  # 10 minutes
        self.max_tokens = 50000
        self.max_retries = 5

    def check_health(self, state):
        """Return the names of all exceeded limits (empty list if healthy)."""
        alerts = []
        if state["step_count"] > self.max_steps:
            alerts.append("step_limit_exceeded")
        if state["elapsed_seconds"] > self.max_time_seconds:
            alerts.append("time_limit_exceeded")
        if state["token_consumed"] > self.max_tokens:
            alerts.append("token_budget_exceeded")
        if state["consecutive_retries"] > self.max_retries:
            alerts.append("retry_loop_detected")
        return alerts
```

Degradation Levels
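Once the detector raises an alert, the run moves down a ladder of fallbacks, with each level tried only after the previous one has failed. A minimal sketch of that ladder, assuming each level can be wrapped as a handler that returns a result or `None`, is shown here. `run_with_degradation` and the four stand-in handlers are hypothetical; real handlers would call the agent, a stronger model, or a human queue.

```python
from typing import Callable, Optional

# Each handler returns a result string, or None if that level failed.
Handler = Callable[[str], Optional[str]]


def run_with_degradation(task: str, ladder: list[tuple[str, Handler]]) -> tuple[str, str]:
    """Try each level in order and return (level_name, result) for the first success."""
    for name, handler in ladder:
        result = handler(task)
        if result is not None:
            return name, result
    raise RuntimeError(f"All degradation levels failed for task: {task}")


# Hypothetical stand-ins for the real handlers.
def retry_with_variation(task: str) -> Optional[str]:
    return None  # Level 1 fails in this example


def simplify_task(task: str) -> Optional[str]:
    return None  # Level 2 fails in this example


def switch_model(task: str) -> Optional[str]:
    return f"done: {task}"  # Level 3 succeeds


def escalate_to_human(task: str) -> Optional[str]:
    return "queued for human"


ladder = [
    ("retry", retry_with_variation),
    ("simplify", simplify_task),
    ("switch_model", switch_model),
    ("escalate", escalate_to_human),
]
level, result = run_with_degradation("fill shipping form", ladder)
```

In this run the first two levels fail and `level` ends up as `"switch_model"`. A skip-with-default handler could be appended as a last entry for non-critical tasks only.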
- Level 1: Retry (with variation). Use a different prompt, add visual input, or simplify the task description.
- Level 2: Simplify the task. Split the current step into smaller sub-tasks and execute them individually.
- Level 3: Switch model. Move from a fast, cheap model to a stronger one (Gemini Flash → Claude Sonnet).
- Level 4: Escalate to a human. Mark the task as "needs human handling" and place it in a wait queue.
- Level 5: Skip or use a default value. On non-critical paths, skip the step with a reasonable default.

Operation Reversibility
The hardest part of AI degradation isn't "operation failed" — it's "operation executed halfway and is irreversible."
Example: agent submitted a payment form, then discovered the address was wrong.
Irreversible operations need pre-execution confirmation:
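```python
# Exception types are defined here so the snippet stands on its own.
class PreconditionError(Exception):
    """Raised when the pre-execution confirmation fails."""


class VerificationError(Exception):
    """Raised when the post-execution verification fails."""


class IrreversibleOperationGuard:
    # capture_system_state, attempt_rollback, and verify_operation are
    # site-specific and left to the implementer.

    async def execute(self, operation, confirm_fn):
        # Confirm before touching anything that cannot be undone.
        if not await confirm_fn():
            raise PreconditionError("Pre-execution check failed")
        # Snapshot the state so a rollback has something to restore.
        before_state = await self.capture_system_state()
        try:
            result = await operation.execute()
        except Exception:
            await self.attempt_rollback(before_state)
            raise
        if not await self.verify_operation(result):
            await self.attempt_rollback(before_state)
            raise VerificationError("Post-execution verification failed")
        return result
```

| Operation Type | Rollback Possible | Protection |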
|---|---|---|
| Form submit | Depends on site | Screenshot before, simulate first |
| Delete | Hard | Require human confirmation |
| Payment | No | External confirmation required |
| Data export | Yes (can delete) | Audit log |
| Config change | Yes (with backup) | Backup original value |
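The table can also live in code, so the guard looks up a policy instead of hard-coding rules per call site. The following is a sketch under assumptions: the `Policy` fields, the operation-type keys, and the choice to treat unknown operation types as strictly as payments are all illustrative and not taken from an existing implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Rollback(Enum):
    YES = "yes"
    NO = "no"
    DEPENDS = "depends on site"
    HARD = "hard"


@dataclass(frozen=True)
class Policy:
    rollback: Rollback
    protection: str
    needs_confirmation: bool  # must someone outside the agent approve first?


# One entry per row of the table.
POLICIES = {
    "form_submit": Policy(Rollback.DEPENDS, "screenshot before, simulate first", False),
    "delete": Policy(Rollback.HARD, "require human confirmation", True),
    "payment": Policy(Rollback.NO, "external confirmation required", True),
    "data_export": Policy(Rollback.YES, "audit log", False),
    "config_change": Policy(Rollback.YES, "back up original value", False),
}


def policy_for(operation_type: str) -> Policy:
    """Look up the policy; unknown types get the strictest one (an assumption)."""
    return POLICIES.get(operation_type, POLICIES["payment"])


payment_policy = policy_for("payment")
unknown_policy = policy_for("bulk_archive")
```

Defaulting unknown operations to the payment policy is a fail-closed choice: a new operation type stays blocked behind confirmation until someone classifies it.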
Gradual Rollout
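```python
import random


class GradualRollout:
    def __init__(self, script_engine, ai_agent, traffic_split=0.1):
        self.script_engine = script_engine
        self.ai_agent = ai_agent
        # Fraction of tasks that the AI agent also runs, in shadow.
        self.traffic_split = traffic_split

    async def route(self, task):
        # The script result is always the one returned. The AI agent only
        # runs on a sample of tasks so the two results can be compared.
        # record_comparison is left to the implementer.
        script_result = await self.script_engine.run(task)
        if random.random() < self.traffic_split:
            ai_result = await self.ai_agent.run(task)
            await self.record_comparison(task, script_result, ai_result)
        return script_result
```

Auto-Rollback Conditions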
- Success rate drops by more than 5% against the baseline
- Average duration exceeds 2x the baseline
- Cost per task exceeds that of the alternative
- New error types appear
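These four conditions can be evaluated mechanically from the comparison data the rollout records. A minimal sketch follows, assuming metrics are aggregated into a hypothetical `Metrics` record and that the 5% threshold means five percentage points of success rate, which the list above does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    success_rate: float      # fraction between 0 and 1
    avg_duration_s: float    # average task duration in seconds
    cost_per_task: float     # any consistent currency unit
    error_types: frozenset   # names of error types observed


def rollback_reasons(baseline: Metrics, current: Metrics, alternative_cost: float) -> list[str]:
    """Return the triggered rollback conditions (empty list means keep going)."""
    reasons = []
    # Assumption: "5%" is read as 5 percentage points, not a relative drop.
    if baseline.success_rate - current.success_rate > 0.05:
        reasons.append("success_rate_drop")
    if current.avg_duration_s > 2 * baseline.avg_duration_s:
        reasons.append("duration_regression")
    if current.cost_per_task > alternative_cost:
        reasons.append("cost_exceeds_alternative")
    # Set difference: error types seen now that the baseline never produced.
    if current.error_types - baseline.error_types:
        reasons.append("new_error_types")
    return reasons


baseline = Metrics(0.95, 40.0, 0.02, frozenset({"timeout"}))
current = Metrics(0.88, 45.0, 0.03, frozenset({"timeout", "captcha"}))
reasons = rollback_reasons(baseline, current, alternative_cost=0.05)
```

With these sample numbers, `reasons` contains `success_rate_drop` and `new_error_types`; duration and cost stay within limits. Any non-empty list would set the rollout's traffic split back to zero.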
Summary
Three principles for AI browser automation degradation:
- Detect first: use objective failure metrics (steps, time, tokens, retries)
- Degrade in levels: retry → simplify → switch model → escalate → skip
- Keep operations reversible: confirm before irreversible operations, roll back reversible ones on failure
The core idea of the hybrid architecture series (D1-D3): don't put all your eggs in one basket. Scripts and AI each have strengths and weaknesses. Combined through a bridge layer and degradation mechanisms, they deliver overall reliability that far exceeds either approach alone.