Browser Automation at 1,000 Sessions (Part 3): Why K8s HPA Fails

Why HPA Fails, Reason 1: CPU Startup Spikes

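Standard Kubernetes HPA scales based on CPU or memory. Simple logic: it compares average utilization across Pods with a target value and resizes proportionally. Average CPU well above the target → scale up. Well below → scale down.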
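The scaling rule can be sketched as a ratio against a single target, following the formula given in the Kubernetes HPA documentation (desired = ceil(current × metric / target), with a default tolerance of 0.1). The function name and the 40% target in the usage note are illustrative assumptions, not values from any real cluster.

```python
import math


def desired_replicas(current_replicas: int, current_util: float,
                     target_util: float, tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA ratio rule (simplified sketch)."""
    ratio = current_util / target_util
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

With an assumed 40% target, `desired_replicas(10, 45, 40)` asks for 12 Pods, while `desired_replicas(10, 42, 40)` stays at 10 because the ratio is within tolerance. The rule has no notion of why utilization is high, which is where browser workloads run into trouble.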

This logic breaks for browser instances because startup behavior differs radically from steady state.

Chrome startup:

Load binary (~200-300MB)
Initialize V8 engine
Create GPU context (even in headless mode)
Establish CDP WebSocket listener
Load initial page

CPU hits 100% for 1-3 seconds, then drops to 5-15%. If HPA uses CPU:

Pod starts → CPU 100% → HPA scales up → another Pod starts
New Pod also starting → CPU 100% → HPA scales up again
First batch ready → CPU drops to 5% → HPA scales down
Pods mid-task get killed → failure rate spikes

This is a startup storm: HPA, misled by the startup CPU spike, scales up and then back down in rapid succession.

Timeline:
  T0: 10 Pods, avg CPU 15%
  T1: New tasks arrive, CPU rises to 45%
  T2: HPA scales to 12 Pods
  T3: New Pods starting, CPU 100% (startup spike)
  T4: HPA detects 65% CPU, scales to 15 Pods
  T5: All Pods ready, CPU drops to 12%
  T6: HPA scales down to 12 Pods
  T7: Killed Pods were mid-task → failure spike
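How a few starting Pods inflate the average can be sketched with simple arithmetic. This is a minimal illustration, assuming the startup samples are counted in the average and that every starting Pod sits at 100%. The function name is hypothetical, and the result is not meant to reproduce the illustrative 65% in the timeline.

```python
def average_cpu(steady_pods: int, steady_util: float,
                starting_pods: int, startup_util: float = 100.0) -> float:
    """Average CPU utilization across steady and still-starting Pods."""
    total = steady_pods * steady_util + starting_pods * startup_util
    return total / (steady_pods + starting_pods)


# 10 Pods doing real work at 45%, plus 2 Pods that are only booting Chrome.
avg = average_cpu(steady_pods=10, steady_util=45.0, starting_pods=2)
```

Here the two starting Pods lift the average from 45% to about 54%, even though the real workload has not changed. The scaler reads that as more demand.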

Why HPA Fails, Reason 2: Memory Release Lag

Memory-based HPA also fails. Chrome doesn't release memory back to the OS immediately on close():

V8 heap: doesn't shrink on page unload. Freed pages are marked as available inside the process but not returned to the OS, because re-allocating them later would cost more.
OS page cache: Chrome's shared libraries (libc, zlib) are cached by the OS. They're only marked "reclaimable," not freed, until another process actually requests memory.
Swap latency: On nodes with swap enabled, pages swapped to disk add further delay. The free command may not show increased available memory for minutes.

This creates minutes of HPA lag — Pods are idle but memory metrics remain high. HPA doesn't scale down. Resources are wasted.

Why HPA Fails, Reason 3: Browsers Are Stateful

HPA's core assumption: any Pod can be killed, traffic re-routes to others.

Browser instances carry state. Killing a browser Pod mid-task:

CDP sessions die immediately
In-flight page navigations never return
Cookies and localStorage are lost
Target sites requiring login need re-authentication

Unlike stateless web services — lose one request, retry on another — browser state is process-level. K8s graceful shutdown has limited effectiveness here: after SIGTERM, Chrome needs time to close all tabs. If terminationGracePeriodSeconds (default 30s) isn't enough, the Pod is force-killed.
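One way to make use of the grace period is for the worker process to stop picking up new tasks on SIGTERM and let the task in flight finish. The sketch below is an assumption about how such a worker could look, not something the article's setup is known to contain. `DrainingWorker`, `install`, and `run_task` are hypothetical names.

```python
import signal
import threading


class DrainingWorker:
    """Hypothetical one-task-at-a-time worker that drains on SIGTERM."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.completed = []
        self.stop_requested = threading.Event()

    def handle_sigterm(self, signum, frame):
        # Only set a flag; the task in flight is allowed to finish.
        self.stop_requested.set()

    def run(self, run_task):
        # Check the flag between tasks, never in the middle of one.
        while self.tasks and not self.stop_requested.is_set():
            task = self.tasks.pop(0)
            run_task(task)  # e.g. drive one browser session to completion
            self.completed.append(task)
        return self.completed


def install(worker: DrainingWorker) -> None:
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, worker.handle_sigterm)
```

This only helps if a single task reliably finishes within terminationGracePeriodSeconds. Anything longer is still force-killed.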

Working Scaling Approaches

Approach 1: Custom Metrics HPA

Don't use CPU or memory. Use business metrics:

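# Queue-depth-based HPA (recommended)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: browser-worker          # example name
spec:
  scaleTargetRef:               # the Deployment this HPA resizes
    apiVersion: apps/v1
    kind: Deployment
    name: browser-worker
  minReplicas: 5                # example bounds, tune for your cluster
  maxReplicas: 200
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_depth   # must be served by an external metrics adapter
      target:
        type: AverageValue
        averageValue: "5"       # aim for about 5 queued tasks per Pod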

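Or KEDA event-driven scaling:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: browser-worker            # example name
spec:
  scaleTargetRef:
    name: browser-worker          # Deployment to scale
  triggers:
  - type: rabbitmq
    metadata:
      queueName: browser-tasks
      mode: QueueLength           # scale on number of queued messages
      value: "10"                 # target messages per Pod
      hostFromEnv: RABBITMQ_HOST  # assumed env var holding the connection URL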

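This sidesteps the CPU and memory signal problems described above. Pod count is determined by pending tasks, not per-Pod resource metrics. It does not make Pods safe to kill mid-task, so scale-down still needs care.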
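The arithmetic behind queue-depth scaling with an AverageValue target can be sketched as follows: the total queue depth divided by the per-Pod target, rounded up and clamped to the replica bounds. The function name and the bounds in the usage note are assumptions for illustration.

```python
import math


def replicas_for_queue(queue_depth: int, tasks_per_pod: int,
                       min_replicas: int, max_replicas: int) -> int:
    """Replica count implied by a per-Pod queue-depth target."""
    wanted = math.ceil(queue_depth / tasks_per_pod)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 120 pending tasks with a target of 5 per Pod gives `replicas_for_queue(120, 5, 5, 200) == 24`. An empty queue falls back to the floor of 5, regardless of what CPU or memory report.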

Approach 2: Cooldown Configuration

Prevent startup storms with stabilization windows:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 10
      periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
    - type: Pods
      value: 2
      periodSeconds: 30

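Scaling becomes slow but stable: scale-up adds at most 2 Pods every 30 seconds, and scale-down removes at most 10% of Pods per minute, and only after recommendations have stayed low for 5 minutes. That damps startup storms.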
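The cost of the scale-up limit can be estimated with a rough model: with a Pods policy of 2 per 30-second period, reaching a target takes a fixed number of periods. This is a simplified sketch that ignores the stabilization window and metric delays. The function name is hypothetical.

```python
import math


def scale_up_periods(current: int, target: int, pods_per_period: int = 2) -> int:
    """Number of policy periods needed to grow from current to target replicas."""
    if target <= current:
        return 0
    return math.ceil((target - current) / pods_per_period)


periods = scale_up_periods(10, 30)   # growing from 10 to 30 Pods
approx_seconds = periods * 30        # rough ramp time under a 30 s period
```

Going from 10 to 30 Pods needs 10 periods, so a ramp on the order of five minutes. That is the trade-off: a sudden burst of tasks waits in the queue while the cluster catches up.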

Approach 3: Warm Pool Strategy

Maintain pre-warmed browser instances for latency-sensitive scenarios:

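apiVersion: apps/v1
kind: Deployment
metadata:
  name: browser-warm-pool
spec:
  replicas: 5   # fixed-size pool, not managed by the autoscaler
  # selector and Pod template omitted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: browser-worker
spec:
  # replicas omitted: scales based on workload
  # selector and Pod template omitted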

Warm pool Pods load Chrome and hold on a blank page. Tasks take a browser from the pool instead of cold-starting one.
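The dispatch logic on the task side can be sketched like this: hand out a pre-started browser if one is idle, otherwise fall back to a cold start. This is a minimal in-process illustration, assuming a caller-supplied `start_browser` factory. `WarmPool`, `acquire`, and `refill` are hypothetical names, and a real setup would track warm Pods rather than local objects.

```python
import queue


class WarmPool:
    """Hypothetical pool of pre-started browsers with a cold-start fallback."""

    def __init__(self, start_browser, size: int):
        self._start_browser = start_browser
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(start_browser())  # pay the startup cost up front

    def acquire(self):
        """Return (browser, warm); warm is False when a cold start was needed."""
        try:
            return self._idle.get_nowait(), True
        except queue.Empty:
            return self._start_browser(), False

    def refill(self) -> None:
        """Start one replacement browser to top the pool back up."""
        self._idle.put(self._start_browser())
```

Browsers are handed out and never returned, in keeping with the series' rule of one browser instance per task. `refill` replaces each one that was consumed.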

Approach 4: Graceful Shutdown Configuration

Increase terminationGracePeriodSeconds for Chrome:

apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: browser
    lifecycle:
      preStop:
        exec:
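          command:
          - /bin/sh
          - -c
          - |
            # /json/close needs a target ID, so list the targets and close each one.
            # Assumes curl is in the image and Chrome runs with --remote-debugging-port=9222.
            for id in $(curl -s http://localhost:9222/json/list | sed -n 's/.*"id": *"\([^"]*\)".*/\1/p'); do
              curl -s "http://localhost:9222/json/close/$id" || true
            done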

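The preStop hook runs before K8s sends SIGTERM to the container and attempts a graceful tab close over Chrome's HTTP debugging endpoint. Time spent in the hook counts against terminationGracePeriodSeconds, so keep it short.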

Summary

Kubernetes HPA on CPU or memory metrics is nearly useless for browser clusters because:

CPU startup spikes mislead scaling decisions
Memory release lags by minutes
Browsers are stateful — can't kill mid-task

The fix isn't tuning CPU/memory thresholds. Switch to business metrics (queue depth), configure cooldown windows, maintain a warm pool, and give Chrome enough time to shut down. Treat browsers as stateful services.

Series summary: Scaling isn't copying 1 instance 1,000 times. Orphan processes, connection pool exhaustion, HPA failure: each hits at a different layer. The advice is the same across all three: one browser instance, one task, clean up thoroughly. Simple but effective.
