Chrome Headless at Scale (Part 3): Why K8s HPA Doesn't Work for Browser Clusters

CPU spikes on startup mislead scaling decisions. Memory release lags by minutes. Browsers aren't stateless — you can't kill them mid-task and retry. Kubernetes HPA fails on all three counts.

16Yun Engineering Team | Apr 4, 2026 | 3 min read

Why HPA Fails, Reason 1: CPU Startup Spikes

Standard Kubernetes HPA scales on CPU or memory utilization, measured against a target. The mental model is simple: average CPU well above the target (say 80%) → scale up. Well below it (say 40%) → scale down.
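For reference, the replica count HPA arrives at can be sketched in a few lines. This follows the formula given in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the default 10% tolerance. The function name and the sample numbers are illustrative, and the real controller layers readiness handling and stabilization on top of this.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation (illustrative, not the controller code)."""
    ratio = current_metric / target_metric
    # Within tolerance of the target, HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * current_metric / target_metric)

if __name__ == "__main__":
    print(desired_replicas(10, 80, 50))  # average CPU above target: scale up
    print(desired_replicas(10, 20, 50))  # average CPU below target: scale down
    print(desired_replicas(10, 52, 50))  # within tolerance: no change
```

The key point is that the result is proportional to the averaged metric, so anything that briefly inflates the average inflates the replica count with it.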

This logic breaks for browser instances because startup behavior differs radically from steady state.

Chrome startup:

  1. Load binary (~200-300MB)
  2. Initialize V8 engine
  3. Create GPU context (even in headless mode)
  4. Establish CDP WebSocket listener
  5. Load initial page

CPU hits 100% for 1-3 seconds, then drops to 5-15%. If HPA uses CPU:

  • Pod starts → CPU 100% → HPA scales up → another Pod starts
  • New Pod also starting → CPU 100% → HPA scales up again
  • First batch ready → CPU drops to 5% → HPA scales down
  • Pods mid-task get killed → failure rate spikes

This is a Startup Storm — HPA misled by the startup CPU spike, scaling up and then back down in rapid succession.

Timeline:
  T0: 10 Pods, avg CPU 15%
  T1: New tasks arrive, CPU rises to 45%
  T2: HPA scales to 12 Pods
  T3: New Pods starting, CPU 100% (startup spike)
  T4: HPA detects 65% CPU, scales to 15 Pods
  T5: All Pods ready, CPU drops to 12%
  T6: HPA scales down to 12 Pods
  T7: Killed Pods were mid-task → failure spike
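The arithmetic behind such a timeline can be sketched as follows. The 40% target and the Pod counts are assumptions chosen for illustration, not values from a real cluster, and the model is deliberately a worst case: it counts the startup spike of new Pods in the average, which is what happens when a Pod reports ready before its spike has passed.

```python
import math

def average_cpu(steady_pods: int, steady_cpu: float,
                starting_pods: int, startup_cpu: float = 100.0) -> float:
    """Fleet-wide average CPU when some Pods are still in their startup spike."""
    total = steady_pods * steady_cpu + starting_pods * startup_cpu
    return total / (steady_pods + starting_pods)

def hpa_desired(current: int, avg_cpu: float, target_cpu: float) -> int:
    """Core HPA ratio, without tolerance or stabilization (simplified)."""
    return math.ceil(current * avg_cpu / target_cpu)

if __name__ == "__main__":
    # 10 busy Pods at 45% plus 2 Pods that just started and sit at 100%.
    spiking = average_cpu(steady_pods=10, steady_cpu=45.0, starting_pods=2)
    print(f"average CPU during startup: {spiking:.1f}%")
    print("desired replicas:", hpa_desired(12, spiking, target_cpu=40.0))

    # Once every Pod has settled at 12%, the same formula asks for far fewer.
    settled = average_cpu(steady_pods=12, steady_cpu=12.0, starting_pods=0)
    print("desired replicas after settling:", hpa_desired(12, settled, target_cpu=40.0))
```

Two spiking Pods are enough to push the recommendation well past what the real workload needs, and the recommendation collapses as soon as the spike ends.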

Why HPA Fails, Reason 2: Memory Release Lag

Memory-based HPA also fails. Chrome doesn't release memory back to the OS immediately on close():

  1. V8 heap: doesn't shrink on page unload. Marked as available but not returned to the OS because re-allocation costs more.

  2. OS page cache: Chrome's shared libraries (libc, zlib) are cached by the OS. They're only marked "reclaimable," not freed, until another process actually requests memory.

  3. Swap latency: On nodes with swap enabled, memory that was swapped out adds further delay before it is reported as free. The free command may not show increased available memory for minutes.

This creates minutes of HPA lag — Pods are idle but memory metrics remain high. HPA doesn't scale down. Resources are wasted.

Why HPA Fails, Reason 3: Browsers Are Stateful

HPA's core assumption: any Pod can be killed, traffic re-routes to others.

Browser instances carry state. Killing a browser Pod mid-task:

  • CDP sessions die immediately
  • In-flight page navigations never return
  • Cookies and localStorage are lost
  • Target sites requiring login need re-authentication

Unlike stateless web services — lose one request, retry on another — browser state is process-level. K8s graceful shutdown has limited effectiveness here: after SIGTERM, Chrome needs time to close all tabs. If terminationGracePeriodSeconds (default 30s) isn't enough, the Pod is force-killed.
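On the worker side, the grace period only helps if the process reacts to SIGTERM by finishing its current task and refusing new ones. A minimal sketch of that pattern, assuming a worker that pulls tasks one at a time; the class name and the handle callback are hypothetical stand-ins for whatever drives the browser:

```python
import signal
import threading

class DrainingWorker:
    """Runs tasks one at a time and stops picking up new ones after SIGTERM."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.completed = []

    def request_stop(self, signum=None, frame=None) -> None:
        # Signal handler: only flips a flag; the task in progress keeps running.
        self._stop.set()

    def run(self, tasks, handle) -> None:
        for task in tasks:
            if self._stop.is_set():
                break  # leave remaining tasks queued for other Pods
            handle(task)  # hypothetical: drive one browser task to completion
            self.completed.append(task)

if __name__ == "__main__":
    worker = DrainingWorker()
    signal.signal(signal.SIGTERM, worker.request_stop)
    worker.run(["task-1", "task-2"], handle=lambda t: print("done", t))
```

This only works when a single task reliably finishes inside terminationGracePeriodSeconds. Longer tasks still get force-killed.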

Working Scaling Approaches

Approach 1: Custom Metrics HPA

Don't use CPU or memory. Use business metrics:

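# Queue-depth-based HPA (recommended)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: browser-worker-hpa    # placeholder name
spec:
  scaleTargetRef:             # workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: browser-worker
  minReplicas: 2              # placeholder bounds; size for your workload
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_depth # must be exposed by an external metrics adapter
      target:
        type: AverageValue
        averageValue: "5"     # target pending tasks per Pod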

Or KEDA event-driven scaling:

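apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: browser-worker-scaler     # placeholder name
spec:
  scaleTargetRef:
    name: browser-worker          # Deployment to scale
  triggers:
  - type: rabbitmq
    metadata:
      queueName: browser-tasks
      mode: QueueLength           # scale on the number of queued messages
      value: "10"                 # target messages per replica
      hostFromEnv: RABBITMQ_URL   # assumed env var holding the connection string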

This avoids all CPU/memory issues. Pod count is determined by pending tasks, not per-Pod resource metrics.
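The calculation behind an AverageValue target on queue depth is simple enough to check by hand. A minimal sketch, with the function name and the min/max bounds as illustrative assumptions:

```python
import math

def pods_for_queue(queue_depth: int, tasks_per_pod: int,
                   min_pods: int = 1, max_pods: int = 50) -> int:
    """Replicas needed so each Pod averages at most `tasks_per_pod` pending tasks."""
    wanted = math.ceil(queue_depth / tasks_per_pod)
    # Clamp to the configured bounds, as minReplicas/maxReplicas would.
    return max(min_pods, min(max_pods, wanted))

if __name__ == "__main__":
    for depth in (0, 42, 1000):
        print(depth, "pending tasks ->", pods_for_queue(depth, tasks_per_pod=5), "Pods")
```

Because the input is the backlog rather than a per-Pod resource reading, a startup spike or lingering memory on any individual Pod does not change the answer.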

Approach 2: Cooldown Configuration

Prevent startup storms with stabilization windows:

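# Goes under spec: of the HorizontalPodAutoscaler
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # act on the highest recommendation from the last 5 min
    policies:
    - type: Percent
      value: 10                      # remove at most 10% of Pods...
      periodSeconds: 60              # ...per minute
  scaleUp:
    stabilizationWindowSeconds: 60   # wait out short spikes before adding Pods
    policies:
    - type: Pods
      value: 2                       # add at most 2 Pods...
      periodSeconds: 30              # ...every 30 seconds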

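Scaling becomes slower but stable: at most 2 Pods are added every 30 seconds, at most 10% are removed per minute, and no scale-down happens until the lower recommendation has held for 5 minutes. That is enough to ride out startup storms.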
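The scale-down stabilization window is the part that protects Pods from being killed right after a spike. As documented for HPA, the controller looks back over the window and acts on the highest recommendation it made in that time. A simplified sketch of that rule, with the function name and the (timestamp, replicas) history format as assumptions:

```python
def stabilized_replicas(history, now: float, window_seconds: float = 300.0) -> int:
    """Scale-down stabilization: act on the highest recommendation in the window.

    `history` is a list of (timestamp_seconds, recommended_replicas) pairs.
    """
    recent = [replicas for ts, replicas in history if now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendations inside the window")
    return max(recent)

if __name__ == "__main__":
    # A spike recommended 15 replicas at t=0; later samples recommend 12.
    history = [(0, 15), (60, 12), (120, 12), (330, 12)]
    print(stabilized_replicas(history, now=120))  # spike still inside the window
    print(stabilized_replicas(history, now=330))  # spike has aged out
```

With a 300-second window, the Pods added during the spike stay up until five minutes of lower recommendations have accumulated.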

Approach 3: Warm Pool Strategy

Maintain pre-warmed browser instances for latency-sensitive scenarios:

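# Fixed-size pool of pre-warmed browsers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: browser-warm-pool
spec:
  replicas: 5   # always-on warm instances; selector and Pod template omitted
---
# Elastic workers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: browser-worker
spec:
  # Scales based on workload; selector and Pod template omitted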

Warm pool Pods launch Chrome and hold it on a blank page. Tasks claim an instance from the pool instead of paying the startup cost from zero.
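The claim-and-refill idea can be shown in-process. This is a minimal sketch rather than the Deployment-level mechanism itself: the WarmPool class and the launch factory are hypothetical, and claimed browsers are never returned to the pool, in keeping with one browser instance per task.

```python
from collections import deque

class WarmPool:
    """Keeps `size` pre-launched browser handles ready to be claimed."""

    def __init__(self, size: int, launch) -> None:
        self._launch = launch  # hypothetical factory that starts one browser
        self._size = size
        self._idle = deque(launch() for _ in range(size))

    def claim(self):
        # Hand out a warm instance if one is idle; otherwise pay the cold start.
        return self._idle.popleft() if self._idle else self._launch()

    def refill(self) -> None:
        # Top the pool back up, e.g. from a background loop.
        while len(self._idle) < self._size:
            self._idle.append(self._launch())

    def idle_count(self) -> int:
        return len(self._idle)

if __name__ == "__main__":
    counter = iter(range(1, 1000))
    pool = WarmPool(size=2, launch=lambda: f"browser-{next(counter)}")
    print(pool.claim(), pool.claim(), pool.claim())  # third claim is a cold start
    pool.refill()
    print("idle after refill:", pool.idle_count())
```

The pool size trades idle cost against how often a task falls through to a cold start.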

Approach 4: Graceful Shutdown Configuration

Increase terminationGracePeriodSeconds for Chrome:

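apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 60   # default is 30
  containers:
  - name: browser
    lifecycle:
      preStop:
        exec:
          # /json/close needs a target id, so list the open targets and close each one.
          # Assumes curl, grep, and cut exist in the image.
          command:
          - /bin/sh
          - -c
          - |
            for id in $(curl -s http://localhost:9222/json/list | grep -o '"id": *"[^"]*"' | cut -d'"' -f4); do
              curl -s "http://localhost:9222/json/close/$id" || true
            done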

The preStop hook attempts graceful CDP tab close before K8s sends SIGTERM.

Summary

Kubernetes HPA driven by CPU or memory is nearly useless for browser clusters because:

  1. CPU startup spikes mislead scaling decisions
  2. Memory release lags by minutes
  3. Browsers are stateful — can't kill mid-task

The fix isn't tuning CPU/memory thresholds. Switch to business metrics (queue depth), configure cooldown windows, maintain a warm pool, and give Chrome enough time to shut down gracefully. Treat browsers as stateful services.

Series summary: Scaling isn't copying 1 instance 1,000 times. Orphan processes, connection pool exhaustion, HPA failure: each hits at a different layer. The advice is the same across all three parts: one browser instance, one task, clean up thoroughly. Simple but effective.
