Chrome Headless at Scale (Part 3): Why K8s HPA Doesn't Work for Browser Clusters
CPU spikes on startup mislead scaling decisions. Memory release lags by minutes. Browsers aren't stateless — you can't kill them mid-task and retry. Kubernetes HPA fails on all three counts.
Why HPA Fails, Reason 1: CPU Startup Spikes
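Standard Kubernetes HPA scales on average per-Pod CPU or memory utilization, proportionally to a target: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). Pods running above the target → scale up. Pods running below it → scale down.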
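A minimal Python sketch of that proportional rule, useful for replaying the timeline below. The 40% CPU target and the function name are assumptions for illustration. The 10% tolerance matches the controller's default, while stabilization windows, scaling policies, min/max replicas and the handling of not-ready Pods are deliberately left out.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float,
                         tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(currentReplicas * current / target).

    Sketch only: ignores stabilization windows, scaling policies,
    min/max replicas and the special handling of not-ready Pods.
    """
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing so whole-percentage inputs stay exact.
    return math.ceil(current_replicas * current_utilization / target_utilization)


# Replaying timeline points with an assumed 40% CPU target:
print(hpa_desired_replicas(10, 45, 40))  # T1 → 12
print(hpa_desired_replicas(12, 65, 40))  # T4 → 20
print(hpa_desired_replicas(15, 12, 40))  # T5 → 5
```

The raw formula asks for bigger jumps than the timeline shows (20 Pods at T4, 5 at T5). In a real cluster, scaling policies and stabilization windows would limit how far each step goes.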
This logic breaks for browser instances because startup behavior differs radically from steady state.
Chrome startup:
- Load binary (~200-300MB)
- Initialize V8 engine
- Create GPU context (even in headless mode)
- Establish CDP WebSocket listener
- Load initial page
CPU hits 100% for 1-3 seconds, then drops to 5-15%. If HPA uses CPU:
- Pod starts → CPU 100% → HPA scales up → another Pod starts
- New Pod also starting → CPU 100% → HPA scales up again
- First batch ready → CPU drops to 5% → HPA scales down
- Pods mid-task get killed → failure rate spikes
This is a Startup Storm: HPA, misled by the startup CPU spike, scales up and then back down in rapid succession.
Timeline:
T0: 10 Pods, avg CPU 15%
T1: New tasks arrive, CPU rises to 45%
T2: HPA scales to 12 Pods
T3: New Pods starting, CPU 100% (startup spike)
T4: HPA detects 65% CPU, scales to 15 Pods
T5: All Pods ready, CPU drops to 12%
T6: HPA scales down to 12 Pods
T7: Killed Pods were mid-task → failure spike
Why HPA Fails, Reason 2: Memory Release Lag
Memory-based HPA also fails. Chrome doesn't release memory back to the OS immediately on close():
- V8 heap: doesn't shrink on page unload. Freed space is marked available but kept rather than returned to the OS, because re-allocating it later costs more.
- OS page cache: shared libraries Chrome loads (libc, zlib) stay in the OS page cache. Those pages are only marked "reclaimable," not freed, until another process actually requests memory.
- Swap: on nodes with swap enabled (it is off by default in Kubernetes), memory that went to swap is released on the kernel's schedule, and the free command may not show increased available memory for minutes.
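This creates minutes of HPA lag. The memory figure HPA scales on is the container's working set, which still includes memory held this way (active page cache among it), so Pods sit idle while their memory metrics stay high. HPA doesn't scale down, and resources are wasted.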
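Why cached memory keeps the metric high can be seen from how the working set is commonly derived: cAdvisor takes total cgroup memory usage and subtracts only inactive file cache, so active page cache still counts. A Python sketch of that arithmetic, assuming cgroup v2 (memory.current and memory.stat are real kernel files; the helper names are hypothetical):

```python
from pathlib import Path


def parse_memory_stat(text: str) -> dict:
    """Parse cgroup v2 memory.stat ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def working_set_bytes(memory_current: int, memory_stat_text: str) -> int:
    """Usage minus inactive file cache, clamped at zero (cAdvisor-style)."""
    inactive_file = parse_memory_stat(memory_stat_text).get("inactive_file", 0)
    return max(memory_current - inactive_file, 0)


def read_working_set(cgroup_dir: str = "/sys/fs/cgroup") -> int:
    """Read the container's own cgroup files (assumes cgroup v2 at cgroup_dir)."""
    base = Path(cgroup_dir)
    current = int((base / "memory.current").read_text().strip())
    return working_set_bytes(current, (base / "memory.stat").read_text())
```

With hypothetical numbers, 900 MiB of usage made of 500 MiB anonymous memory, 300 MiB active file cache and 100 MiB inactive file cache yields an 800 MiB working set: the active cache still counts, even with no task running.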
Why HPA Fails, Reason 3: Browsers Are Stateful
HPA's core assumption: any Pod can be killed, traffic re-routes to others.
Browser instances carry state. Killing a browser Pod mid-task:
- CDP sessions die immediately
- In-flight page navigations never return
- Cookies and localStorage are lost
- Target sites requiring login need re-authentication
Unlike stateless web services, where a lost request is simply retried on another instance, browser state lives in the process. Kubernetes graceful shutdown helps only partly: after SIGTERM, Chrome needs time to close all its tabs, and if terminationGracePeriodSeconds (30 seconds by default) runs out first, the Pod is killed with SIGKILL.
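Graceful shutdown only helps if the worker cooperates. A minimal Python sketch of that cooperation, assuming a worker that processes tasks one at a time: on SIGTERM it stops pulling new tasks but lets the one in progress finish, leaving the rest of the grace period for closing Chrome. DrainingWorker and its methods are hypothetical names, not part of any library.

```python
import signal
import threading
from typing import Callable, Iterable, List


class DrainingWorker:
    """Finish the task in hand after SIGTERM, but start no new ones."""

    def __init__(self) -> None:
        self._stopping = threading.Event()

    def request_stop(self, signum=None, frame=None) -> None:
        # Signature matches a signal handler, so it can be registered directly.
        self._stopping.set()

    def install_sigterm_handler(self) -> None:
        # signal.signal() must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, tasks: Iterable, handle: Callable) -> List:
        results = []
        for task in tasks:
            if self._stopping.is_set():
                break                     # stop pulling work once SIGTERM arrived
            results.append(handle(task))  # a task already started runs to completion
        return results
```

In a worker you would call install_sigterm_handler() at startup, run the task loop, then close the browser and exit. If a single task can outlast terminationGracePeriodSeconds, the Pod is still force-killed, so the grace period has to cover the longest expected task plus browser shutdown.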
Working Scaling Approaches
Approach 1: Custom Metrics HPA
Don't use CPU or memory. Use business metrics:
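# Queue-depth-based HPA (recommended)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: browser-worker
spec:
  scaleTargetRef:                 # required: the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: browser-worker
  minReplicas: 1                  # example bounds
  maxReplicas: 50                 # required
  metrics:
    - type: External
      external:
        metric:
          name: sqs_queue_depth   # as exposed by your external metrics adapter
        target:
          type: AverageValue
          averageValue: 5         # aim for ~5 pending tasks per Pod
Or KEDA event-driven scaling:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: browser-worker
spec:
  scaleTargetRef:
    name: browser-worker          # Deployment to scale
  triggers:
    - type: rabbitmq
      metadata:
        queueName: browser-tasks
        mode: QueueLength         # replaces the deprecated queueLength field
        value: "10"
        hostFromEnv: RABBITMQ_HOST  # env var holding the AMQP URL
This sidesteps the CPU and memory problems above. Pod count is determined by pending tasks, not per-Pod resource metrics.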
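With an External metric and an AverageValue target, HPA sizes the workload so each Pod's share of the metric lands near the target, which works out to ceil(queueDepth / averageValue). A Python sketch of that arithmetic using the averageValue of 5 from the HPA manifest; the min/max bounds and the function name are assumptions, and the 10% tolerance is the controller's default.

```python
import math


def queue_based_replicas(queue_depth: float,
                         target_per_pod: float,
                         current_replicas: int,
                         min_replicas: int = 1,
                         max_replicas: int = 50,
                         tolerance: float = 0.1) -> int:
    """Replica count for an External metric with an AverageValue target.

    Stays put inside the tolerance band, otherwise asks for
    ceil(queue_depth / target_per_pod), clamped to the bounds.
    """
    desired = math.ceil(queue_depth / target_per_pod)
    if current_replicas > 0:
        # Per-Pod share of the queue compared with the per-Pod target.
        ratio = queue_depth / (target_per_pod * current_replicas)
        if abs(ratio - 1.0) <= tolerance:
            desired = current_replicas
    return max(min_replicas, min(max_replicas, desired))


print(queue_based_replicas(47, 5, current_replicas=4))  # → 10
```

Because the only input is queue depth, a browser's startup spike or slow memory release never enters the calculation.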
Approach 2: Cooldown Configuration
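Prevent startup storms with stabilization windows. This behavior block goes under the HPA's spec:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 10
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
      - type: Pods
        value: 2
        periodSeconds: 30
Scaling is slow but stable. Scale-up only rises to the lowest replica recommendation seen in the last 60 seconds and adds at most 2 Pods every 30 seconds, so a 1-3 second startup spike can't push it up on its own. Scale-down waits out a 5-minute window (300 seconds is also the autoscaling/v2 default) and removes at most 10% of Pods per minute.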
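The stabilization windows are what defuse a startup spike. A Python sketch of the windowing step, modeled on how the autoscaling/v2 controller stabilizes recommendations: scale-up can only rise to the lowest recommendation seen within the scale-up window, and scale-down can only fall to the highest recommendation seen within the scale-down window. The class name is hypothetical, and the Pods/Percent rate-limit policies are not modeled.

```python
from collections import deque
from typing import Deque, Tuple


class StabilizationWindow:
    """Sketch of HPA's stabilization step; rate-limit policies are not modeled."""

    def __init__(self, scale_up_window_s: float, scale_down_window_s: float) -> None:
        self.up_window = scale_up_window_s
        self.down_window = scale_down_window_s
        # (timestamp_s, recommended_replicas), oldest first
        self.samples: Deque[Tuple[float, int]] = deque()

    def stabilize(self, now_s: float, current: int, recommendation: int) -> int:
        self.samples.append((now_s, recommendation))
        # Forget samples that have fallen out of both windows.
        horizon = now_s - max(self.up_window, self.down_window)
        while self.samples and self.samples[0][0] <= horizon:
            self.samples.popleft()

        # Growth never exceeds the LOWEST recommendation in the scale-up window.
        up_limit = min([recommendation] +
                       [r for t, r in self.samples if t > now_s - self.up_window])
        # Shrinking never goes below the HIGHEST one in the scale-down window.
        down_limit = max([recommendation] +
                         [r for t, r in self.samples if t > now_s - self.down_window])

        result = current
        if result < up_limit:    # every recent recommendation wants at least this many
            result = up_limit
        if result > down_limit:  # every recent recommendation wants at most this many
            result = down_limit
        return result


w = StabilizationWindow(scale_up_window_s=60, scale_down_window_s=300)
# A one-sample spike (15) between normal readings (10) is ignored:
print([w.stabilize(t, 10, rec) for t, rec in [(0, 10), (15, 15), (30, 10), (45, 10)]])
# → [10, 10, 10, 10]
```

Sustained demand still gets through: once every recommendation in the last 60 seconds asks for more Pods, scale-up proceeds.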
Approach 3: Warm Pool Strategy
Maintain pre-warmed browser instances for latency-sensitive scenarios:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: browser-warm-pool
spec:
  replicas: 5                     # fixed-size pool
  # selector and Pod template omitted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: browser-worker
spec:
  # Scales based on workload (selector and Pod template omitted)
Warm pool Pods load Chrome and hold on a blank page. Tasks take from the pool instead of starting from zero.
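The Deployment keeps a warm pool at Pod level. Inside a single worker the same idea can be sketched in Python: pre-start a few browsers, hand each to exactly one task, close it afterwards, and start a replacement in the background. WarmPool is a hypothetical class, and launch and close are hypothetical callables you would back with your own browser tooling; nothing here is taken from a specific library.

```python
import queue
import threading
from typing import Any, Callable, List, Optional


class WarmPool:
    """Keep `size` pre-started browsers ready; each one serves exactly one task."""

    def __init__(self, size: int, launch: Callable[[], Any],
                 close: Callable[[Any], None]) -> None:
        self._launch = launch          # starts a browser, returns a handle
        self._close = close            # shuts a browser down
        self._ready: "queue.Queue[Any]" = queue.Queue()
        self._refills: List[threading.Thread] = []
        for _ in range(size):
            self._ready.put(launch())  # pay the startup cost up front

    def run_task(self, task: Callable[[Any], Any],
                 timeout: Optional[float] = None) -> Any:
        browser = self._ready.get(timeout=timeout)  # take a warm instance
        try:
            return task(browser)
        finally:
            self._close(browser)  # one browser, one task: never hand it out again
            refill = threading.Thread(target=self._refill, daemon=True)
            refill.start()        # replace it without making the caller wait
            self._refills.append(refill)

    def _refill(self) -> None:
        self._ready.put(self._launch())

    def join_refills(self) -> None:
        """Wait for pending replacements (useful in tests and at shutdown)."""
        for t in self._refills:
            t.join()
        self._refills.clear()

    def ready_count(self) -> int:
        return self._ready.qsize()
```

Closing instead of returning the browser keeps to the one-browser-one-task rule; the cost is a fresh launch per task, paid off the request path.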
Approach 4: Graceful Shutdown Configuration
Increase terminationGracePeriodSeconds for Chrome:
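apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: browser
      # image, ports, etc. omitted
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              - |
                # /json/close takes a target id (assumes port 9222 and curl in the image)
                for id in $(curl -s http://localhost:9222/json/list | grep -o '"id": *"[^"]*"' | cut -d'"' -f4); do
                  curl -s "http://localhost:9222/json/close/$id" > /dev/null
                done
                true
Kubernetes runs the preStop hook before it sends SIGTERM, and the hook's run time counts against terminationGracePeriodSeconds, so the 60 seconds have to cover both closing tabs and Chrome's own shutdown.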
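If the image ships Python rather than curl, the tab-closing step can be written with the standard library. A sketch assuming Chrome listens on --remote-debugging-port=9222; /json/list and /json/close/{targetId} are Chrome's DevTools HTTP endpoints, while the function names and the idea of running it from preStop as a script are hypothetical. Unlike a blanket loop, it closes only targets of type page and leaves service workers and other targets to exit with the browser.

```python
import json
import urllib.request
from typing import List

DEVTOOLS = "http://localhost:9222"  # assumes --remote-debugging-port=9222


def page_target_ids(list_json: str) -> List[str]:
    """Pick the ids of page targets out of a /json/list response."""
    return [t["id"] for t in json.loads(list_json) if t.get("type") == "page"]


def close_all_pages(base: str = DEVTOOLS, timeout: float = 5.0) -> int:
    """Close every open page through Chrome's DevTools HTTP endpoints."""
    with urllib.request.urlopen(f"{base}/json/list", timeout=timeout) as resp:
        ids = page_target_ids(resp.read().decode("utf-8"))
    closed = 0
    for target_id in ids:
        try:
            urllib.request.urlopen(f"{base}/json/close/{target_id}",
                                   timeout=timeout).close()
            closed += 1
        except OSError:
            pass  # target already gone; keep closing the rest
    return closed


if __name__ == "__main__":
    try:
        print(f"closed {close_all_pages()} page(s)")
    except OSError:
        pass  # DevTools not reachable: let Kubernetes continue the shutdown
```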
Summary
CPU- and memory-based Kubernetes HPA is nearly useless for browser clusters because:
- CPU startup spikes mislead scaling decisions
- Memory release lags by minutes
- Browsers are stateful: a Pod can't be killed mid-task
The fix isn't tuning CPU/memory thresholds. Scale on business metrics (queue depth), configure stabilization windows, keep a warm pool, and give Pods enough grace period to shut down cleanly. Treat browsers as stateful services.
Series Summary: Scaling isn't copying one instance 1,000 times. Orphan processes, connection pool exhaustion, and HPA failure each hit a different layer. The advice is the same across all three: one browser instance, one task, thorough cleanup. Simple but effective.