Browser Automation Cost Breakdown (Part 1): Tokens, Proxy, Compute

A Typical Task Cost Breakdown

The baseline workload is 10,000 pages per day, with 3-5 fields extracted from each page. Costs fall into three dimensions:

Dimension 1: LLM Inference Cost

Model	Input Price	Output Price	Per Step (5K in + 200 out)	10-Step Task
Claude Haiku 3.5	$1.00/M	$5.00/M	~$0.006	$0.06
Gemini 2.5 Flash	$0.15/M	$0.60/M	~$0.0009	$0.009
GPT-4o	$2.50/M	$10.00/M	~$0.0145	$0.145
Claude Sonnet 4	$3.00/M	$15.00/M	~$0.018	$0.18

Daily cost for 10,000 ten-step tasks ranges from about $90 (Gemini 2.5 Flash) to $1,800 (Claude Sonnet 4).

With accessibility (A11y) tree optimization, token consumption drops by 90%, which brings the daily range down to $9-$180.
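The arithmetic behind these figures can be reproduced with a short script. This is a minimal sketch: the prices are copied from the table above, while the function names and the `token_reduction` parameter are illustrative, and applying the 90% reduction uniformly to the whole bill is a simplifying assumption.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Claude Haiku 3.5": (1.00, 5.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
}


def step_cost(model, input_tokens=5_000, output_tokens=200):
    """Cost in USD of one agent step for the given model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def daily_cost(model, tasks_per_day=10_000, steps_per_task=10, token_reduction=0.0):
    """Daily cost in USD.

    token_reduction=0.9 models the 90% saving from A11y tree optimization,
    applied uniformly to the whole bill (a simplification).
    """
    per_task = step_cost(model) * steps_per_task
    return per_task * tasks_per_day * (1.0 - token_reduction)


for name in PRICES:
    base = daily_cost(name)
    optimized = daily_cost(name, token_reduction=0.9)
    print(f"{name}: ${base:,.2f}/day, ${optimized:,.2f}/day with A11y tree")
```

The table rounds the per-step figure before multiplying, so the unrounded Gemini Flash total lands slightly below the $90 quoted above.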

Dimension 2: Proxy Traffic Cost

Proxy Type	Unit Price	10K Requests/Day	Monthly
Datacenter	$0.04/GB	~$0.10	~$3
Crawler (tunnel)	Monthly	—	$50-$200
API Proxy	$0.50/GB	~$1.50	~$45
Dedicated	Fixed IP	—	$10-$50

Proxy traffic is usually 5-15% of total cost, so it is rarely worth optimizing.
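For metered proxies, the daily figure is just traffic volume times the per-GB price. The sketch below assumes an average of 250 KB transferred per request; that number is not measured, it is chosen because it reproduces the datacenter row of the table. The function name is illustrative.

```python
def proxy_daily_cost(requests_per_day, avg_kb_per_request, price_per_gb):
    """Daily metered-proxy cost in USD, using decimal units (1 GB = 1,000,000 KB)."""
    gb_per_day = requests_per_day * avg_kb_per_request / 1_000_000
    return gb_per_day * price_per_gb


# Assumed average transfer per request; replace with a measured value.
AVG_KB_PER_REQUEST = 250

datacenter = proxy_daily_cost(10_000, AVG_KB_PER_REQUEST, price_per_gb=0.04)
print(f"Datacenter: ${datacenter:.2f}/day, ${datacenter * 30:.2f}/month")
```

Measuring real bytes per request for your own targets and plugging that in gives a more reliable estimate than any table average.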

Dimension 3: Compute Cost

Setup	Spec	Monthly	Capacity
Single Docker	4C/16G	~$50	Thousands/day
Small K8s	8C/32G × 3	~$300	Tens of thousands
Cloud API	Per session	$0.001-0.01/session	Elastic

Compute is typically 10-20% of total cost.

Total Cost Structure

LLM inference: 70-85%  → optimize here
Proxy traffic: 5-15%   → not worth much effort
Compute: 10-20%        → optimize after LLM

Five Effective Optimizations

1. DOM Cache

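import time


class DOMCache:
    """Caches the extracted DOM per URL so repeat visits skip re-extraction."""

    def __init__(self, ttl_seconds=300):
        self.cache = {}  # url -> {"dom": ..., "time": ...}
        self.ttl = ttl_seconds

    async def get_dom(self, url, page):
        # Serve from cache while the entry is younger than the TTL.
        entry = self.cache.get(url)
        if entry is not None and time.time() - entry["time"] < self.ttl:
            return entry["dom"]
        # Cache miss or stale entry: re-extract and store with a fresh timestamp.
        # extract_dom is assumed to be defined elsewhere in the project.
        dom = await extract_dom(page)
        self.cache[url] = {"dom": dom, "time": time.time()}
        return dom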

2. Reduce Screenshot Frequency

Screenshots are expensive: a base64-encoded capture of about 500 KB consumes a large number of tokens when sent to a vision model. Capture them only for debugging and at critical steps.
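One way to enforce this is a small gate that every step passes through before a capture is taken. This is a sketch only: the `Step` class, its `critical` flag, and `should_capture` are hypothetical names, not part of any particular agent framework.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    critical: bool = False  # hypothetical flag set by whoever defines the task


def should_capture(step, debug=False):
    """Take a screenshot only in debug mode or at a step marked critical."""
    return debug or step.critical


steps = [Step("open page"), Step("fill form"), Step("submit", critical=True)]
captured = [s.name for s in steps if should_capture(s)]
print(captured)  # ['submit']
```

With `debug=True` every step is captured, which keeps troubleshooting runs fully observable without paying for screenshots in production.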

3. Reuse Sessions

agent-browser auth save ./session.json
agent-browser auth login ./session.json
agent-browser open https://example.16yun.cn

4. Choose the Right Model

Don't use the strongest model for everything. Route simple tasks to Gemini Flash or Haiku, and reserve Sonnet or GPT-4o for complex ones.
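A simple router makes this rule explicit. The sketch below is illustrative: the `Task` fields, the step threshold, and the `choose_model` function are assumptions, and the model strings are display labels from the pricing table rather than API identifiers.

```python
from dataclasses import dataclass

CHEAP_MODEL = "Gemini 2.5 Flash"   # label from the pricing table, not an API id
STRONG_MODEL = "Claude Sonnet 4"   # label from the pricing table, not an API id

# Hypothetical threshold; tune it against your own task success rates.
MAX_STEPS_FOR_CHEAP_MODEL = 10


@dataclass
class Task:
    expected_steps: int
    needs_reasoning: bool = False  # e.g. multi-page flows or ambiguous layouts


def choose_model(task):
    """Use the cheap tier unless the task is long or flagged as needing reasoning."""
    if task.needs_reasoning or task.expected_steps > MAX_STEPS_FOR_CHEAP_MODEL:
        return STRONG_MODEL
    return CHEAP_MODEL


print(choose_model(Task(expected_steps=4)))
print(choose_model(Task(expected_steps=4, needs_reasoning=True)))
```

A common refinement is to retry a failed task once on the stronger model instead of classifying tasks up front.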

5. Don't Over-Proxy

If the target site has no anti-bot protection, a datacenter proxy is sufficient. Upgrade only once requests start being detected.
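The "upgrade only when detected" rule can be sketched as a tier ladder that moves up one rung when a response looks like a block. Both the escalation order and the choice of HTTP 403 and 429 as detection signals are assumptions for illustration; real anti-bot systems may respond with a challenge page and a 200 status instead.

```python
# Assumed escalation order, cheapest first.
TIERS = ["datacenter", "crawler_tunnel", "api_proxy"]

# Assumed signals that the current proxy has been detected.
BLOCK_STATUS_CODES = {403, 429}


def next_tier(current, status_code):
    """Return the proxy tier to use for the next request."""
    if status_code not in BLOCK_STATUS_CODES:
        return current  # not detected: stay on the cheaper tier
    index = TIERS.index(current)
    # Move up one tier, staying on the top tier once it is reached.
    return TIERS[min(index + 1, len(TIERS) - 1)]


tier = "datacenter"
for status in (200, 200, 403, 200):
    tier = next_tier(tier, status)
print(tier)  # crawler_tunnel
```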

What Not to Optimize

Optimization	ROI	Reason
Compress network requests	Minimal	Proxy cost is tiny
Lower screenshot resolution	Limited	Vision model token usage varies
Self-host servers	Shifts cost	Bandwidth, maintenance, stability
Fine-tune custom model	Unclear	Cost exceeds using existing models

Summary

LLM inference is 70-85% of total cost. Optimization priorities:

1. Token consumption (A11y tree, DOM cache, fewer screenshots)
2. Model selection (match model to task complexity)
3. Session reuse (reduce login token waste)
4. Proxy and compute (last priority)
