AI Browser Automation Cost Analysis (Part 1): Tokens, Proxy, and Compute

A practical cost analysis of AI browser automation: how much goes to LLM tokens, proxy traffic, and cloud servers, which optimizations work, and which aren't worth the effort.

16Yun Engineering Team · Apr 30, 2026 · 2 min read

A Typical Task Cost Breakdown

The reference workload is 10,000 pages per day with 3-5 fields extracted per page, treated below as 10,000 tasks per day. Costs fall into three dimensions:

Dimension 1: LLM Inference Cost

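| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Per Step (5K in + 200 out) | 10-Step Task |
|---|---|---|---|---|
| Claude Haiku 3.5 | $1.00 | $5.00 | ~$0.006 | $0.06 |
| Gemini 2.5 Flash | $0.15 | $0.60 | ~$0.0009 | $0.009 |
| GPT-4o | $2.50 | $10.00 | ~$0.0145 | $0.145 |
| Claude Sonnet 4 | $3.00 | $15.00 | ~$0.018 | $0.18 |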

Daily cost for 10,000 tasks at 10 steps each: $90 (Gemini 2.5 Flash) to $1,800 (Claude Sonnet 4).

With A11y (accessibility) tree optimization, token consumption drops by about 90%, which brings the daily range down to $9-$180.
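The per-step and daily figures above are plain arithmetic, and it can help to reproduce them for your own step size. A minimal sketch, assuming the list prices from the table and the 5K-input / 200-output step; `PRICES`, `step_cost`, and `daily_cost` are illustrative names, not part of any SDK:

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Claude Haiku 3.5": (1.00, 5.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
}

def step_cost(model, input_tokens=5_000, output_tokens=200):
    """USD cost of one agent step for the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def daily_cost(model, tasks_per_day=10_000, steps_per_task=10, token_reduction=0.0):
    """USD per day. token_reduction=0.9 models the 90% A11y-tree saving,
    applied to the whole bill as the article's figures do."""
    return step_cost(model) * steps_per_task * tasks_per_day * (1 - token_reduction)

for model in PRICES:
    print(f"{model:18s} step=${step_cost(model):.5f} "
          f"day=${daily_cost(model):,.0f} "
          f"day(a11y)=${daily_cost(model, token_reduction=0.9):,.0f}")
```

Computed without intermediate rounding, Gemini 2.5 Flash comes to about $87/day; the $90 quoted above results from rounding the per-step cost to $0.0009 first.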

Dimension 2: Proxy Traffic Cost

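| Proxy Type | Unit Price | Cost at 10K Requests/Day | Monthly Cost |
|---|---|---|---|
| Datacenter | $0.04/GB | ~$0.10 | ~$3 |
| Crawler (tunnel) | Monthly | n/a | $50-$200 |
| API Proxy | $0.50/GB | ~$1.50 | ~$45 |
| Dedicated | Fixed IP | n/a | $10-$50 |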

Proxy traffic is usually 5-15% of total cost, so it is rarely worth optimizing first.
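For the per-GB tiers, the daily figures work out to roughly 2.5-3 GB of traffic per day, or about 250-300 KB per page. That page size is an inference from the table, not a measured value, so treat it as an assumption in this sketch (`proxy_cost` is an illustrative helper):

```python
def proxy_cost(requests_per_day, avg_page_kb, price_per_gb, days=30):
    """Return (daily_usd, monthly_usd) for pay-per-GB proxy traffic."""
    gb_per_day = requests_per_day * avg_page_kb / 1_000_000  # KB -> GB (decimal)
    daily = gb_per_day * price_per_gb
    return daily, daily * days

# Assumption: 300 KB per page, inferred from the table rather than measured.
for name, price in [("Datacenter", 0.04), ("API Proxy", 0.50)]:
    daily, monthly = proxy_cost(10_000, 300, price)
    print(f"{name:10s} ${daily:.2f}/day  ${monthly:.2f}/month")
```

Plugging in your own average page weight is the quickest way to check whether proxy traffic really stays in the 5-15% band for your workload.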

Dimension 3: Compute Cost

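| Setup | Spec | Cost | Capacity |
|---|---|---|---|
| Single Docker | 4 cores / 16 GB | ~$50/month | Thousands/day |
| Small K8s | 8 cores / 32 GB × 3 | ~$300/month | Tens of thousands/day |
| Cloud API | Per session | $0.001-0.01/session | Elastic |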

Compute is typically 10-20% of total cost.
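Whether a fixed server or a per-session cloud API is cheaper depends on volume. A rough break-even sketch using the prices in the table; the function names are illustrative, and the calculation ignores whether a single machine can actually carry the load:

```python
def cloud_api_monthly(sessions_per_day, price_per_session, days=30):
    """Monthly USD cost of a pay-per-session browser API."""
    return sessions_per_day * days * price_per_session

def break_even_sessions_per_day(fixed_monthly, price_per_session, days=30):
    """Daily session volume at which a fixed-price server costs the same."""
    return fixed_monthly / price_per_session / days

for price in (0.001, 0.01):
    print(f"${price}/session: 10K sessions/day costs "
          f"${cloud_api_monthly(10_000, price):,.0f}/month; "
          f"a $50/month server breaks even at "
          f"{break_even_sessions_per_day(50, price):,.0f} sessions/day")
```

At 10,000 sessions per day the per-session option lands between $300 and $3,000 per month, so the elastic pricing mainly pays off for low or bursty volume.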

Total Cost Structure

LLM inference: 70-85%  → optimize here
Proxy traffic: 5-15%   → not worth much effort
Compute: 10-20%        → optimize after LLM
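To see how the shares come about, here is one illustrative monthly budget assembled from figures earlier in this article: Gemini 2.5 Flash with A11y optimization ($9/day), the API proxy tier (~$45/month), and a single Docker host (~$50/month). The combination is an assumption chosen for the example, not a recommended setup:

```python
def cost_shares(**monthly_usd):
    """Map each cost component to its fraction of the monthly total."""
    total = sum(monthly_usd.values())
    return {name: cost / total for name, cost in monthly_usd.items()}

# Assumed mix: $9/day LLM over 30 days, ~$45 proxy, ~$50 compute.
shares = cost_shares(llm=9 * 30, proxy=45, compute=50)
for name, share in shares.items():
    print(f"{name:8s} {share:.0%}")
```

With these inputs the split is roughly 74% LLM, 12% proxy, and 14% compute. A more expensive model pushes the LLM share higher still.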

Five Effective Optimizations

1. DOM Cache

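import time

class DOMCache:
    """Per-URL cache of extracted DOM snapshots with a time-to-live."""

    def __init__(self, ttl_seconds=300):
        self.cache = {}  # url -> {"dom": ..., "time": monotonic timestamp}
        self.ttl = ttl_seconds

    async def get_dom(self, url, page):
        entry = self.cache.get(url)
        # Serve the cached snapshot while it is still fresh.
        if entry is not None and time.monotonic() - entry["time"] < self.ttl:
            return entry["dom"]
        # extract_dom is not defined in this snippet; it stands for your own
        # DOM-extraction coroutine.
        dom = await extract_dom(page)
        self.cache[url] = {"dom": dom, "time": time.monotonic()}
        return dom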

2. Reduce Screenshot Frequency

Screenshots are expensive: a base64-encoded screenshot runs to roughly 500 KB and consumes a large number of tokens when sent to a vision model. Capture them only for debugging and for critical steps.
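One way to enforce this is a small gate in front of the screenshot call. A minimal sketch; the `Step` type and its `critical` flag are hypothetical, and which steps count as critical is up to you. The size helper reflects the fact that base64 inflates binary data by a factor of 4/3:

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    critical: bool = False  # hypothetical flag, e.g. a final submit or confirmation

def should_capture(step: Step, debug: bool = False) -> bool:
    """Capture only when debugging or when the step is marked critical."""
    return debug or step.critical

def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)

steps = [Step("open"), Step("fill form"), Step("submit", critical=True)]
print([s.name for s in steps if should_capture(s)])  # ['submit']
print(base64_size(375_000))                          # 500000
```

So a 375 KB image already reaches the ~500 KB mark once encoded.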

3. Reuse Sessions

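Logging in from scratch on every task spends agent steps, and therefore tokens, on the login flow. Save the authenticated session once and load it on later runs:

agent-browser auth save ./session.json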
agent-browser auth login ./session.json
agent-browser open https://example.16yun.cn

4. Choose the Right Model

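Don't use the strongest model for everything. Simple tasks → Gemini Flash or Haiku. Complex tasks → Sonnet or GPT-4o. By the pricing table above, a 10-step task costs $0.009 on Gemini 2.5 Flash versus $0.18 on Claude Sonnet 4, a 20x difference.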
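Routing can be as simple as a lookup keyed on a complexity label. A minimal sketch; the `classify` heuristic (step count, login, dynamic content) and its threshold are assumptions made up for illustration, so substitute whatever signals actually predict difficulty in your tasks:

```python
from enum import Enum

class Complexity(Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"

# One model per tier; swap in Haiku or GPT-4o as preferred.
MODEL_FOR = {
    Complexity.SIMPLE: "Gemini 2.5 Flash",
    Complexity.COMPLEX: "Claude Sonnet 4",
}

def classify(num_steps, has_login=False, has_dynamic_content=False, max_simple_steps=5):
    """Hypothetical heuristic: long or stateful flows count as complex."""
    if num_steps > max_simple_steps or has_login or has_dynamic_content:
        return Complexity.COMPLEX
    return Complexity.SIMPLE

def pick_model(num_steps, **flags):
    """Return the model name for a task described by step count and flags."""
    return MODEL_FOR[classify(num_steps, **flags)]

print(pick_model(3))                  # Gemini 2.5 Flash
print(pick_model(12))                 # Claude Sonnet 4
print(pick_model(2, has_login=True))  # Claude Sonnet 4
```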

5. Don't Over-Proxy

If the target site has no anti-bot protection, a datacenter proxy is sufficient. Upgrade to a higher tier only once requests start getting detected or blocked.
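"Upgrade only when detected" can be automated per domain. A minimal sketch, assuming that HTTP 403/429 responses or a captcha marker in the body signal detection, and that tiers escalate in the order shown; both are assumptions, and `ProxySelector` is an illustrative class, not part of any proxy SDK:

```python
BLOCK_STATUS = {403, 429}  # assumed detection signals

def looks_blocked(status_code, body=""):
    """Heuristic: treat 403/429 or a captcha page as a block."""
    return status_code in BLOCK_STATUS or "captcha" in body.lower()

class ProxySelector:
    """Start every domain on the cheapest tier; escalate after repeated blocks."""

    def __init__(self, tiers=("datacenter", "tunnel", "api"), max_blocks=3):
        self.tiers = tiers
        self.max_blocks = max_blocks
        self.level = {}   # domain -> index into tiers
        self.blocks = {}  # domain -> consecutive blocked responses

    def current(self, domain):
        return self.tiers[self.level.get(domain, 0)]

    def record(self, domain, status_code, body=""):
        if not looks_blocked(status_code, body):
            self.blocks[domain] = 0  # a clean response resets the streak
            return
        self.blocks[domain] = self.blocks.get(domain, 0) + 1
        if self.blocks[domain] >= self.max_blocks:
            top = len(self.tiers) - 1
            self.level[domain] = min(self.level.get(domain, 0) + 1, top)
            self.blocks[domain] = 0

selector = ProxySelector(max_blocks=2)
selector.record("example.com", 403)
selector.record("example.com", 429)
print(selector.current("example.com"))  # tunnel
```

Tracking the tier per domain keeps unprotected sites on the cheap datacenter pool even after a protected site has forced an upgrade.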

What Not to Optimize

| Optimization | ROI | Reason |
|---|---|---|
| Compress network requests | Minimal | Proxy cost is tiny |
| Lower screenshot resolution | Limited | Vision model token usage varies |
| Self-host servers | Shifts cost | Bandwidth, maintenance, stability |
| Fine-tune custom model | Unclear | Cost exceeds using existing models |

Summary

LLM inference is 70-85% of total cost. Optimization priorities:

  1. Token consumption (A11y tree, DOM cache, fewer screenshots)
  2. Model selection (match model to task complexity)
  3. Session reuse (reduce login token waste)
  4. Proxy and compute (last priority)
