AI Browser Automation Cost Analysis (Part 1): Tokens, Proxy, and Compute
A practical cost analysis. How much goes to LLM tokens, proxy traffic, and cloud servers. Which optimizations work and which aren't worth the effort.
A Typical Task Cost Breakdown
The reference workload is 10,000 pages per day, with 3-5 fields extracted per page. Costs come from three dimensions:
Dimension 1: LLM Inference Cost
| Model | Input Price | Output Price | Per Step (5K in + 200 out) | 10-Step Task |
|---|---|---|---|---|
| Claude Haiku 3.5 | $1.00/M | $5.00/M | ~$0.006 | $0.06 |
| Gemini 2.5 Flash | $0.15/M | $0.60/M | ~$0.0009 | $0.009 |
| GPT-4o | $2.50/M | $10.00/M | ~$0.0145 | $0.145 |
| Claude Sonnet 4 | $3.00/M | $15.00/M | ~$0.018 | $0.18 |
Daily cost for 10,000 tasks of 10 steps each: about $90 (Gemini 2.5 Flash) to $1,800 (Claude Sonnet 4).
With A11y tree optimization, token consumption drops by about 90%, which brings the daily cost down to roughly $9 to $180.
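The figures above follow from simple arithmetic. Here is a minimal sketch that uses the prices from the table and assumes every page is one 10-step task with 5K input and 200 output tokens per step. The function names are illustrative, and the 90% saving is applied to total token cost, as the article's figures do.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Claude Haiku 3.5": (1.00, 5.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
}


def step_cost(model, input_tokens=5_000, output_tokens=200):
    """Cost in USD of one agent step for the given model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def daily_cost(model, tasks_per_day=10_000, steps_per_task=10, token_reduction=0.0):
    """Daily cost in USD; token_reduction scales the whole bill (an assumption)."""
    full = tasks_per_day * steps_per_task * step_cost(model)
    return full * (1 - token_reduction)


for name in PRICES:
    base = daily_cost(name)
    optimized = daily_cost(name, token_reduction=0.9)
    print(f"{name}: ${base:,.0f}/day, ${optimized:,.0f}/day after a 90% reduction")
```

Results can differ slightly from the table because the table rounds the per-step cost before multiplying. A smaller page representation mainly shrinks input tokens, so the real saving depends on how much of each step's cost is input.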
Dimension 2: Proxy Traffic Cost
| Proxy Type | Unit Price | 10K Requests/Day | Monthly |
|---|---|---|---|
| Datacenter | $0.04/GB | ~$0.10 | ~$3 |
| Crawler (tunnel) | Monthly | — | $50-$200 |
| API Proxy | $0.50/GB | ~$1.50 | ~$45 |
| Dedicated | Fixed IP | — | $10-$50 |
Proxy traffic usually accounts for 5-15% of total cost, so it is rarely worth optimizing.
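The per-GB rows can be reproduced from request volume and average transfer size. This is a rough sketch that assumes about 250 KB transferred per request and a 30-day month. The 250 KB figure is back-calculated from the datacenter row for illustration; the article does not state it.

```python
def proxy_cost_per_day(requests_per_day, price_per_gb, avg_kb_per_request=250):
    """Daily proxy cost in USD; avg_kb_per_request is an assumed average."""
    gb_per_day = requests_per_day * avg_kb_per_request / 1_000_000  # KB to GB (decimal)
    return gb_per_day * price_per_gb


datacenter = proxy_cost_per_day(10_000, price_per_gb=0.04)
print(f"Datacenter: ${datacenter:.2f}/day, ${datacenter * 30:.2f}/month")
```

Measure your own average page weight before relying on the estimate, since heavier pages raise the cost proportionally.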
Dimension 3: Compute Cost
| Setup | Spec | Monthly | Capacity |
|---|---|---|---|
| Single Docker | 4C/16G | ~$50 | Thousands/day |
| Small K8s | 8C/32G × 3 | ~$300 | Tens of thousands |
| Cloud API | Per session | $0.001-0.01/session | Elastic |
Compute is 10-20% of total.
Total Cost Structure
LLM inference: 70-85% → optimize here
Proxy traffic: 5-15% → not worth much effort
Compute: 10-20% → optimize after LLM
Five Effective Optimizations
1. DOM Cache
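    import time

    class DOMCache:
        """Caches the extracted DOM per URL for a limited time."""

        def __init__(self, ttl_seconds=300):
            self.cache = {}  # url -> {"dom": ..., "time": ...}
            self.ttl = ttl_seconds

        async def get_dom(self, url, page):
            # Reuse the cached DOM while it is still fresh.
            if url in self.cache:
                entry = self.cache[url]
                if time.time() - entry["time"] < self.ttl:
                    return entry["dom"]
            # extract_dom is your own extraction helper; it is not defined here.
            dom = await extract_dom(page)
            self.cache[url] = {"dom": dom, "time": time.time()}
            return dom

2. Reduce Screenshot Frequency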
Screenshots are expensive: a single capture is about 500KB once Base64-encoded and consumes many tokens in vision models. Capture them only for debugging and at critical steps.
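One way to apply this is a small gate that decides, per step, whether to attach a screenshot. This is a minimal sketch: the step names and the `CRITICAL_STEPS` set are hypothetical and depend on your own workflow.

```python
# Hypothetical step names where a visual check is worth the extra tokens.
CRITICAL_STEPS = {"login", "checkout", "captcha"}


def should_capture(step_name, debug=False):
    """Return True only when debugging or when the step is marked critical."""
    if debug:
        return True
    return step_name in CRITICAL_STEPS


steps = ["open", "login", "search", "extract"]
captured = [step for step in steps if should_capture(step)]
print(captured)
```

Every other step then runs on the text representation of the page alone.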
3. Reuse Sessions
    agent-browser auth save ./session.json
    agent-browser auth login ./session.json
    agent-browser open https://example.16yun.cn

4. Choose the Right Model
Don't use the strongest model for everything. Simple tasks → Gemini Flash or Haiku. Complex → Sonnet or GPT-4o.
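A simple router makes this concrete. The sketch below assumes a hypothetical `Task` record and an arbitrary complexity rule: a task counts as complex if it needs reasoning or takes more than five steps. The model names come from the pricing table, and the threshold is a placeholder to tune.

```python
from dataclasses import dataclass


@dataclass
class Task:
    steps: int                     # expected number of agent steps
    needs_reasoning: bool = False  # e.g. ambiguous layouts or multi-page flows


CHEAP_MODEL = "Gemini 2.5 Flash"
STRONG_MODEL = "Claude Sonnet 4"


def choose_model(task, max_simple_steps=5):
    """Send simple tasks to the cheap model and complex ones to the strong model."""
    if task.needs_reasoning or task.steps > max_simple_steps:
        return STRONG_MODEL
    return CHEAP_MODEL


print(choose_model(Task(steps=3)))
print(choose_model(Task(steps=12, needs_reasoning=True)))
```

A natural extension is to retry on the strong model only when the cheap model's output fails validation.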
5. Don't Over-Proxy
If the target site has no anti-bot protection, a datacenter proxy is sufficient. Upgrade only after you are actually detected.
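In code, this amounts to starting on the cheapest tier and escalating only after a block signal. This is a sketch under assumptions: the tier order, and treating HTTP 403 and 429 as signs of detection, are illustrative choices rather than rules from any particular provider.

```python
# Assumed escalation order, cheapest first (tier names follow the proxy table).
TIERS = ["datacenter", "crawler_tunnel", "dedicated"]
BLOCK_STATUSES = {403, 429}  # assumed signs of anti-bot detection


def next_tier(current, status_code):
    """Stay on the current tier unless the response looks like a block."""
    if status_code not in BLOCK_STATUSES:
        return current
    index = TIERS.index(current)
    # Move up one tier, but never past the last one.
    return TIERS[min(index + 1, len(TIERS) - 1)]


print(next_tier("datacenter", 200))
print(next_tier("datacenter", 403))
```

In practice you would likely require several blocked responses in a row before escalating, so that one transient error does not raise the proxy bill.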
What Not to Optimize
| Optimization | ROI | Reason |
|---|---|---|
| Compress network requests | Minimal | Proxy cost is tiny |
| Lower screenshot resolution | Limited | Vision model token usage varies |
| Self-host servers | Shifts cost | Bandwidth, maintenance, stability |
| Fine-tune custom model | Unclear | Cost exceeds using existing models |
Summary
LLM inference is 70-85% of total cost. Optimization priorities:
- Token consumption (A11y tree, DOM cache, fewer screenshots)
- Model selection (match model to task complexity)
- Session reuse (reduce login token waste)
- Proxy and compute (last priority)