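agent-browser Snapshots and Screenshots: Letting a Crawler "Understand" the Page

Snapshot: the AI's "Eyes"

Traditional crawlers locate elements with CSS selectors. An AI agent needs semantic understanding instead: what the page does, which elements can be acted on, and how they relate to each other.

snapshot outputs the page's accessibility tree, the semantic structure that browsers generate for assistive technologies such as screen readers. With it, the AI can understand the page directly:

# Basic snapshot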
agent-browser snapshot

# Sample output:
# @e1  heading "Product List"
# @e2  link "Item A - $29" url=/product/a
# @e3  link "Item B - $39" url=/product/b
# @e4  button "Load More"
# @e5  textbox "Search products"
# @e6  button "Search"
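
Because the snapshot is plain text, ordinary shell tools can pick out elements by role. The following is a minimal sketch, not part of agent-browser itself: `refs_by_role` is a hypothetical helper, and it assumes each output line has the shape `@eN  role "name" ...` shown in the sample above (without the leading `# `).

```shell
# Hypothetical helper: print the ref (@eN) of every element with the given role.
# Assumes each snapshot line looks like: @e2  link "Item A - $29" ...
refs_by_role() {
  role="$1"
  awk -v role="$role" '$1 ~ /^@e[0-9]+$/ && $2 == role { print $1 }'
}

# Sample text copied from the output above; in practice the output of
# `agent-browser snapshot` would be piped in instead.
sample='@e1  heading "Product List"
@e2  link "Item A - $29" url=/product/a
@e3  link "Item B - $39" url=/product/b
@e4  button "Load More"
@e5  textbox "Search products"
@e6  button "Search"'

printf '%s\n' "$sample" | refs_by_role link
# prints:
# @e2
# @e3
```

If the real output format differs (for example, indented tree lines), the awk condition would need adjusting.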

Filtering to Interactive Elements

When a page has a lot of content, snapshot -i (interactive) shows only the elements that can be acted on:

agent-browser snapshot -i

# @e1 link "Item A - $29"
# @e2 link "Item B - $39"
# @e3 button "Load More"
# @e4 textbox "Search products"
# @e5 button "Search"
# @e6 link "Next Page →"

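Option	Effect	When to use
`-i` / `--interactive`	Show only interactive elements	When the AI agent is deciding what to do, to reduce noise
`-c` / `--compact`	Remove empty structural elements	To shrink the output
`-d <n>` / `--depth <n>`	Limit tree depth	When the page is nested too deeply
`-s <sel>` / `--selector <sel>`	Restrict the snapshot to a CSS selector	When only one region matters
`-u` / `--urls`	Include link URLs	When links need to be extracted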

Combining Options

# Only interactive elements inside #main, limited to a depth of 5
agent-browser snapshot -i -d 5 -s "#main"

# Compact output that includes URLs
agent-browser snapshot -i -c -u

Annotated Screenshot

--annotate overlays numbered labels on the screenshot; each number corresponds to one ref:

agent-browser screenshot --annotate
# Output: screenshot saved to /tmp/screenshot-xxx.png
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
# [3] @e3 textbox "Email"

This lets an AI agent understand the page through both vision and text. The numbers on the screenshot correspond exactly to the refs in the snapshot.
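
The label-to-ref mapping can be looked up mechanically from the legend text. This is a minimal sketch under the assumption that the legend lines have the form `[N] @eN role "name"` shown above; `ref_for_label` is a hypothetical helper, not an agent-browser command.

```shell
# Hypothetical helper: given a label number, print the matching ref.
# Assumes legend lines look like: [2] @e2 link "Home"
ref_for_label() {
  awk -v want="[$1]" '$1 == want { print $2 }'
}

# Legend copied from the sample output above
legend='[1] @e1 button "Submit"
[2] @e2 link "Home"
[3] @e3 textbox "Email"'

printf '%s\n' "$legend" | ref_for_label 2   # prints @e2
```

A vision model that answers "click label 3" can then be translated into the ref `@e3` before issuing the next command.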

Screenshot Options

# Full-page screenshot (including the scrolled-out parts)
agent-browser screenshot --full

# Custom output directory
agent-browser screenshot --screenshot-dir ./shots

# Output in JPEG format
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80

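Diff: Detecting Page Changes

diff compares two states of a page, which makes it a good fit for monitoring competitor prices, content updates, and changes to page structure.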

Snapshot Diff

# Compare the current page with the previous snapshot
agent-browser diff snapshot

# Compare against a saved baseline file
agent-browser diff snapshot --baseline before.txt

# Restrict the comparison to one region
agent-browser diff snapshot --selector "#pricing"

Visual Diff

# Compare the current page with a baseline screenshot
agent-browser diff screenshot --baseline before.png

# Save the diff image
agent-browser diff screenshot --baseline before.png -o diff.png

# Adjust the color tolerance
agent-browser diff screenshot --baseline before.png -t 0.2

Comparing URLs Directly

# Compare the page structure of two URLs
agent-browser diff url https://v1.example.com https://v2.example.com

# Also run a visual diff
agent-browser diff url https://v1.example.com https://v2.example.com --screenshot

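Practical Use Cases

Scenario 1: Price monitoring

# Collect product prices every day
agent-browser open https://store.example.com/products
agent-browser snapshot -i -c > prices-$(date +%Y%m%d).txt

# Compare with the previous day's file while the page is still open
agent-browser diff snapshot --baseline prices-20260629.txt
agent-browser close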
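
The baseline filename above is hard-coded. In a daily job it could be derived from the date instead. The following is a rough sketch, assuming the `prices-YYYYMMDD.txt` naming used above; `yesterday` is a hypothetical helper, and it relies on either GNU `date -d` or BSD/macOS `date -v` being available.

```shell
# Print yesterday's date as YYYYMMDD.
# Tries the GNU date form first, then falls back to the BSD/macOS form.
yesterday() {
  date -d yesterday +%Y%m%d 2>/dev/null || date -v-1d +%Y%m%d
}

# Hypothetical file naming, matching the prices-YYYYMMDD.txt pattern above
baseline="prices-$(yesterday).txt"

if [ -f "$baseline" ]; then
  echo "baseline found: $baseline"
else
  echo "no baseline for yesterday: $baseline"
fi
```

When the file exists, "$baseline" can be passed to `agent-browser diff snapshot --baseline` in place of the fixed name.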

Scenario 2: Page change alerts

# Establish the baseline on the first run
agent-browser open https://example.com/pricing
agent-browser screenshot pricing-baseline.png
agent-browser snapshot > pricing-baseline.txt
agent-browser close

# Check for changes periodically
agent-browser open https://example.com/pricing
agent-browser diff snapshot --baseline pricing-baseline.txt
# Output: the list of added, removed, and modified elements
agent-browser close
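
For the alerting step itself, a script needs a yes/no answer rather than a full report. The sketch below works purely on two saved snapshot files with standard tools (`cmp`, `grep`); it is an illustration that complements the built-in `diff snapshot`, and `check_changed` is a hypothetical helper. It assumes refs stay stable between runs; if they are renumbered, lines would differ even when the content did not.

```shell
# Hypothetical helper: report whether a fresh snapshot file differs from the baseline.
# Usage: check_changed pricing-baseline.txt pricing-current.txt
check_changed() {
  baseline="$1"
  current="$2"
  if cmp -s "$baseline" "$current"; then
    echo "unchanged"
  else
    echo "changed"
    # Print the lines that exist only in the current snapshot
    grep -Fxv -f "$baseline" "$current" || true
  fi
}
```

The first output line ("changed" or "unchanged") can drive a notification, for example by sending mail only when it reads "changed".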

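Scenario 3: Data annotation

# Use an annotated screenshot to help human annotators
agent-browser open https://example.com/data-table
agent-browser screenshot --annotate --full
# Team members can discuss elements by number, e.g. "the price data in @e12 is wrong"
agent-browser close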

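Using a Proxy

export HTTP_PROXY=http://user:pass@proxy.16yun.cn:8888
export HTTPS_PROXY=http://user:pass@proxy.16yun.cn:8888

# Take a screenshot through the proxy
agent-browser open https://example.com
agent-browser screenshot --full --annotate

Residential IPs from the 16YUN crawler proxy can lower the chance of being blocked, which helps each snapshot and screenshot capture the real page content.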

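Summary

Feature	Command	Purpose
Full page structure	`snapshot`	Lets the AI understand the page
Interactive elements only	`snapshot -i`	Agent decision-making
Numbered screenshot	`screenshot --annotate`	Combined visual and text channels
Structural change detection	`diff snapshot`	Content monitoring
Visual change detection	`diff screenshot`	UI monitoring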