Prompt Injection 防御（二）：行为约束沙盒

输入过滤不够

第一篇提到的四层防御中，输入过滤和上下文隔离能挡住大多数攻击。但问题是：攻击者可以不断尝试新的变体。总有一种能绕过正则表达式。

这时需要第二道防线：行为沙盒——即使智能体被操控了，限制它能做的事情。

只读模式

对于不需要修改数据的任务（信息提取、内容抓取），直接锁定为只读模式：

class ReadOnlySandbox:
    """只读模式——禁止任何数据修改操作"""
 
    async def execute(self, page, task):
        # 禁止的 CDP 方法
        blocked = [
            "Input.dispatchKeyEvent",    # 键盘输入
            "Input.dispatchMouseEvent",  # 鼠标点击
            "Page.navigate",             # 导航（只读也应允许）
            "Network.setCookie",         # 修改 Cookie
            "DOM.setFileInputFiles",     # 文件上传
        ]
 
        # 可能需要的操作（但只读模式下应该禁止）
        dangerous_js_patterns = [
            r"document\.forms\[",
            r"\.submit\(\)",
            r"fetch\(.*\{.*method:.*(POST|PUT|DELETE)",
            r"XMLHttpRequest\.prototype\.open.*POST",
        ]
 
        # 监控所有 CDP 调用
        async with self.cdp_monitor(page, blocked_funcs=blocked):
            return await self.extract_content(page, task)

只读模式的使用场景：竞争对手监控、内容聚合、市场情报分析。这些任务只需要读取页面内容，不需要修改任何数据。

域白名单

限制智能体只能操作指定域名范围内的页面：

class DomainWhitelist:
    def __init__(self):
        self.allowed_domains = set()
        self.blocked_domains = set()
        self.mode = "deny_all"  # "allow_list" 或 "deny_list"
 
    def allow(self, *domains):
        """添加允许的域名"""
        self.allowed_domains.update(domains)
        self.mode = "allow_list"
 
    def deny(self, *domains):
        """添加禁止的域名"""
        self.blocked_domains.update(domains)
        self.mode = "deny_list"
 
    def check(self, url):
        from urllib.parse import urlparse
        domain = urlparse(url).netloc
 
        if self.mode == "allow_list":
            if not any(d in domain for d in self.allowed_domains):
                raise DomainBlockedError(f"{domain} is not in allowlist")
 
        elif self.mode == "deny_list":
            if any(d in domain for d in self.blocked_domains):
                raise DomainBlockedError(f"{domain} is blocked")
 
        return True

操作降权

不同敏感度的操作需要不同级别的确认。Browy 的默认禁用策略是值得参考的设计：

class OperationPrivilegeLevel:
    """操作权限等级"""
    TRUSTED = "trusted"        # 完全信任
    NORMAL = "normal"          # 正常操作，不需要确认
    SENSITIVE = "sensitive"    # 敏感操作，需要确认
    CRITICAL = "critical"      # 关键操作，需要二次确认
 
class PrivilegeManager:
    def __init__(self):
        self.operations = {
            "page.navigate": PrivilegeLevel.NORMAL,
            "input.type": PrivilegeLevel.NORMAL,
            "click": PrivilegeLevel.NORMAL,
            "form.submit": PrivilegeLevel.SENSITIVE,
            "download.file": PrivilegeLevel.SENSITIVE,
            "delete.record": PrivilegeLevel.CRITICAL,
            "payment.execute": PrivilegeLevel.CRITICAL,
        }
 
    async def check(self, operation, context):
        level = self.operations.get(operation, PrivilegeLevel.SENSITIVE)
 
        if level == PrivilegeLevel.TRUSTED:
            return True
 
        if level == PrivilegeLevel.CRITICAL:
            # 关键操作：截图 + 人工确认 + 操作日志
            screenshot = await context.page.screenshot()
            await log_critical_operation(operation, screenshot)
            confirmed = await request_human_confirmation(
                operation, screenshot
            )
            if not confirmed:
                raise OperationBlockedError(operation)
            return True
 
        if level == PrivilegeLevel.SENSITIVE:
            # 敏感操作：截图日志，不需要人工确认
            await log_sensitive_operation(operation, context)
            return True
 
        return True

组合使用

四种防御机制不是独立的，应该组合使用：

加载图表中...

Browy 的默认禁用策略参考

Browy 的安全模型提供了一个很好的参考：默认所有宿主 OS 工具禁用，用户在设置中逐一手动开启。同样的原则可以应用在 AI 浏览器智能体上：

默认域白名单为空——智能体一开始不能访问任何网站
默认写操作为敏感——第一次写操作时询问用户确认
默认文件访问关闭——智能体不能读写本地文件
默认命令执行关闭——智能体不能执行 Shell 命令

太严格了会破坏用户体验，但默认宽松了安全无法保障。找到一个平衡点需要根据具体场景来决定。

Nanobrowser 安全防御（二）：行为约束沙盒与权限控制