Prompt Injection Deep Defense (Part 1): Context Isolation

This Is Not Theoretical

Prompt injection has moved from academic proof-of-concept to real-world attacks. Invisible text on clothing, malicious instructions in PDF metadata, hidden prompts in HTML comments — any invisible page text can be an attack vector.

Nanobrowser's Guardrails module checks for known patterns:

{
  pattern: /\b(ignore|forget|disregard)[\s\-_]*(previous|all|above)[\s\-_]*(instructions?|tasks?|commands?)\b/gi,
  type: ThreatType.TASK_OVERRIDE,
}

But relying on regex for security is like using a screen door as a bulletproof vest.

Layer 1: Context Isolation

Page content should never share the same prompt context as system instructions.

Unsafe (flat context):
  [system prompt][user task][page DOM with hidden attack text]
  → LLM sees everything, may be manipulated
 
Safe (layered context):
  [system prompt: you are a data extraction tool]
  [user task]
  --- separator ---
  [page DOM (sandboxed)]
  → LLM knows system and page content are different layers

Layer 2: Content Filtering

Filter known attack patterns before passing content to the LLM:

{
  pattern: /\b(ignore|forget|disregard)\b.*\b(instructions|tasks|commands)\b/gi,
  type: "task_override",
},
{
  pattern: /\b(system|new)\s+(prompt|instruction|task|goal)\b/gi,
  type: "prompt_injection",
},

But regex only catches known patterns. Paraphrased variants bypass them easily.

Layer 3: Sensitive Information Isolation

Don't pass all page content to the LLM. Only what's needed:

class ContentFilter:
    def should_block_page(self, url):
        blocked_domains = ["mail.google.com", "bank", "account"]
        return any(d in url for d in blocked_domains)
 
    def filter_for_llm(self, dom_content, task_type):
        if task_type == "extract_prices":
            return self.filter_prices(dom_content)
        elif task_type == "fill_form":
            return self.filter_forms(dom_content)

Layer 4: Operation Confirmation Gate

For high-risk operations, require external confirmation:

class OperationGate:
    HIGH_RISK_OPERATIONS = [
        "transfer", "password_reset", "delete_account",
        "payment_submit", "email_send",
    ]
 
    async def check(self, operation_type, page_context):
        if operation_type in self.HIGH_RISK_OPERATIONS:
            screenshot = await page_context.screenshot()
            await notify_human(f"High risk op: {operation_type}")
            confirmed = await wait_for_confirmation(timeout=60)
            if not confirmed:
                raise OperationBlockedError(operation_type)
        return True

Summary

Four-layer defense:

Context isolation — separate page content from system prompts
Content filtering — filter known attack patterns
Sensitive info isolation — pass only what's needed
Operation gate — require external confirmation for high-risk ops

Nanobrowser Security (Part 1): Prompt Injection Context Isolation and Input Filtering