AWS Agentic Form Filling: Episodic Memory and Semantic Element Discovery on Bedrock

Introduction: The Enterprise Requirement

The previous seven articles covered tools for personal automation, cloud concurrency, CLI acceleration, anti-bot techniques, and cognitive orchestration. They share one trait: they are stateless across tasks. Every task starts from scratch, and the agent does not remember what it learned the last time it filled a similar form.

In enterprise scenarios, this is insufficient.

Consider airline check-in flows. Airline A asks for booking reference and last name on the first page. Airline B requires selecting a flight and verifying identity on the same page. Airline C hides the input fields behind a "Start Check-In" button that must be clicked first.

If an AI agent must reason from scratch every time it encounters one of these variations, neither its efficiency nor its reliability meets production requirements.

The core innovation of AWS Agentic Form Filling addresses this: episodic memory. The agent remembers that "last time I encountered this airline's page, I had to click here before the input fields appeared" and reuses that experience on the next encounter.

Architecture

┌────────────────────────────────────────────────────┐
│      AWS Agentic Form Filling Architecture          │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │        Amazon Bedrock AgentCore              │  │
│  │                                              │  │
│  │  ┌────────────────┐  ┌──────────────────┐   │  │
│  │  │ Claude Model   │  │  Episodic Memory │   │  │
│  │  │ (claude-opus)  │  │  Storage         │   │  │
│  │  └────────┬───────┘  └──────────────────┘   │  │
│  │           │                                  │  │
│  │  ┌────────▼──────────────────────────────┐   │  │
│  │  │ ImageFilteringConversationManager     │   │  │
│  │  │ (Sliding window protocol, prunes old  │   │  │
│  │  │  records to stay within context)      │   │  │
│  │  └───────────────────────────────────────┘   │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │           Playwright Driver                   │  │
│  │                                              │  │
│  │  ┌────────────────┐  ┌──────────────────┐   │  │
│  │  │ aria_snapshot  │  │ Semantic Element │   │  │
│  │  │ (A11y tree)    │  │ Discovery (BERT) │   │  │
│  │  └────────────────┘  └──────────────────┘   │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │        Target Website (airline check-in)      │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘

Core Components

Component	Role	Technology
Bedrock AgentCore	Agent orchestration, memory management, tool calling	AWS managed service
Claude model	Core reasoning and decision making	Amazon Bedrock
Episodic Memory	Cross-session experience storage and retrieval	AgentCore built-in
Playwright	Browser control (navigation, interaction, screenshots)	Open-source automation
ImageFiltering Manager	Context window management, token optimization	Custom sliding window
Semantic Discovery	Semantic element targeting	Sentence Transformers

Episodic Memory Design

Episodic memory is the most enterprise-relevant part of AWS Agentic Form Filling's design. Unlike chat history, it is a structured, searchable, PII-filtered database of past experience.

Memory Lifecycle

First interaction
    │
    ▼
AgentCore captures episodic data:
  - Current page URL / title
  - Action trace (navigation, click, fill)
  - Tool efficiency and results per operation
  - Errors encountered and how resolved
    │
    ▼
PII filter
    │  Removes: passenger names, confirmation codes, payment info
    │  Retains: navigation paths, form structure patterns, error patterns
    ▼
Stored in episodic memory database
  - Namespace: airline name
  - Contains: successful flows, tool efficiency, error solutions
    │
    ▼
New session initializes
    │
    ▼
AgentCore automatically retrieves relevant memories
    │  Matches current task context
    │  "We've handled this airline's check-in page before"
    ▼
Memory injected into LLM prompt context
    │  "Last time, this form required clicking 'Start Check-In' first"
    ▼
Agent reuses successful strategy, avoids repeated errors
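The store, retrieve, and inject steps of the lifecycle can be sketched in plain Python. This is a minimal, hypothetical in-memory stand-in, not the AgentCore Memory API: the `Episode` and `EpisodicMemory` classes, the "same URL first, then newest" relevance rule, and the prompt wording are all assumptions made for illustration. In the real system AgentCore performs storage and retrieval as a managed service, and episodes are assumed to be PII-filtered before they reach the store.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Episode:
    """One recorded, already PII-filtered interaction with a form (hypothetical)."""
    namespace: str                # e.g. the airline name
    url: str
    actions: list[str]            # action trace, e.g. "click 'Start Check-In'"
    errors: dict[str, str] = field(default_factory=dict)  # error -> resolution
    succeeded: bool = True


class EpisodicMemory:
    """Hypothetical in-memory stand-in for a persistent episodic store."""

    def __init__(self) -> None:
        self._by_namespace: dict[str, list[Episode]] = {}

    def store(self, episode: Episode) -> None:
        self._by_namespace.setdefault(episode.namespace, []).append(episode)

    def retrieve(self, namespace: str, url: str, limit: int = 3) -> list[Episode]:
        """Search one namespace only: episodes for the same URL first, then newest."""
        candidates = list(enumerate(self._by_namespace.get(namespace, [])))
        candidates.sort(key=lambda pair: (pair[1].url == url, pair[0]), reverse=True)
        return [episode for _, episode in candidates[:limit]]


def build_memory_context(episodes: list[Episode]) -> str:
    """Render retrieved episodes as text to inject into the LLM prompt."""
    if not episodes:
        return ""
    lines = ["Relevant experience from previous sessions:"]
    for ep in episodes:
        status = "succeeded" if ep.succeeded else "failed"
        lines.append(f"- {ep.url} ({status}): " + " -> ".join(ep.actions))
        for error, fix in ep.errors.items():
            lines.append(f"  - '{error}' was resolved by: {fix}")
    return "\n".join(lines)
```

Keeping each namespace in its own list means retrieval can never mix one airline's experience into another's, and recording how each error was resolved is what lets the next session avoid repeating it.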

Key Design Decisions

Decision	Rationale
Namespace-based isolation	Different airlines have wildly different form structures; mixing would confuse reasoning
Automatic PII filtering	Compliance (GDPR, CCPA); prevents sensitive data from leaking across sessions
Retain error solutions	Failed patterns are often more informative than successful paths
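The automatic PII filter can be pictured as a redaction pass over each action trace before it is stored. This is a minimal sketch under stated assumptions, not AgentCore's built-in filter: because the agent knows the values it was asked to type (for example the confirmation code and last name), it can redact those exact strings, and a regex catches card-like numbers. The function name, placeholder format, and card pattern are hypothetical; personal data that the page displays but the task did not supply would need additional detection.

```python
import re

# Rough pattern for payment card numbers: 13-19 digits, optionally separated
# by single spaces or dashes. Assumption: no Luhn check, so it may over-match.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def filter_pii(text: str, task_inputs: dict[str, str]) -> str:
    """Redact the task's own input values and card-like numbers from a trace."""
    # Redact longer values first so a value containing another is removed whole.
    for name, value in sorted(task_inputs.items(), key=lambda kv: len(kv[1]), reverse=True):
        if not value:
            continue
        # Lookarounds keep short values (e.g. a surname "Li") from matching
        # inside other words such as "click".
        pattern = rf"(?<!\w){re.escape(value)}(?!\w)"
        text = re.sub(pattern, f"<{name}>", text, flags=re.IGNORECASE)
    return CARD_PATTERN.sub("<payment_card>", text)
```

Field labels such as "Confirmation code" survive the pass, which is exactly the split in the lifecycle: the form structure is retained while the passenger's values are removed.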

Semantic Element Discovery

Traditional element targeting relies on exact matching: by ID, CSS selector, XPath, or accessible label. In enterprise forms, these identifiers are often generated dynamically or change with each redesign.

AWS's approach: match semantics, not text.

How It Works

Playwright aria_snapshot()
    │  Get accessibility tree
    ▼
Sentence Transformers (model runs locally on the client)
    │  Chunk the A11y tree
    │  Convert each chunk to vector embedding
    ▼
Semantic similarity search
    │  Target: "submit button"
    │  Found: "Proceed to Next Step" (similarity: 0.91)
    │         "Continue" (similarity: 0.87)
    │         "Submit" (similarity: 0.85)
    ▼
Even if the button says "Proceed to Next Step,"
the agent finds it reliably via semantic proximity
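The pipeline above can be sketched in a few functions. This is a minimal sketch under several assumptions: each named node in the `aria_snapshot()` YAML output (lines such as `- button "Continue"`) is treated as one chunk, similarity is cosine similarity, and `embed` is any function mapping text to a vector. In practice `embed` would wrap a Sentence Transformers model, for example `SentenceTransformer("all-MiniLM-L6-v2").encode`; the model AWS actually uses is not named here, so that choice is an assumption. `toy_embed` is a deliberately non-semantic stand-in (hashed character trigrams) so the sketch runs without downloading a model; it will not match "Proceed to Next Step" to "submit button" the way a real embedding model is meant to.

```python
import math
import re
import zlib
from typing import Callable

Vector = list[float]

# One aria_snapshot line looks like: - button "Proceed to Next Step"
# (escaped quotes inside names are kept as-is; this is a simplification).
ARIA_LINE = re.compile(r'^\s*-\s+([a-z]+)(?:\s+"((?:[^"\\]|\\.)*)")?')


def chunk_aria_snapshot(snapshot: str) -> list[tuple[str, str]]:
    """Split an aria_snapshot() YAML string into (role, name) chunks."""
    chunks = []
    for line in snapshot.splitlines():
        m = ARIA_LINE.match(line)
        if m and m.group(2):  # keep only nodes that have an accessible name
            chunks.append((m.group(1), m.group(2)))
    return chunks


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_elements(snapshot: str, target: str,
                  embed: Callable[[str], Vector],
                  top_k: int = 3) -> list[tuple[float, str, str]]:
    """Rank named A11y nodes by similarity to a natural-language target."""
    query = embed(target)
    scored = [(cosine(query, embed(f"{role} {name}")), role, name)
              for role, name in chunk_aria_snapshot(snapshot)]
    return sorted(scored, reverse=True)[:top_k]


def toy_embed(text: str, dims: int = 256) -> Vector:
    """Stand-in embedding: hashed character trigram counts (NOT semantic)."""
    vec = [0.0] * dims
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode()) % dims] += 1.0
    return vec
```

The top (role, name) pair can then be turned back into a Playwright locator such as `page.get_by_role("button", name="Proceed to Next Step")` to perform the click.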

Comparison with Exact Matching

Method	Example	Robustness to Redesign
CSS selector	`#submit-btn`	Very poor (class/ID change = break)
XPath	`//form/div[3]/button`	Poor (DOM change = break)
Accessible label	`button "Submit"`	Medium (text change = break)
Semantic vector	`similar_to("submit button")`	High (semantics unchanged = works)

Context Window Management: ImageFilteringConversationManager

LLM-powered agents face a dilemma: screenshots provide rich visual context, but each screenshot consumes many tokens.

AWS's solution is a sliding window protocol:

Conversation start:
  [System prompt][Memory injection][Initial screenshot]
 
As conversation progresses:
  [System prompt][Memory injection][Screenshot #1][Action #1][Screenshot #2][Action #2]...
 
Token limit approaching:
  [System prompt][Memory injection][Old screenshot replaced with text placeholder]
  [Screenshot #5][Action #5]...
 
Strategy:
  - Old screenshots dynamically removed
  - Replaced with text placeholders: "a filtered screenshot exists here"
  - Recent actions and screenshots retained
  - Episodic memory unaffected (persistent storage)

This ensures:

Context window stays within model limits
Recent visual information is preserved
Historical key data persists via episodic memory, not active context
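The screenshot-filtering strategy amounts to a small transformation over the message list. A minimal sketch, assuming messages use the Amazon Bedrock Converse API shape (content blocks keyed by `text` or `image`) and that the system prompt and memory injection are passed separately, so they are never touched here. The function name, the `keep_last` parameter, and the placeholder wording are assumptions; this is not the actual `ImageFilteringConversationManager` code.

```python
from copy import deepcopy

PLACEHOLDER = {"text": "[a filtered screenshot exists here]"}


def filter_old_screenshots(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Replace all but the newest `keep_last` screenshots with a text placeholder.

    Messages follow the Converse shape:
    {"role": ..., "content": [{"text": ...} or {"image": {...}}, ...]}
    """
    result = deepcopy(messages)  # leave the caller's history untouched
    # Every image block as (message index, block index), oldest first.
    positions = [(i, j)
                 for i, msg in enumerate(result)
                 for j, block in enumerate(msg["content"])
                 if "image" in block]
    # positions[:-0] would be empty, so handle keep_last == 0 explicitly.
    to_filter = positions[:-keep_last] if keep_last > 0 else positions
    for i, j in to_filter:
        result[i]["content"][j] = dict(PLACEHOLDER)
    return result
```

Because only image blocks are swapped, the number of messages and their role order are unchanged, so tool-call and tool-result pairs stay intact while the token cost of old screenshots disappears.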

Deployment

Core AWS Dependencies

Service	Purpose
Amazon Bedrock	Hosted Claude model
Bedrock AgentCore	Agent orchestration + episodic memory
AWS Lambda / ECS	Playwright execution environment
Amazon S3	Screenshot and log storage
IAM	Permission management

Simplified Deployment

# 1. Clone project
git clone <repository-url>
 
# 2. Configure AWS credentials
aws configure
 
# 3. Deploy AgentCore
python deploy_agentcore.py \
  --memory-enabled \
  --namespace "airline-checkin"
 
# 4. Enable model access in Bedrock console

Practical Scenario: Airline Check-In

System: Complete airline check-in for the passenger
Input: Confirmation code ABC123, Last name SMITH

First attempt (no memory):

1. Open airline check-in page
2. aria_snapshot() for page structure
3. Semantic search: "confirmation code input" → found
4. Fill ABC123
5. Semantic search: "last name input" → found
6. Fill SMITH
7. Semantic search: "check in button" → found and clicked
8. CAPTCHA detected; human intervention required
9. Memory stored: this airline's check-in flow includes a CAPTCHA

Second attempt (with memory):

1. Open same airline check-in page
2. AgentCore retrieves relevant memory
3. Memory injected: "This page requires CAPTCHA verification"
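4. Agent prepares a CAPTCHA-handling strategy in advance
5. Agent fills in the details, skipping exploration already recorded in memory
6. Elapsed time: 35 s on the first attempt, 12 s on the second (about a 66% reduction)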

Enterprise vs Open-Source Comparison

Dimension	AWS Agentic Form Filling	Nanobrowser / Browy	agent-browser
Setup complexity	High (multi-AWS service)	Very low (extension)	Low (CLI)
Memory management	AgentCore Episodic Memory	None	session save/load
Element targeting	Semantic vector + A11y	DOM path	A11y Ref
Context optimization	ImageFiltering sliding window	None	None (A11y inherently compact)
Scaling	AWS infrastructure auto-scaling	Single user	Single instance
Cost	AWS fees + model calls	Own API key	Own API key
Best for	Enterprise production	Individual devs	Devs / small teams

Limitations

AWS lock-in: Deeply tied to Amazon Bedrock and AWS ecosystem
Complex setup: Requires understanding and configuring multiple AWS services
Higher latency: AgentCore orchestration, screenshot analysis, and vector search add overhead
Cost uncertainty: Enterprise managed services + Claude model costs need careful estimation

Summary

AWS Agentic Form Filling represents an enterprise-grade paradigm for AI browser agents: it pursues not "fastest" or "cheapest" but "most reliable" and "capable of learning."

Episodic memory transforms the agent from a stateless tool that starts from scratch into a "digital employee" that accumulates experience. Semantic element discovery removes the dependency on fragile CSS selectors. The sliding window context manager keeps long-running, complex tasks from being interrupted by token limits.

For scenarios that involve processing large volumes of repetitive yet subtly different enterprise forms (airline check-in, bank account opening, insurance claims), this architecture provides the most reliable engineering solution currently available.

The next article presents a comprehensive cross-framework comparison of mainstream AI browser agent tools as of mid-2026.