AWS Agentic Form Filling: Episodic Memory and Semantic Element Discovery on Bedrock

Enterprise-grade intelligent form filling built on Amazon Bedrock AgentCore and Playwright. Episodic memory injects cross-session experience. Sentence Transformers provide semantic element discovery.

16Yun Engineering Team | Jun 2, 2026 | 4 min read

Introduction: The Enterprise Requirement

The previous seven articles covered tools for personal automation, cloud concurrency, CLI acceleration, anti-bot, and cognitive orchestration. They share one trait: they are stateless. Every task starts from scratch. The agent doesn't remember what it learned the last time it filled a similar form.

In enterprise scenarios, this is insufficient.

Consider airline check-in flows. Airline A asks for booking reference and last name on the first page. Airline B requires selecting a flight and verifying identity on the same page. Airline C hides the input fields behind a "Start Check-In" button that must be clicked first.

If an AI agent must reason from scratch every time it encounters these variations, neither efficiency nor reliability meets production requirements.

AWS Agentic Form Filling's core innovation solves this: Episodic Memory. The agent remembers "last time I encountered this airline's page, I clicked here first before the input fields appeared" and reuses that experience on the next encounter.

Architecture

┌────────────────────────────────────────────────────┐
│      AWS Agentic Form Filling Architecture          │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │        Amazon Bedrock AgentCore              │  │
│  │                                              │  │
│  │  ┌────────────────┐  ┌──────────────────┐   │  │
│  │  │ Claude Model   │  │  Episodic Memory │   │  │
│  │  │ (claude-opus)  │  │  Storage         │   │  │
│  │  └────────┬───────┘  └──────────────────┘   │  │
│  │           │                                  │  │
│  │  ┌────────▼──────────────────────────────┐   │  │
│  │  │ ImageFilteringConversationManager     │   │  │
│  │  │ (Sliding window protocol, prunes old  │   │  │
│  │  │  records to stay within context)      │   │  │
│  │  └───────────────────────────────────────┘   │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │           Playwright Driver                   │  │
│  │                                              │  │
│  │  ┌────────────────┐  ┌──────────────────┐   │  │
│  │  │ aria_snapshot  │  │ Semantic Element │   │  │
│  │  │ (A11y tree)    │  │ Discovery (BERT) │   │  │
│  │  └────────────────┘  └──────────────────┘   │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │        Target Website (airline check-in)      │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘

Core Components

| Component | Role | Technology |
|---|---|---|
| Bedrock AgentCore | Agent orchestration, memory management, tool calling | AWS managed service |
| Claude model | Core reasoning and decision making | Amazon Bedrock |
| Episodic Memory | Cross-session experience storage and retrieval | AgentCore built-in |
| Playwright | Browser control (navigation, interaction, screenshots) | Open-source automation |
| ImageFiltering Manager | Context window management, token optimization | Custom sliding window |
| Semantic Discovery | Semantic element targeting | Sentence Transformers |

Episodic Memory Design

Episodic memory is the most enterprise-relevant design in AWS Agentic Form Filling. It's different from chat history — it's a structured, searchable, PII-filtered experience database.

Memory Lifecycle

First interaction
    │
    ▼
AgentCore captures episodic data:
  - Current page URL / title
  - Action trace (navigation, click, fill)
  - Tool efficiency and results per operation
  - Errors encountered and how resolved
    │
    ▼
PII filter
    │  Removes: passenger names, confirmation codes, payment info
    │  Retains: navigation paths, form structure patterns, error patterns
    ▼
Stored in episodic memory database
  - Namespace: airline name
  - Contains: successful flows, tool efficiency, error solutions

New session initializes
    │
    ▼
AgentCore automatically retrieves relevant memories
    │  Matches current task context
    │  "We've handled this airline's check-in page before"
    ▼
Memory injected into LLM prompt context
    │  "Last time, this form required clicking 'Start Check-In' first"
    ▼
Agent reuses successful strategy, avoids repeated errors
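
To make the lifecycle concrete, here is a minimal in-memory sketch of the store, retrieve, and inject steps. It is not the AgentCore API: `Episode`, `EpisodicStore`, `build_memory_context`, and the `airline-c` namespace are hypothetical names chosen for illustration, and the sketch assumes episodes have already passed through the PII filter before being saved.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One stored experience. All field names here are hypothetical."""
    page_url: str
    actions: list[str]
    outcome: str  # e.g. "success" or "captcha"
    lessons: list[str] = field(default_factory=list)


class EpisodicStore:
    """In-memory stand-in for a persistent, namespaced memory service."""

    def __init__(self) -> None:
        self._by_namespace: dict[str, list[Episode]] = defaultdict(list)

    def save(self, namespace: str, episode: Episode) -> None:
        self._by_namespace[namespace].append(episode)

    def retrieve(self, namespace: str, limit: int = 3) -> list[Episode]:
        # Most recent first. Only the requested namespace is consulted,
        # so one airline's lessons never reach another airline's session.
        return list(reversed(self._by_namespace.get(namespace, [])))[:limit]


def build_memory_context(episodes: list[Episode]) -> str:
    """Render retrieved episodes as text to prepend to the LLM prompt."""
    if not episodes:
        return "No prior experience with this site."
    lines = ["Prior experience with this site:"]
    for ep in episodes:
        for lesson in ep.lessons:
            lines.append(f"- {lesson} (outcome: {ep.outcome})")
    return "\n".join(lines)


# First session: save what was learned under the airline's namespace.
store = EpisodicStore()
store.save("airline-c", Episode(
    page_url="https://example.com/check-in",
    actions=["navigate", "click Start Check-In", "fill", "fill", "click"],
    outcome="success",
    lessons=["Click 'Start Check-In' before the input fields appear"],
))

# Later session: retrieve by namespace and build the prompt prefix.
context = build_memory_context(store.retrieve("airline-c"))
```

Keying retrieval on the namespace alone keeps the sketch short; a production memory service would presumably also rank episodes by relevance to the current task, as the lifecycle above describes.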

Key Design Decisions

| Decision | Rationale |
|---|---|
| Namespace-based isolation | Different airlines have wildly different form structures; mixing would confuse reasoning |
| Automatic PII filtering | Compliance (GDPR, CCPA); prevents sensitive data from leaking across sessions |
| Retain error solutions | Failed patterns are often more informative than successful paths |
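
The article does not say how the PII filter is implemented. One possible approach, sketched here under that caveat, is to redact the values the agent already knows are sensitive (the task inputs it typed into the form) and additionally mask card-like digit runs. The function name `scrub`, the placeholder strings, and the card pattern are all assumptions for illustration.

```python
import re

# 13 to 19 digits, optionally separated by spaces or hyphens (card-like numbers).
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def scrub(text: str, sensitive_values: list[str]) -> str:
    """Redact known task inputs and card-like numbers from one trace line."""
    # Longest values first so a short value never splits a longer one.
    for value in sorted(sensitive_values, key=len, reverse=True):
        if value:
            text = re.sub(re.escape(value), "[REDACTED]", text, flags=re.IGNORECASE)
    return CARD_PATTERN.sub("[REDACTED_NUMBER]", text)


trace = [
    "fill 'Confirmation code' with ABC123",
    "fill 'Last name' with Smith",
    "fill 'Card number' with 4111 1111 1111 1111",
    "click 'Start Check-In' before inputs appear",
]
# The agent knows which values it was given, so it can redact them exactly.
clean = [scrub(line, ["ABC123", "SMITH"]) for line in trace]
```

The structural lesson in the last trace line survives untouched, which matches the "Retains" side of the filter: navigation paths and form patterns are kept while passenger data is dropped.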

Semantic Element Discovery

Traditional element targeting relies on exact matching — by ID, CSS selector, XPath, or accessible label. In enterprise forms, these identifiers are often dynamic or unreliable.

AWS's approach: match semantics, not text.

How It Works

Playwright aria_snapshot()
    │  Get accessibility tree
    ▼
Sentence Transformers (local client model)
    │  Chunk the A11y tree
    │  Convert each chunk to vector embedding
    ▼
Semantic similarity search
    │  Target: "submit button"
    │  Found: "Proceed to Next Step" (similarity: 0.91)
    │         "Continue" (similarity: 0.87)
    │         "Submit" (similarity: 0.85)
    ▼
Even if the button says "Proceed to Next Step,"
the agent finds it reliably via semantic proximity
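
The pipeline above can be sketched as three small steps: parse the snapshot text into (role, name) pairs, embed the target and each candidate, and rank by cosine similarity. This is a sketch under assumptions, not the project's code. The helper names are hypothetical, the snapshot text is a hand-written sample in the style Playwright's `aria_snapshot()` returns, and `toy_embed` is a purely lexical stand-in so the example runs without downloading a model.

```python
import math
import re
from collections import Counter
from typing import Callable, Sequence

Element = tuple[str, str]  # (role, accessible name)

# Matches snapshot lines such as:  - button "Proceed to Next Step"
ELEMENT_LINE = re.compile(r'^\s*-\s*(\w+)\s+"([^"]+)"')


def parse_snapshot(snapshot: str) -> list[Element]:
    """Extract (role, accessible name) pairs from ARIA snapshot text."""
    elements = []
    for line in snapshot.splitlines():
        match = ELEMENT_LINE.match(line)
        if match:
            elements.append((match.group(1), match.group(2)))
    return elements


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(target: str, elements: list[Element],
         embed: Callable[[list[str]], Sequence[Sequence[float]]]):
    """Return (score, element) pairs, best match first."""
    texts = [target] + [f"{name} {role}" for role, name in elements]
    vectors = embed(texts)  # one batch call: row 0 is the target
    scored = [(cosine(vectors[0], v), el) for v, el in zip(vectors[1:], elements)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_embed(texts: list[str]) -> list[list[float]]:
    """Word-count vectors over the batch vocabulary. Lexical only, NOT semantic."""
    counts = [Counter(re.findall(r"[a-z]+", t.lower())) for t in texts]
    vocab = sorted(set().union(*counts))
    return [[float(c[w]) for w in vocab] for c in counts]


snapshot = """\
- heading "Online Check-In" [level=1]
- textbox "Booking reference"
- textbox "Last name"
- button "Proceed to Next Step"
- button "Submit"
"""
ranked = rank("submit button", parse_snapshot(snapshot), toy_embed)
best_score, best_element = ranked[0]
```

With the toy embedder, "Proceed to Next Step" scores only through the shared word "button". The behavior the article describes, where a differently worded button still ranks high, needs a real sentence embedding model; passing something like `SentenceTransformer("all-MiniLM-L6-v2").encode` as `embed` is one way to get it, though which model the project uses is not stated in the source.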

Comparison with Exact Matching

| Method | Example | Robustness to Redesign |
|---|---|---|
| CSS selector | #submit-btn | Very poor (class/ID change = break) |
| XPath | //form/div[3]/button | Poor (DOM change = break) |
| Accessible label | button "Submit" | Medium (text change = break) |
| Semantic vector | similar_to("submit button") | High (semantics unchanged = works) |

Context Window Management: ImageFilteringConversationManager

LLM-powered agents face a dilemma: screenshots provide rich visual context, but each screenshot consumes many tokens.

AWS's solution is a sliding window protocol:

Conversation start:
  [System prompt][Memory injection][Initial screenshot]
 
As conversation progresses:
  [System prompt][Memory injection][Screenshot #1][Action #1][Screenshot #2][Action #2]...
 
Token limit approaching:
  [System prompt][Memory injection][Old screenshot replaced with text placeholder]
  [Screenshot #5][Action #5]...
 
Strategy:
  - Old screenshots dynamically removed
  - Replaced with text placeholders: "a filtered screenshot exists here"
  - Recent actions and screenshots retained
  - Episodic memory unaffected (persistent storage)

This ensures:

  1. Context window stays within model limits
  2. Recent visual information is preserved
  3. Historical key data persists via episodic memory, not active context
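
The pruning strategy can be sketched as follows. This is a minimal illustration, not the actual ImageFilteringConversationManager: the message schema (`type` and `content` keys), the function names, and the flat per-image token cost are assumptions, and real token accounting depends on the model and image size.

```python
PLACEHOLDER = "a filtered screenshot exists here"
IMAGE_TOKENS = 1500  # assumed flat cost per screenshot


def estimate_tokens(message: dict) -> int:
    """Rough estimate: flat cost for images, about 4 characters per token for text."""
    if message["type"] == "image":
        return IMAGE_TOKENS
    return max(1, len(message["content"]) // 4)


def prune_screenshots(messages: list[dict], budget: int, keep_recent: int = 1) -> list[dict]:
    """Replace the oldest screenshots with placeholders until the estimate fits."""
    pruned = [dict(m) for m in messages]  # copy so the caller's history is untouched
    image_idx = [i for i, m in enumerate(pruned) if m["type"] == "image"]
    # The newest `keep_recent` screenshots are never replaced.
    removable = image_idx[:-keep_recent] if keep_recent > 0 else image_idx
    for i in removable:  # oldest first
        if sum(estimate_tokens(m) for m in pruned) <= budget:
            break
        pruned[i] = {"type": "text", "content": PLACEHOLDER}
    return pruned


history = [
    {"type": "text", "content": "System prompt: you fill web forms."},
    {"type": "text", "content": "Memory: click 'Start Check-In' first."},
    {"type": "image", "content": b"<png bytes 1>"},
    {"type": "text", "content": "Action 1: click 'Start Check-In'"},
    {"type": "image", "content": b"<png bytes 2>"},
    {"type": "text", "content": "Action 2: fill booking reference"},
    {"type": "image", "content": b"<png bytes 3>"},
]
pruned = prune_screenshots(history, budget=3200)
```

In this run only the first screenshot is replaced, because dropping it already brings the estimate under the budget. The system prompt, memory injection, and action text are never touched, which mirrors the strategy listed above.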

Deployment

Core AWS Dependencies

| Service | Purpose |
|---|---|
| Amazon Bedrock | Hosted Claude model |
| Bedrock AgentCore | Agent orchestration + episodic memory |
| AWS Lambda / ECS | Playwright execution environment |
| Amazon S3 | Screenshot and log storage |
| IAM | Permission management |

Simplified Deployment

# 1. Clone project
git clone <repository-url>
 
# 2. Configure AWS credentials
aws configure
 
# 3. Deploy AgentCore
python deploy_agentcore.py \
  --memory-enabled \
  --namespace "airline-checkin"
 
# 4. Enable model access in Bedrock console

Practical Scenario: Airline Check-In

System: Complete airline check-in for passenger
Input: Confirmation code ABC123, Last name SMITH

First attempt (no memory):

1. Open airline check-in page
2. aria_snapshot() for page structure
3. Semantic search: "confirmation code input" → found
4. Fill ABC123
5. Semantic search: "last name input" → found
6. Fill SMITH
7. Semantic search: "check in button" → found and clicked
8. CAPTCHA detected, needs human intervention
9. Memory stored: this airline's check-in has CAPTCHA

Second attempt (with memory):

1. Open same airline check-in page
2. AgentCore retrieves relevant memory
3. Memory injected: "This page requires CAPTCHA verification"
4. Agent prepares CAPTCHA handling strategy
5. Fill info, skipping steps already known from memory

Efficiency: 35s on the first attempt, 12s on the second

Enterprise vs Open-Source Comparison

| Dimension | AWS Agentic Form Filling | Nanobrowser / Browy | agent-browser |
|---|---|---|---|
| Setup complexity | High (multi-AWS service) | Very low (extension) | Low (CLI) |
| Memory management | AgentCore Episodic Memory | None | session save/load |
| Element targeting | Semantic vector + A11y | DOM path | A11y Ref |
| Context optimization | ImageFiltering sliding window | None | None (A11y inherently compact) |
| Scaling | AWS infrastructure auto-scaling | Single user | Single instance |
| Cost | AWS fees + model calls | Own API key | Own API key |
| Best for | Enterprise production | Individual devs | Devs / small teams |

Limitations

  • AWS lock-in: Deeply tied to Amazon Bedrock and AWS ecosystem
  • Complex setup: Requires understanding and configuring multiple AWS services
  • Higher latency: AgentCore orchestration, screenshot analysis, and vector search each add overhead
  • Cost uncertainty: Enterprise managed services + Claude model costs need careful estimation

Summary

AWS Agentic Form Filling represents an enterprise-grade paradigm for AI browser agents. It does not aim to be the fastest or the cheapest option; it aims to be reliable and able to learn from experience.

Episodic memory transforms the agent from a stateless tool starting from scratch into a "digital employee" that accumulates experience. Semantic element discovery eliminates dependency on fragile CSS selectors. The sliding window context manager ensures long-running complex tasks won't be interrupted by token limits.

For workloads that process large volumes of repeated yet subtly different enterprise forms (airline check-in, bank account opening, insurance claims), this architecture offers a reliable engineering approach, at the cost of the setup complexity and AWS lock-in noted above.

The next article presents a comprehensive cross-framework comparison of mainstream AI browser agent tools as of mid-2026.
