AWS Agentic Form Filling: Episodic Memory and Semantic Element Discovery on Bedrock
Enterprise-grade intelligent form filling built on Amazon Bedrock AgentCore and Playwright. Episodic memory injects cross-session experience. Sentence Transformers provide semantic element discovery.
Introduction: The Enterprise Requirement
The previous seven articles covered tools for personal automation, cloud concurrency, CLI acceleration, anti-bot, and cognitive orchestration. They share one trait: they are stateless. Every task starts from scratch. The agent doesn't remember what it learned the last time it filled a similar form.
In enterprise scenarios, this is insufficient.
Consider airline check-in flows. Airline A asks for booking reference and last name on the first page. Airline B requires selecting a flight and verifying identity on the same page. Airline C hides the input fields behind a "Start Check-In" button that must be clicked first.
If an AI agent must reason from scratch every time it encounters these variations, neither efficiency nor reliability meets production requirements.
AWS Agentic Form Filling's core innovation solves this: Episodic Memory. The agent remembers "last time I encountered this airline's page, I clicked here first before the input fields appeared" and reuses that experience on the next encounter.
Architecture
┌────────────────────────────────────────────────────┐
│ AWS Agentic Form Filling Architecture │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Amazon Bedrock AgentCore │ │
│ │ │ │
│ │ ┌────────────────┐ ┌──────────────────┐ │ │
│ │ │ Claude Model │ │ Episodic Memory │ │ │
│ │ │ (claude-opus) │ │ Storage │ │ │
│ │ └────────┬───────┘ └──────────────────┘ │ │
│ │ │ │ │
│ │ ┌────────▼──────────────────────────────┐ │ │
│ │ │ ImageFilteringConversationManager │ │ │
│ │ │ (Sliding window protocol, prunes old │ │ │
│ │ │ records to stay within context) │ │ │
│ │ └───────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Playwright Driver │ │
│ │ │ │
│ │ ┌────────────────┐ ┌──────────────────┐ │ │
│ │ │ aria_snapshot │ │ Semantic Element │ │ │
│ │ │ (A11y tree) │ │ Discovery (BERT) │ │ │
│ │ └────────────────┘ └──────────────────┘ │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Target Website (airline check-in) │ │
│ └──────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────┘
Core Components
| Component | Role | Technology |
|---|---|---|
| Bedrock AgentCore | Agent orchestration, memory management, tool calling | AWS managed service |
| Claude model | Core reasoning and decision making | Amazon Bedrock |
| Episodic Memory | Cross-session experience storage and retrieval | AgentCore built-in |
| Playwright | Browser control (navigation, interaction, screenshots) | Open-source automation |
| ImageFiltering Manager | Context window management, token optimization | Custom sliding window |
| Semantic Discovery | Semantic element targeting | Sentence Transformers |
Episodic Memory Design
Episodic memory is the most enterprise-relevant design in AWS Agentic Form Filling. It's different from chat history — it's a structured, searchable, PII-filtered experience database.
Memory Lifecycle
First interaction
│
▼
AgentCore captures episodic data:
- Current page URL / title
- Action trace (navigation, click, fill)
- Tool efficiency and results per operation
- Errors encountered and how resolved
│
PII filter
│ Removes: passenger names, confirmation codes, payment info
│ Retains: navigation paths, form structure patterns, error patterns
▼
Stored in episodic memory database
- Namespace: airline name
- Contains: successful flows, tool efficiency, error solutions
│
▼
New session initializes
│
AgentCore automatically retrieves relevant memories
│ Matches current task context
│ "We've handled this airline's check-in page before"
▼
Memory injected into LLM prompt context
│ "Last time, this form required clicking 'Start Check-In' first"
▼
Agent reuses successful strategy, avoids repeated errors
Key Design Decisions
| Decision | Rationale |
|---|---|
| Namespace-based isolation | Different airlines have wildly different form structures; mixing would confuse reasoning |
| Automatic PII filtering | Compliance (GDPR, CCPA); prevents sensitive data from leaking across sessions |
| Retain error solutions | Failed patterns are often more informative than successful paths |
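The lifecycle and design decisions above can be sketched in a few lines of plain Python. This is a minimal illustration, not the AgentCore API: `Episode`, `EpisodicMemory`, `redact`, and `build_prompt` are hypothetical names, the store is an in-memory dictionary rather than a managed service, and the PII filter only removes the task's own input values plus card-like digit runs. A production filter would need much broader coverage.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: long digit runs that look like payment card numbers.
CARD_LIKE = re.compile(r"\b\d{13,19}\b")


def redact(text, task_inputs):
    """Remove the task's own input values and card-like numbers from a trace line."""
    for value in task_inputs:
        text = text.replace(value, "<REDACTED>")
    return CARD_LIKE.sub("<REDACTED>", text)


@dataclass
class Episode:
    url: str
    actions: list
    errors: list = field(default_factory=list)


class EpisodicMemory:
    """In-memory stand-in for a namespaced episodic store (assumption)."""

    def __init__(self):
        self._store = {}  # namespace -> list of filtered episodes

    def record(self, namespace, episode, task_inputs):
        # Filter PII before anything is persisted.
        clean = Episode(
            url=episode.url,
            actions=[redact(a, task_inputs) for a in episode.actions],
            errors=[redact(e, task_inputs) for e in episode.errors],
        )
        self._store.setdefault(namespace, []).append(clean)

    def recall(self, namespace):
        # Namespace isolation: one airline never sees another airline's episodes.
        return list(self._store.get(namespace, []))


def build_prompt(task, episodes):
    """Inject recalled experience into the prompt text sent to the model."""
    lines = [f"Task: {task}"]
    if episodes:
        lines.append("Lessons from earlier sessions on this site:")
        for ep in episodes:
            lines += [f"- worked: {a}" for a in ep.actions]
            lines += [f"- watch out: {e}" for e in ep.errors]
    return "\n".join(lines)


memory = EpisodicMemory()
memory.record(
    "airline-c",
    Episode(
        url="https://example.com/check-in",
        actions=[
            "click 'Start Check-In'",
            "fill confirmation code ABC123",
            "fill last name SMITH",
        ],
        errors=["CAPTCHA appears after submit"],
    ),
    task_inputs=["ABC123", "SMITH"],
)
prompt = build_prompt("Complete check-in", memory.recall("airline-c"))
print(prompt)
```

The navigation pattern and the CAPTCHA warning survive into the next session's prompt, while the confirmation code and last name do not. Recalling a different namespace returns nothing, which is the isolation behavior described in the table.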
Semantic Element Discovery
Traditional element targeting relies on exact matching — by ID, CSS selector, XPath, or accessible label. In enterprise forms, these identifiers are often dynamic or unreliable.
AWS's approach: match semantics, not text.
How It Works
Playwright aria_snapshot()
│ Get accessibility tree
▼
Sentence Transformers (local client model)
│ Chunk the A11y tree
│ Convert each chunk to vector embedding
▼
Semantic similarity search
│ Target: "submit button"
│ Found: "Proceed to Next Step" (similarity: 0.91)
│ "Continue" (similarity: 0.87)
│ "Submit" (similarity: 0.85)
▼
Even if the button says "Proceed to Next Step,"
the agent finds it reliably via semantic proximity
Comparison with Exact Matching
| Method | Example | Robustness to Redesign |
|---|---|---|
| CSS selector | #submit-btn | Very poor (class/ID change = break) |
| XPath | //form/div[3]/button | Poor (DOM change = break) |
| Accessible label | button "Submit" | Medium (text change = break) |
| Semantic vector | similar_to("submit button") | High (semantics unchanged = works) |
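To make the ranking step concrete, here is a small self-contained sketch. It is not the project's code. The snapshot text is made up, though it follows roughly the YAML-like shape of Playwright aria snapshots. The `CONCEPTS` lexicon and `toy_embed` are deliberate toy stand-ins for a Sentence Transformers model so that the example runs without downloading anything, and `parse_snapshot` and `rank_elements` are hypothetical helper names. In a real setup the toy embedding and sparse cosine would be replaced by the model's dense vectors.

```python
import math
import re
from collections import Counter

# Made-up accessibility snapshot, one element per line.
SNAPSHOT = """\
- heading "Online Check-In"
- textbox "Booking reference"
- textbox "Last name"
- button "Cancel"
- button "Proceed to Next Step"
"""

# Toy concept lexicon standing in for a learned embedding model (assumption).
CONCEPTS = {
    "submit": "advance", "proceed": "advance", "continue": "advance",
    "next": "advance", "cancel": "abort", "back": "abort",
    "button": "button", "textbox": "input", "input": "input",
    "booking": "booking", "reference": "booking",
    "confirmation": "booking", "code": "booking",
    "last": "surname", "name": "surname", "surname": "surname",
}


def parse_snapshot(snapshot):
    """Chunk the snapshot into (role, accessible name) pairs."""
    return re.findall(r'^\s*-\s+(\w+)\s+"([^"]*)"', snapshot, flags=re.MULTILINE)


def toy_embed(text):
    """Map text to a sparse concept-count vector (placeholder for a real model)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_elements(query, snapshot, embed=toy_embed):
    """Return (score, role, name) tuples, most similar element first."""
    q = embed(query)
    scored = [
        (cosine(q, embed(f"{role} {name}")), role, name)
        for role, name in parse_snapshot(snapshot)
    ]
    return sorted(scored, reverse=True)


for score, role, name in rank_elements("submit button", SNAPSHOT)[:2]:
    print(f"{score:.2f}  {role}  {name}")
```

The query never contains the words "Proceed" or "Next", yet that button ranks first because it shares the "advance" concept with "submit". A real embedding model generalizes the same idea beyond a hand-written lexicon.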
Context Window Management: ImageFilteringConversationManager
LLM-powered agents face a dilemma: screenshots provide rich visual context, but each screenshot consumes many tokens.
AWS's solution is a sliding window protocol:
Conversation start:
[System prompt][Memory injection][Initial screenshot]
As conversation progresses:
[System prompt][Memory injection][Screenshot #1][Action #1][Screenshot #2][Action #2]...
Token limit approaching:
[System prompt][Memory injection][Old screenshot replaced with text placeholder]
[Screenshot #5][Action #5]...
Strategy:
- Old screenshots dynamically removed
- Replaced with text placeholders: "a filtered screenshot exists here"
- Recent actions and screenshots retained
- Episodic memory unaffected (persistent storage)
This ensures:
- Context window stays within model limits
- Recent visual information is preserved
- Historical key data persists via episodic memory, not active context
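The pruning strategy can be illustrated with a short sketch. It is an assumption-laden simplification rather than the actual `ImageFilteringConversationManager`: messages are plain dictionaries with a hypothetical `type` field, and the trigger is a fixed count of screenshots to keep instead of a token estimate.

```python
PLACEHOLDER = "a filtered screenshot exists here"


def filter_old_screenshots(messages, keep_last=2):
    """Return a copy of messages where all but the newest `keep_last`
    screenshots are replaced by a short text placeholder."""
    screenshot_idx = [i for i, m in enumerate(messages) if m["type"] == "screenshot"]
    if keep_last > 0:
        to_drop = set(screenshot_idx[:-keep_last])
    else:
        to_drop = set(screenshot_idx)
    return [
        {"type": "text", "content": PLACEHOLDER} if i in to_drop else m
        for i, m in enumerate(messages)
    ]


# Build a conversation: system prompt, memory injection, then five
# screenshot/action pairs.
history = [
    {"type": "text", "content": "system prompt"},
    {"type": "text", "content": "memory injection"},
]
for n in range(1, 6):
    history.append({"type": "screenshot", "content": f"<image bytes #{n}>"})
    history.append({"type": "text", "content": f"action #{n}"})

trimmed = filter_old_screenshots(history, keep_last=2)
for message in trimmed:
    print(message["type"], "|", message["content"])
```

The message count and ordering stay the same, so the model still sees that a screenshot once existed at each step. Only the expensive image payloads for the three oldest screenshots are gone, and the original history list is left unmodified.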
Deployment
Core AWS Dependencies
| Service | Purpose |
|---|---|
| Amazon Bedrock | Hosted Claude model |
| Bedrock AgentCore | Agent orchestration + episodic memory |
| AWS Lambda / ECS | Playwright execution environment |
| Amazon S3 | Screenshot and log storage |
| IAM | Permission management |
Simplified Deployment
# 1. Clone project
git clone <repository-url>
# 2. Configure AWS credentials
aws configure
# 3. Deploy AgentCore
python deploy_agentcore.py \
--memory-enabled \
--namespace "airline-checkin"
# 4. Enable model access in Bedrock console
Practical Scenario: Airline Check-In
System: Complete airline check-in for passenger
Input: Confirmation code ABC123, Last name SMITH
First attempt (no memory):
1. Open airline check-in page
2. aria_snapshot() for page structure
3. Semantic search: "confirmation code input" → found
4. Fill ABC123
5. Semantic search: "last name input" → found
6. Fill SMITH
7. Semantic search: "check in button" → found and clicked
8. CAPTCHA detected, needs human intervention
9. Memory stored: this airline's check-in has CAPTCHA
Second attempt (with memory):
1. Open same airline check-in page
2. AgentCore retrieves relevant memory
3. Memory injected: "This page requires CAPTCHA verification"
4. Agent prepares CAPTCHA handling strategy
5. Fill info → skip known steps
6. Efficiency: first time 35s, second time 12s
Enterprise vs Open-Source Comparison
| Dimension | AWS Agentic Form Filling | Nanobrowser / Browy | agent-browser |
|---|---|---|---|
| Setup complexity | High (multi-AWS service) | Very low (extension) | Low (CLI) |
| Memory management | AgentCore Episodic Memory | None | session save/load |
| Element targeting | Semantic vector + A11y | DOM path | A11y Ref |
| Context optimization | ImageFiltering sliding window | None | None (A11y inherently compact) |
| Scaling | AWS infrastructure auto-scaling | Single user | Single instance |
| Cost | AWS fees + model calls | Own API key | Own API key |
| Best for | Enterprise production | Individual devs | Devs / small teams |
Limitations
- AWS lock-in: Deeply tied to Amazon Bedrock and AWS ecosystem
- Complex setup: Requires understanding and configuring multiple AWS services
- Higher latency: AgentCore orchestration, screenshot analysis, and vector search each add overhead
- Cost uncertainty: Enterprise managed services + Claude model costs need careful estimation
Summary
AWS Agentic Form Filling represents an enterprise-grade paradigm for AI browser agents: it aims to be reliable and able to learn from experience rather than to be the fastest or the cheapest.
Episodic memory transforms the agent from a stateless tool starting from scratch into a "digital employee" that accumulates experience. Semantic element discovery eliminates dependency on fragile CSS selectors. The sliding window context manager ensures long-running complex tasks won't be interrupted by token limits.
For scenarios that involve processing large volumes of repeated yet subtly different enterprise forms (airline check-in, bank account opening, insurance claims), this architecture offers a reliability-focused engineering approach.
The next article presents a comprehensive cross-framework comparison of mainstream AI browser agent tools as of mid-2026.