Nanobrowser Source Code: Clickable Element Detection and Hash Deduplication
Inside Nanobrowser's DOM traversal: getClickableElements finds interactive nodes, hashDomElement creates unique fingerprints via three-layer hashing, and an iterative stack prevents call-stack overflow.
Introduction: How an Agent Knows What to Click
Reading page content is only the first step. An agent also has to know which elements on the page are interactive and how to target them precisely.
Nanobrowser solves this in three steps:
- DOM tree construction — convert raw HTML into a traversable DOMElementNode tree
- Clickable element detection — walk the tree to find interactive elements
- Element hash deduplication — create unique fingerprints for stable identification
Step 1: DOM Tree Construction
browser/dom/service.ts:133-280
async function _buildDomTree(tabId, url, ...) {
  // about:blank or chrome:// → minimal DOM tree
  if (isNewTabPage(url) || url.startsWith('chrome://')) {
    return [minimalTree, new Map()];
  }
  await injectBuildDomTreeScripts(tabId);
  // Executes buildDomTree in page context
  const result = await chrome.scripting.executeScript({
    target: { tabId },
    func: args => window.buildDomTree(args),
    args: [{ showHighlightElements, ... }],
  });
  // Two-phase construction: create nodes, then build tree
}
Key details:
- Script injection check: injectBuildDomTreeScripts checks if buildDomTree is already loaded per-frame, avoiding redundant injection (service.ts:340-380).
- Multi-frame assembly: constructFrameTree handles iframes by matching height, width, src, and name. Failed iframes (cross-origin) are marked without blocking construction (service.ts:175-240).
- Two-phase construction: first create all nodes without parent references, then build the tree structure. Avoids circular references (service.ts:285-320).
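The two-phase idea can be sketched as follows. This is a minimal illustration, not Nanobrowser's actual code: the `RawNode` shape, the id-keyed record, and the `buildTree` helper are all assumptions made for the example.

```typescript
// Hypothetical flat record, as it might arrive from the injected page script.
interface RawNode {
  tagName: string;
  children: string[]; // ids of child nodes
}

interface TreeNode {
  id: string;
  tagName: string;
  parent: TreeNode | null;
  children: TreeNode[];
}

function buildTree(raw: Record<string, RawNode>, rootId: string): TreeNode {
  // Phase 1: create every node with no parent or child links.
  const nodes = new Map<string, TreeNode>();
  for (const id of Object.keys(raw)) {
    nodes.set(id, { id, tagName: raw[id].tagName, parent: null, children: [] });
  }

  // Phase 2: wire up parent/child references now that all nodes exist.
  for (const id of Object.keys(raw)) {
    const node = nodes.get(id)!;
    for (const childId of raw[id].children) {
      const child = nodes.get(childId);
      if (!child) continue; // skip dangling ids instead of failing
      child.parent = node;
      node.children.push(child);
    }
  }

  const root = nodes.get(rootId);
  if (!root) throw new Error(`root id not found: ${rootId}`);
  return root;
}

const tree = buildTree(
  {
    "0": { tagName: "body", children: ["1", "2"] },
    "1": { tagName: "div", children: [] },
    "2": { tagName: "button", children: [] },
  },
  "0"
);
console.log(tree.children.map(c => c.tagName));
```

Because links are added only after every node exists, a parent can reference a child regardless of the order in which the records were received.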
Step 2: Clickable Element Detection
browser/dom/clickable/service.ts:17-48
export function getClickableElements(
  domElement: DOMElementNode
): DOMElementNode[] {
  const clickableElements: DOMElementNode[] = [];
  const stack: DOMElementNode[] = [];
  // Push root children in reverse order
  for (let i = domElement.children.length - 1; i >= 0; i--) {
    const child = domElement.children[i];
    if (child instanceof DOMElementNode) {
      stack.push(child);
    }
  }
  while (stack.length > 0) {
    const node = stack.pop() as DOMElementNode;
    if (node.highlightIndex !== null) {
      clickableElements.push(node);
    }
    // Push children in reverse (maintains document order)
    for (let i = node.children.length - 1; i >= 0; i--) {
      const child = node.children[i];
      if (child instanceof DOMElementNode) {
        stack.push(child);
      }
    }
  }
  return clickableElements;
}
Why Iterative Stack Instead of Recursion
The comment in the source states the reason: to avoid "Maximum call stack size exceeded" errors on deep DOMs.
Some pages have DOM trees thousands of levels deep. A recursive traversal adds one call-stack frame per level and can exhaust the engine's fixed call-stack limit. An explicit array-based stack lives on the heap instead, so it is limited only by available memory.
Both approaches take O(n) time, but only the iterative version keeps working regardless of how deep the tree is.
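To make the depth argument concrete, here is a small sketch that runs the same explicit-stack pattern over a chain 200,000 levels deep, a depth at which a naive recursive walk would typically overflow in common JavaScript engines. `SketchNode` and `collectHighlighted` are hypothetical stand-ins for DOMElementNode and getClickableElements, and unlike the original this version also visits the root.

```typescript
// Minimal stand-in for DOMElementNode: only the fields the traversal needs.
interface SketchNode {
  highlightIndex: number | null;
  children: SketchNode[];
}

// Explicit stack, children pushed in reverse so they pop in document order.
function collectHighlighted(root: SketchNode): SketchNode[] {
  const found: SketchNode[] = [];
  const stack: SketchNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.highlightIndex !== null) {
      found.push(node);
    }
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return found;
}

// Build a single chain of nested nodes; every 1000th node is "clickable".
const DEPTH = 200000;
const deepRoot: SketchNode = { highlightIndex: null, children: [] };
let cursor = deepRoot;
for (let i = 0; i < DEPTH; i++) {
  const child: SketchNode = {
    highlightIndex: i % 1000 === 0 ? i : null,
    children: [],
  };
  cursor.children.push(child);
  cursor = child;
}

const hits = collectHighlighted(deepRoot);
console.log(hits.length);
```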
Where highlightIndex Comes From
buildDomTree.js (injected into the page) assigns highlightIndex to elements that are:
- Visible (offsetParent !== null, dimensions > 0)
- Interactive (button, a, input, select, textarea, [role="button"], etc.)
- In viewport (with configurable expansion)
Matching elements get incremental indices and visible overlay numbers on the page. This is the numbering you see in Nanobrowser's side panel.
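The three conditions can be expressed as simple predicates. The sketch below works on plain data rather than live DOM nodes so it can run anywhere; the `ElementInfo` shape, the helper names, and the pixel-based `expansion` parameter are assumptions for illustration and do not mirror buildDomTree.js.

```typescript
// Hypothetical snapshot of the fields the checks need.
interface ElementInfo {
  tag: string;
  role?: string;
  hasOffsetParent: boolean;
  rect: { top: number; bottom: number; width: number; height: number };
}

const INTERACTIVE_TAGS = new Set(["button", "a", "input", "select", "textarea"]);

function isVisible(el: ElementInfo): boolean {
  return el.hasOffsetParent && el.rect.width > 0 && el.rect.height > 0;
}

function isInteractive(el: ElementInfo): boolean {
  return INTERACTIVE_TAGS.has(el.tag) || el.role === "button";
}

// The viewport is widened by `expansion` pixels above and below.
function isInExpandedViewport(el: ElementInfo, viewportHeight: number, expansion: number): boolean {
  return el.rect.bottom >= -expansion && el.rect.top <= viewportHeight + expansion;
}

// Elements passing all three checks get incremental indices; the rest get null.
function assignHighlightIndices(
  els: ElementInfo[],
  viewportHeight: number,
  expansion: number
): Array<number | null> {
  let next = 0;
  return els.map(el =>
    isVisible(el) && isInteractive(el) && isInExpandedViewport(el, viewportHeight, expansion)
      ? next++
      : null
  );
}

const sampleElements: ElementInfo[] = [
  { tag: "button", hasOffsetParent: true, rect: { top: 10, bottom: 40, width: 80, height: 30 } },
  { tag: "div", hasOffsetParent: true, rect: { top: 50, bottom: 90, width: 200, height: 40 } },
  { tag: "a", hasOffsetParent: false, rect: { top: 100, bottom: 120, width: 60, height: 20 } },
  { tag: "input", hasOffsetParent: true, rect: { top: 850, bottom: 880, width: 120, height: 30 } },
  { tag: "div", role: "button", hasOffsetParent: true, rect: { top: 2000, bottom: 2030, width: 80, height: 30 } },
];

const indices = assignHighlightIndices(sampleElements, 800, 100);
console.log(indices); // [0, null, null, 1, null]
```

The input at top 850 is outside an 800-pixel viewport but still receives an index because the 100-pixel expansion covers it.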
Step 3: Element Hash Deduplication
browser/dom/clickable/service.ts:50-80
export async function hashDomElement(
  domElement: DOMElementNode
): Promise<string> {
  const parentBranchPath = _getParentBranchPath(domElement);
  const [branchPathHash, attributesHash, xpathHash] = await Promise.all([
    _parentBranchPathHash(parentBranchPath),
    _attributesHash(domElement.attributes),
    _xpathHash(domElement.xpath),
  ]);
  return _hashString(
    `${branchPathHash}-${attributesHash}-${xpathHash}`
  );
}
Three-Layer Hash
| Layer | What It Hashes | Purpose |
|---|---|---|
| Branch path | Sibling index of each ancestor from root to element | Locate position in tree |
| Attributes | tagName + key attributes from DEFAULT_INCLUDE_ATTRIBUTES | Identify element characteristics |
| XPath | Element's XPath expression | Pinpoint DOM node |
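The layering in the table can be sketched in a few lines. To keep the example synchronous and dependency-free it uses 32-bit FNV-1a, a non-cryptographic hash swapped in purely as a stand-in for whatever _hashString uses; the `FingerprintInput` shape, the separators, and the sorted attribute serialization are likewise assumptions.

```typescript
// FNV-1a (32-bit): a small stand-in hash, NOT the function Nanobrowser uses.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hypothetical input: one field per layer of the table above.
interface FingerprintInput {
  branchPath: string[];
  attributes: Record<string, string>;
  xpath: string;
}

function fingerprint(el: FingerprintInput): string {
  // Layer 1: position in the tree.
  const branchHash = fnv1a(el.branchPath.join("/"));
  // Layer 2: attributes, serialized in sorted key order so the result is stable.
  const attrHash = fnv1a(
    Object.keys(el.attributes)
      .sort()
      .map(key => `${key}=${el.attributes[key]}`)
      .join("")
  );
  // Layer 3: the XPath expression.
  const xpathHash = fnv1a(el.xpath);
  // Combine the three layer hashes and hash once more.
  return fnv1a(`${branchHash}-${attrHash}-${xpathHash}`);
}

const sampleElement: FingerprintInput = {
  branchPath: ["3", "1", "5", "2"],
  attributes: { type: "submit", "aria-label": "Add to cart" },
  xpath: "/html/body/div[1]/button[1]",
};

const fpA = fingerprint(sampleElement);
const fpB = fingerprint({
  branchPath: sampleElement.branchPath,
  attributes: sampleElement.attributes,
  xpath: "/html/body/div[2]/button[1]", // only the XPath layer differs
});
console.log(fpA, fpB, fpA === fpB);
```

Hashing each layer separately before combining means a change in any one layer changes the final fingerprint, while identical inputs always produce the same value.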
Attribute Selection
From views.ts:
export const DEFAULT_INCLUDE_ATTRIBUTES = [
  'title', 'type', 'checked', 'name', 'role',
  'value', 'placeholder', 'data-date-format', 'data-state',
  'alt', 'aria-checked', 'aria-label', 'aria-expanded', 'href',
];
Note: class and id are excluded because they change too frequently to serve as stable identifiers.
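A short sketch shows why the allowlist matters: two snapshots of the same button that differ only in class and id serialize to the same string. The `stableAttributeString` helper and the `|` separator are hypothetical; only the attribute list is taken from the article.

```typescript
// Attribute allowlist copied from the list above.
const INCLUDE_ATTRIBUTES = [
  "title", "type", "checked", "name", "role",
  "value", "placeholder", "data-date-format", "data-state",
  "alt", "aria-checked", "aria-label", "aria-expanded", "href",
];

// Hypothetical helper: serialize the tag name plus allowlisted attributes only.
function stableAttributeString(tagName: string, attributes: Record<string, string>): string {
  const parts = INCLUDE_ATTRIBUTES
    .filter(key => Object.prototype.hasOwnProperty.call(attributes, key))
    .map(key => `${key}=${attributes[key]}`);
  return [tagName].concat(parts).join("|");
}

const beforeRedesign = stableAttributeString("button", {
  class: "btn btn-primary",
  id: "cart-btn-1",
  type: "submit",
  "aria-label": "Add to cart",
});

const afterRedesign = stableAttributeString("button", {
  class: "cta-button",
  id: "cart-btn-new",
  type: "submit",
  "aria-label": "Add to cart",
});

console.log(beforeRedesign); // button|type=submit|aria-label=Add to cart
console.log(beforeRedesign === afterRedesign); // true
```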
Branch Path vs CSS Selectors
CSS selector: .product-card .add-to-cart
Redesign → class changes → fails
Branch path hash: 3-1-5-2
Redesign → tree position unchanged → still valid
For scrapers: position-based targeting is more robust than class/ID-based targeting when the element's structural position is stable.
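The contrast above can be reproduced with a tiny tree. This sketch follows the article's description of a branch path as the sibling index of each ancestor; `PathNode`, `makeNode`, and `branchPath` are hypothetical names, and the index scheme is an assumption rather than a copy of _getParentBranchPath.

```typescript
interface PathNode {
  tag: string;
  className: string;
  parent: PathNode | null;
  children: PathNode[];
}

function makeNode(tag: string, className: string, parent: PathNode | null): PathNode {
  const node: PathNode = { tag, className, parent, children: [] };
  if (parent !== null) parent.children.push(node);
  return node;
}

// Walk up to the root, recording each node's index among its siblings.
function branchPath(node: PathNode): string {
  const indices: number[] = [];
  let current: PathNode = node;
  let parent = current.parent;
  while (parent !== null) {
    indices.push(parent.children.indexOf(current));
    current = parent;
    parent = current.parent;
  }
  return indices.reverse().join("-");
}

const body = makeNode("body", "", null);
makeNode("header", "site-header", body);
const main = makeNode("main", "content", body);
const card = makeNode("div", "product-card", main);
makeNode("img", "product-image", card);
const addToCart = makeNode("button", "add-to-cart", card);

const pathBefore = branchPath(addToCart);

// Simulate a redesign: class names change, structure does not.
card.className = "pc-v2";
addToCart.className = "cta";
const pathAfter = branchPath(addToCart);

console.log(pathBefore, pathAfter); // 1-0-1 1-0-1
```

The path breaks as soon as a sibling is inserted before any ancestor, which is why the robustness claim holds only while the structural position is stable.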
From Detection to LLM Action
The pipeline produces a DOMState:
export interface DOMState {
  elementTree: DOMElementNode;
  selectorMap: Map<number, DOMElementNode>;
}
The Navigator agent formats this for the LLM:
[33]<div>User form</div>
\t[35]<button aria-label='Submit form'>Submit</button>
Elements marked with * are new since the last step. Indentation (\t) shows parent-child relationships. This format is defined in prompts/templates/navigator.ts.
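As a rough sketch of how such a listing could be produced, the function below walks a tree and emits one line per indexed element. It is not the formatter from the Nanobrowser source: the `PromptNode` shape, the `isNew` flag, and the placement of the `*` marker before the index are assumptions.

```typescript
// Hypothetical node shape for the formatter sketch.
interface PromptNode {
  highlightIndex: number | null;
  tagName: string;
  attributes: Record<string, string>;
  text: string;
  isNew: boolean;
  children: PromptNode[];
}

function formatForPrompt(root: PromptNode): string {
  const lines: string[] = [];
  const stack: Array<{ node: PromptNode; depth: number }> = [{ node: root, depth: 0 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop()!;
    let childDepth = depth;
    if (node.highlightIndex !== null) {
      const attrs = Object.keys(node.attributes)
        .map(key => ` ${key}='${node.attributes[key]}'`)
        .join("");
      const marker = node.isNew ? "*" : ""; // assumed marker position
      lines.push(
        `${"\t".repeat(depth)}${marker}[${node.highlightIndex}]<${node.tagName}${attrs}>${node.text}</${node.tagName}>`
      );
      childDepth = depth + 1; // only indexed elements add an indentation level
    }
    // Push children in reverse so they are emitted in document order.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push({ node: node.children[i], depth: childDepth });
    }
  }
  return lines.join("\n");
}

const formTree: PromptNode = {
  highlightIndex: 33,
  tagName: "div",
  attributes: {},
  text: "User form",
  isNew: false,
  children: [
    {
      highlightIndex: 35,
      tagName: "button",
      attributes: { "aria-label": "Submit form" },
      text: "Submit",
      isNew: true,
      children: [],
    },
  ],
};

const promptText = formatForPrompt(formTree);
console.log(promptText);
```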
Lessons for Scraper Developers
Clickable element detection isn't just for browser agents. The same mechanism can power scrapers:
# Find "Next page" using the same visibility + interactivity logic.
# Assumes `page` is an async Playwright-style page object.
elements = await page.evaluate("""
    () => {
        const buttons = document.querySelectorAll('a, button, [role="button"]');
        return Array.from(buttons)
            .filter(el => el.offsetParent !== null)
            .filter(el => el.textContent.includes('Next'))
            // Only anchors carry an href; use null for buttons.
            .map(el => ({ text: el.textContent.trim(), href: el.href ?? null }));
    }
""")
Summary
Three components power Nanobrowser's clickable element system:
- buildDomTree — injected JavaScript that constructs the interactive DOM tree
- getClickableElements — iterative stack traversal collects all interactive nodes
- hashDomElement — three-layer hashing creates unique fingerprints for deduplication
This article showed how Nanobrowser transforms a "page" into an "element list" — the foundation for all downstream agent decisions.
The next article analyzes the multi-agent orchestration loop — how the Executor drives Planner and Navigator through multi-step page scraping tasks.