Nanobrowser Source Code: Clickable Element Detection and Hashing

Introduction: How an Agent Knows What to Click

Reading page content is only the first step. The more critical ability is knowing which elements on the page are interactive and targeting them precisely.

Nanobrowser solves this in three steps:

DOM tree construction — convert raw HTML into a traversable DOMElementNode tree
Clickable element detection — walk the tree to find interactive elements
Element hash deduplication — create unique fingerprints for stable identification

Step 1: DOM Tree Construction

browser/dom/service.ts:133-280

async function _buildDomTree(tabId, url, ...) {
  // about:blank or chrome:// → minimal DOM tree
  if (isNewTabPage(url) || url.startsWith('chrome://')) {
    return [minimalTree, new Map()];
  }
 
  await injectBuildDomTreeScripts(tabId);
  // Executes buildDomTree in page context
  const result = await chrome.scripting.executeScript({
    target: { tabId },
    func: args => window.buildDomTree(args),
    args: [{ showHighlightElements, ... }],
  });
  // Two-phase construction: create nodes, then build tree
}

Key details:

Script injection check — injectBuildDomTreeScripts checks if buildDomTree is already loaded per-frame, avoiding redundant injection (service.ts:340-380).
Multi-frame assembly — constructFrameTree handles iframes by matching height, width, src, and name. Failed iframes (cross-origin) are marked without blocking construction (service.ts:175-240).
Two-phase construction — first create all nodes without parent references, then build the tree structure. Avoids circular references (service.ts:285-320).
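
The two-phase idea can be sketched in a few lines. This is a simplified illustration, not the code in service.ts: the RawNode and TreeNode shapes and the buildTree name are hypothetical, and the sketch assumes the page script returns a flat table of nodes keyed by id, with children listed as ids.

```typescript
// Hypothetical, simplified shapes; the real payload carries many more fields.
interface RawNode {
  tagName: string;
  children: string[]; // ids of child nodes in the flat table
}

interface TreeNode {
  tagName: string;
  parent: TreeNode | null;
  children: TreeNode[];
}

function buildTree(rawNodes: Record<string, RawNode>, rootId: string): TreeNode {
  // Phase 1: create every node with no links yet.
  const nodes = new Map<string, TreeNode>();
  for (const id of Object.keys(rawNodes)) {
    nodes.set(id, { tagName: rawNodes[id].tagName, parent: null, children: [] });
  }

  // Phase 2: every node now exists, so links can be wired in any order.
  for (const id of Object.keys(rawNodes)) {
    const parent = nodes.get(id);
    if (parent === undefined) continue;
    for (const childId of rawNodes[id].children) {
      const child = nodes.get(childId);
      if (child === undefined) continue; // skip dangling ids
      child.parent = parent;
      parent.children.push(child);
    }
  }

  const root = nodes.get(rootId);
  if (root === undefined) throw new Error(`root node ${rootId} not found`);
  return root;
}

// Usage: a body with two children.
const tree = buildTree(
  {
    "0": { tagName: "body", children: ["1", "2"] },
    "1": { tagName: "div", children: [] },
    "2": { tagName: "button", children: [] },
  },
  "0",
);
```

Because the flat table holds only ids, it stays plain data until the second phase adds object references.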

Step 2: Clickable Element Detection

browser/dom/clickable/service.ts:17-48

export function getClickableElements(
  domElement: DOMElementNode
): DOMElementNode[] {
  const clickableElements: DOMElementNode[] = [];
  const stack: DOMElementNode[] = [];
 
  // Push root children in reverse order
  for (let i = domElement.children.length - 1; i >= 0; i--) {
    const child = domElement.children[i];
    if (child instanceof DOMElementNode) {
      stack.push(child);
    }
  }
 
  while (stack.length > 0) {
    const node = stack.pop() as DOMElementNode;
    if (node.highlightIndex !== null) {
      clickableElements.push(node);
    }
    // Push children in reverse (maintains document order)
    for (let i = node.children.length - 1; i >= 0; i--) {
      const child = node.children[i];
      if (child instanceof DOMElementNode) {
        stack.push(child);
      }
    }
  }
 
  return clickableElements;
}

Why Iterative Stack Instead of Recursion

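A comment in the source gives the reason: to avoid "Maximum call stack size exceeded" errors on deep DOMs.

Some pages have DOM trees thousands of levels deep. Recursive traversal adds one call frame per level, so a deep enough tree exhausts the engine's call stack. An explicit array-based stack lives on the heap instead, where the practical limit is available memory rather than the call-stack size.

Both approaches visit each node once, so both are O(n) in time. The difference is the failure mode: the iterative version's memory use grows gradually with the tree, while recursion fails abruptly once the depth passes the engine's limit.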
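
To see the effect, the sketch below runs the same stack-based walk over a degenerate tree 200,000 levels deep. SketchNode, collectHighlighted, and buildChain are hypothetical stand-ins for illustration; unlike getClickableElements, this variant also visits the root node itself.

```typescript
// Hypothetical minimal node type, standing in for DOMElementNode.
interface SketchNode {
  label: string;
  highlightIndex: number | null;
  children: SketchNode[];
}

function collectHighlighted(root: SketchNode): SketchNode[] {
  const found: SketchNode[] = [];
  const stack: SketchNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as SketchNode;
    if (node.highlightIndex !== null) {
      found.push(node);
    }
    // Reverse push so the first child is popped first (document order).
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return found;
}

// Builds a single chain: each node has exactly one child.
// Only the deepest node is marked as highlighted.
function buildChain(depth: number): SketchNode {
  const root: SketchNode = { label: "n0", highlightIndex: null, children: [] };
  let current = root;
  for (let i = 1; i <= depth; i++) {
    const next: SketchNode = {
      label: `n${i}`,
      highlightIndex: i === depth ? 0 : null,
      children: [],
    };
    current.children.push(next);
    current = next;
  }
  return root;
}

const deepResult = collectHighlighted(buildChain(200000));
```

The walk reaches the deepest node without nesting any function calls, so the depth of the tree never touches the call stack.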

Where highlightIndex Comes From

buildDomTree.js (injected into the page) assigns highlightIndex to elements that are:

Visible (offsetParent !== null, dimensions > 0)
Interactive (button, a, input, select, textarea, [role="button"], etc.)
In viewport (with configurable expansion)

Matching elements get incremental indices and visible overlay numbers on the page. This is the numbering you see in Nanobrowser's side panel.
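
The three conditions can be sketched as predicates over plain data. This is an illustration under assumptions, not the logic of buildDomTree.js: ElementFacts and all function names are hypothetical, the tag list covers only the examples named above, and the viewport check assumes a simple vertical expansion in pixels.

```typescript
// Hypothetical snapshot of what the injected script would read from the live DOM.
interface ElementFacts {
  tag: string;
  role: string | null;
  hasOffsetParent: boolean; // stands in for el.offsetParent !== null
  width: number;
  height: number;
  top: number; // px, relative to the top of the viewport
  bottom: number;
}

const INTERACTIVE_TAGS = ["a", "button", "input", "select", "textarea"];

function isVisible(el: ElementFacts): boolean {
  return el.hasOffsetParent && el.width > 0 && el.height > 0;
}

function isInteractive(el: ElementFacts): boolean {
  return INTERACTIVE_TAGS.indexOf(el.tag) !== -1 || el.role === "button";
}

// Assumed rule: the element overlaps the viewport grown by `expansion` px
// above and below.
function isNearViewport(el: ElementFacts, viewportHeight: number, expansion: number): boolean {
  return el.bottom >= -expansion && el.top <= viewportHeight + expansion;
}

// Elements passing all three checks get the next index; the rest get null.
function assignHighlightIndices(
  elements: ElementFacts[],
  viewportHeight: number,
  expansion: number,
): Array<number | null> {
  let nextIndex = 0;
  return elements.map(el =>
    isVisible(el) && isInteractive(el) && isNearViewport(el, viewportHeight, expansion)
      ? nextIndex++
      : null,
  );
}

const sample: ElementFacts[] = [
  { tag: "button", role: null, hasOffsetParent: true, width: 80, height: 24, top: 10, bottom: 34 },
  { tag: "a", role: null, hasOffsetParent: false, width: 0, height: 0, top: 50, bottom: 50 }, // hidden
  { tag: "div", role: "button", hasOffsetParent: true, width: 40, height: 40, top: 100, bottom: 140 },
  { tag: "input", role: null, hasOffsetParent: true, width: 200, height: 30, top: 5000, bottom: 5030 }, // far below
];
const indices = assignHighlightIndices(sample, 800, 200);
```

Here the hidden link and the far-off input receive null, so only the button and the div with role="button" would be numbered.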

Step 3: Element Hash Deduplication

browser/dom/clickable/service.ts:50-80

export async function hashDomElement(
  domElement: DOMElementNode
): Promise<string> {
  const parentBranchPath = _getParentBranchPath(domElement);
 
  const [branchPathHash, attributesHash, xpathHash] = await Promise.all([
    _parentBranchPathHash(parentBranchPath),
    _attributesHash(domElement.attributes),
    _xpathHash(domElement.xpath),
  ]);
 
  return _hashString(
    `${branchPathHash}-${attributesHash}-${xpathHash}`
  );
}

Three-Layer Hash

Layer	What It Hashes	Purpose
Branch path	Sibling index of each ancestor from root to element	Locate position in tree
Attributes	tagName + key attributes from DEFAULT_INCLUDE_ATTRIBUTES	Identify element characteristics
XPath	Element's XPath expression	Pinpoint DOM node
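
A minimal sketch of how the three layers combine. Loud assumptions: hashString here is a 32-bit FNV-1a style stand-in chosen to keep the example dependency-free, not the hash Nanobrowser uses; Fingerprintable and fingerprint are hypothetical names; and sorting attribute keys is this sketch's own choice to make the result independent of attribute order.

```typescript
// Stand-in hash: 32-bit FNV-1a over UTF-16 code units. Illustration only.
function hashString(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical input shape: the three ingredients named in the table.
interface Fingerprintable {
  branchPath: string[];
  attributes: Record<string, string>;
  xpath: string;
}

function fingerprint(el: Fingerprintable): string {
  // Layer 1: position in the tree.
  const branchPathHash = hashString(el.branchPath.join("/"));
  // Layer 2: element characteristics (keys sorted; an assumption of this sketch).
  const attributesHash = hashString(
    Object.keys(el.attributes)
      .sort()
      .map(key => `${key}=${el.attributes[key]}`)
      .join(""),
  );
  // Layer 3: the XPath expression.
  const xpathHash = hashString(el.xpath);
  // The final fingerprint hashes the three partial hashes joined with "-".
  return hashString(`${branchPathHash}-${attributesHash}-${xpathHash}`);
}

const submitButton: Fingerprintable = {
  branchPath: ["3", "1", "5", "2"],
  attributes: { type: "submit", "aria-label": "Submit form" },
  xpath: "html/body/form/button",
};
const submitFingerprint = fingerprint(submitButton);
```

Hashing each layer separately before combining keeps the final input short and fixed in shape, whatever the size of the attribute set or the XPath.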

Attribute Selection

From views.ts:

export const DEFAULT_INCLUDE_ATTRIBUTES = [
  'title', 'type', 'checked', 'name', 'role',
  'value', 'placeholder', 'data-date-format', 'data-state',
  'alt', 'aria-checked', 'aria-label', 'aria-expanded', 'href',
];

Note: class and id are excluded — they change too frequently to serve as stable identifiers.
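
The effect of the whitelist can be shown with a small sketch. It follows the description in the table above (tag name plus whitelisted attributes); stableAttributeString and INCLUDE_ATTRIBUTES are hypothetical names, the "|" separator is arbitrary, and the actual helper in the repository may serialize attributes differently.

```typescript
// Copy of the list shown above, renamed to keep the sketch self-contained.
const INCLUDE_ATTRIBUTES = [
  "title", "type", "checked", "name", "role",
  "value", "placeholder", "data-date-format", "data-state",
  "alt", "aria-checked", "aria-label", "aria-expanded", "href",
];

// Builds the string that would feed the attributes hash:
// the tag name followed by whitelisted attributes in list order.
function stableAttributeString(tagName: string, attributes: Record<string, string>): string {
  const parts: string[] = [tagName];
  for (const name of INCLUDE_ATTRIBUTES) {
    if (name in attributes) {
      parts.push(`${name}=${attributes[name]}`);
    }
  }
  return parts.join("|");
}

// The same button before and after a visual redesign.
const beforeRedesign = stableAttributeString("button", {
  class: "btn btn-primary",
  type: "button",
  "aria-label": "Add to cart",
});
const afterRedesign = stableAttributeString("button", {
  class: "cta cta--2024",
  id: "add-to-cart-v2",
  type: "button",
  "aria-label": "Add to cart",
});
```

Both calls produce the same string, because class and id never reach the output.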

Branch Path vs CSS Selectors

CSS selector: .product-card .add-to-cart
  Redesign → class changes → fails
 
Branch path hash: 3-1-5-2
  Redesign → tree position unchanged → still valid

For scrapers: position-based targeting is more robust than class/ID-based targeting when the element's structural position is stable.
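
The sketch below computes a sibling-index path as described in the table above and shows both sides of that caveat. It is an illustration only: PathNode, addChild, and siblingIndexPath are hypothetical names, and the real _getParentBranchPath may record different per-ancestor data.

```typescript
// Hypothetical minimal node with a parent link.
interface PathNode {
  tagName: string;
  parent: PathNode | null;
  children: PathNode[];
}

function addChild(parent: PathNode, tagName: string): PathNode {
  const child: PathNode = { tagName, parent, children: [] };
  parent.children.push(child);
  return child;
}

// Walks up to the root, recording each node's index among its siblings,
// then reverses so the path reads root-to-element.
function siblingIndexPath(node: PathNode): number[] {
  const path: number[] = [];
  let current = node;
  while (current.parent !== null) {
    path.push(current.parent.children.indexOf(current));
    current = current.parent;
  }
  return path.reverse();
}

const body: PathNode = { tagName: "body", parent: null, children: [] };
addChild(body, "header");
const main = addChild(body, "main");
addChild(main, "div");
const card = addChild(main, "div");
addChild(card, "span");
const addToCart = addChild(card, "button");

const pathBefore = siblingIndexPath(addToCart).join("-");

// A structural change: a banner is inserted before the header.
body.children.unshift({ tagName: "aside", parent: body, children: [] });
const pathAfter = siblingIndexPath(addToCart).join("-");
```

Renaming classes leaves pathBefore untouched, but the inserted banner shifts the first segment, so pathAfter differs. Position-based targeting survives restyling, not restructuring.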

From Detection to LLM Action

The pipeline produces a DOMState:

export interface DOMState {
  elementTree: DOMElementNode;
  selectorMap: Map<number, DOMElementNode>;
}
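
A short sketch of how such a map can be built and used, assuming the keys are the highlightIndex values. IndexedNode, buildSelectorMap, and resolveTarget are hypothetical names for illustration.

```typescript
// Hypothetical slimmed-down node; the real DOMElementNode has more fields.
interface IndexedNode {
  tagName: string;
  xpath: string;
  highlightIndex: number | null;
}

// Index every highlighted node by its number.
function buildSelectorMap(nodes: IndexedNode[]): Map<number, IndexedNode> {
  const map = new Map<number, IndexedNode>();
  for (const node of nodes) {
    if (node.highlightIndex !== null) {
      map.set(node.highlightIndex, node);
    }
  }
  return map;
}

// Turn an index chosen by the model back into a concrete node.
function resolveTarget(map: Map<number, IndexedNode>, index: number): IndexedNode {
  const node = map.get(index);
  if (node === undefined) {
    throw new Error(`No element with index ${index}; the page may have changed`);
  }
  return node;
}

const selectorMap = buildSelectorMap([
  { tagName: "div", xpath: "html/body/div", highlightIndex: 33 },
  { tagName: "span", xpath: "html/body/div/span", highlightIndex: null },
  { tagName: "button", xpath: "html/body/div/button", highlightIndex: 35 },
]);
const target = resolveTarget(selectorMap, 35);
```

The model only ever needs to name a number; the map carries everything required to locate the element.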

The Navigator agent formats this for the LLM:

[33]<div>User form</div>
\t[35]<button aria-label='Submit form'>Submit</button>

Elements marked with * are new since the last step (none appear in this excerpt). Each level of indentation (\t) marks one level of parent-child nesting. This format is defined in prompts/templates/navigator.ts.
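
A rough sketch of producing this format. Assumptions are loud here: PromptNode and formatForPrompt are hypothetical, newness is assumed to be decided by comparing each element's fingerprint against the set from the previous step, and the exact placement of the * marker is a guess rather than something taken from the template.

```typescript
// Hypothetical node shape for formatting.
interface PromptNode {
  tagName: string;
  text: string;
  attributes: Record<string, string>;
  highlightIndex: number | null;
  fingerprint: string;
  children: PromptNode[];
}

function formatForPrompt(root: PromptNode, previousFingerprints: Set<string>): string {
  const lines: string[] = [];
  const stack: Array<{ node: PromptNode; depth: number }> = [{ node: root, depth: 0 }];

  while (stack.length > 0) {
    const { node, depth } = stack.pop()!;
    let childDepth = depth;

    if (node.highlightIndex !== null) {
      const attrs = Object.keys(node.attributes)
        .map(key => ` ${key}='${node.attributes[key]}'`)
        .join("");
      // Assumed rule: an element is new if its fingerprint was not seen last step.
      const marker = previousFingerprints.has(node.fingerprint) ? "" : "*";
      lines.push(
        `${"\t".repeat(depth)}${marker}[${node.highlightIndex}]<${node.tagName}${attrs}>${node.text}</${node.tagName}>`,
      );
      childDepth = depth + 1; // only emitted elements add an indentation level
    }

    // Reverse push keeps document order.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push({ node: node.children[i], depth: childDepth });
    }
  }
  return lines.join("\n");
}

const page: PromptNode = {
  tagName: "body", text: "", attributes: {}, highlightIndex: null, fingerprint: "fp-body",
  children: [{
    tagName: "div", text: "User form", attributes: {}, highlightIndex: 33, fingerprint: "fp-div",
    children: [{
      tagName: "button", text: "Submit", attributes: { "aria-label": "Submit form" },
      highlightIndex: 35, fingerprint: "fp-button", children: [],
    }],
  }],
};
const promptText = formatForPrompt(page, new Set(["fp-div"]));
```

With only the div's fingerprint known from the previous step, the button line carries the * marker and the div line does not.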

Lessons for Scraper Developers

Clickable element detection isn't just for browser agents. The same mechanism can power scrapers:

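# Find "Next page" using the same visibility + interactivity logic.
# Assumes `page` is an open page object from Playwright's async API.
async def find_next_links(page):
    return await page.evaluate("""
      () => {
        // Interactive candidates only
        const buttons = document.querySelectorAll('a, button, [role="button"]');
        return Array.from(buttons)
          // Visible: note that offsetParent is also null for position: fixed elements
          .filter(el => el.offsetParent !== null)
          .filter(el => el.textContent.includes('Next'))
          // Buttons have no href, so fall back to null
          .map(el => ({ text: el.textContent.trim(), href: el.href ?? null }));
      }
    """)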

Summary

Three components power Nanobrowser's clickable element system:

buildDomTree — injected JavaScript that constructs the interactive DOM tree
getClickableElements — iterative stack traversal collects all interactive nodes
hashDomElement — three-layer hashing creates unique fingerprints for deduplication

This article showed how Nanobrowser transforms a "page" into an "element list" — the foundation for all downstream agent decisions.

The next article analyzes the multi-agent orchestration loop — how the Executor drives Planner and Navigator through multi-step page scraping tasks.
