Nanobrowser Source Code: Clickable Element Detection and Hash Deduplication

Inside Nanobrowser's DOM traversal: getClickableElements finds interactive nodes, hashDomElement creates unique fingerprints via three-layer hashing, and an iterative stack prevents call-stack overflow.

16Yun Engineering Team | Apr 10, 2026 | 3 min read

Introduction: How an Agent Knows What to Click

Reading page content is only the first step. The more critical ability is knowing which elements on the page are interactive and targeting them precisely.

Nanobrowser solves this in three steps:

  1. DOM tree construction — convert raw HTML into a traversable DOMElementNode tree
  2. Clickable element detection — walk the tree to find interactive elements
  3. Element hash deduplication — create unique fingerprints for stable identification

Step 1: DOM Tree Construction

browser/dom/service.ts:133-280

async function _buildDomTree(tabId, url, ...) {
  // about:blank or chrome:// → minimal DOM tree
  if (isNewTabPage(url) || url.startsWith('chrome://')) {
    return [minimalTree, new Map()];
  }
 
  await injectBuildDomTreeScripts(tabId);
  // Executes buildDomTree in page context
  const result = await chrome.scripting.executeScript({
    target: { tabId },
    func: args => window.buildDomTree(args),
    args: [{ showHighlightElements, ... }],
  });
  // Two-phase construction: create nodes, then build tree
}

Key details:

  1. Script injection check: injectBuildDomTreeScripts checks whether buildDomTree is already loaded in each frame, avoiding redundant injection (service.ts:340-380).

  2. Multi-frame assembly: constructFrameTree handles iframes by matching height, width, src, and name. Iframes that cannot be processed (for example, cross-origin frames) are marked as failed without blocking construction (service.ts:175-240).

  3. Two-phase construction: first create all nodes without parent references, then build the tree structure. This avoids circular references (service.ts:285-320).
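The two-phase idea can be sketched as follows. This is a minimal sketch, not Nanobrowser's code: it assumes the injected script returns a flat map from string IDs to raw records whose `children` field lists child IDs, and the `RawNode`, `TreeNode`, and `buildTree` names are hypothetical.

```typescript
// Hypothetical shape of one raw record returned by the injected script.
interface RawNode {
  tagName: string;
  children: string[]; // child IDs, not objects
}

// Hypothetical in-memory tree node.
interface TreeNode {
  id: string;
  tagName: string;
  parent: TreeNode | null;
  children: TreeNode[];
}

function buildTree(map: Record<string, RawNode>, rootId: string): TreeNode {
  const nodes = new Map<string, TreeNode>();

  // Phase 1: create every node with no parent/child links yet.
  for (const [id, raw] of Object.entries(map)) {
    nodes.set(id, { id, tagName: raw.tagName, parent: null, children: [] });
  }

  // Phase 2: wire up parent/child references now that all nodes exist.
  for (const [id, raw] of Object.entries(map)) {
    const node = nodes.get(id)!;
    for (const childId of raw.children) {
      const child = nodes.get(childId);
      if (child === undefined) continue; // skip dangling IDs
      child.parent = node;
      node.children.push(child);
    }
  }

  const root = nodes.get(rootId);
  if (root === undefined) throw new Error(`root ${rootId} not found`);
  return root;
}
```

Because every node exists before any link is made, the order of entries in the map does not matter: a child's record may appear before its parent's.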

Step 2: Clickable Element Detection

browser/dom/clickable/service.ts:17-48

export function getClickableElements(
  domElement: DOMElementNode
): DOMElementNode[] {
  const clickableElements: DOMElementNode[] = [];
  const stack: DOMElementNode[] = [];
 
  // Push root children in reverse order
  for (let i = domElement.children.length - 1; i >= 0; i--) {
    const child = domElement.children[i];
    if (child instanceof DOMElementNode) {
      stack.push(child);
    }
  }
 
  while (stack.length > 0) {
    const node = stack.pop() as DOMElementNode;
    if (node.highlightIndex !== null) {
      clickableElements.push(node);
    }
    // Push children in reverse so they are popped in document order
    for (let i = node.children.length - 1; i >= 0; i--) {
      const child = node.children[i];
      if (child instanceof DOMElementNode) {
        stack.push(child);
      }
    }
  }
 
  return clickableElements;
}

Why Iterative Stack Instead of Recursion

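The source comment states the reason: to avoid "Maximum call stack size exceeded" errors on deeply nested DOMs.

Some pages have DOM trees thousands of levels deep. A recursive traversal adds one call-stack frame per level, so it can exhaust the engine's fixed-size call stack. An explicit array-based stack keeps pending nodes in heap memory instead, which is limited only by available RAM.

Both approaches visit each node once, so both run in O(n) time. The difference is where pending work is stored: recursion uses call-stack frames, which the JavaScript engine caps at a relatively small limit, while the iterative version grows an ordinary array on the heap.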
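To see the difference concretely, here is a self-contained comparison on a synthetic tree: a single chain of 200,000 nested nodes. The `ChainNode` type and both collector functions are hypothetical stand-ins written for this illustration, not Nanobrowser code. On Node.js with default stack settings the recursive version is expected to fail with a RangeError at this depth while the iterative one completes; the exact recursion limit varies by engine and configuration.

```typescript
// Minimal stand-in for a DOM node (hypothetical, not Nanobrowser's class).
interface ChainNode {
  label: number;
  clickable: boolean;
  children: ChainNode[];
}

// Recursive depth-first collection: one call-stack frame per tree level.
function collectRecursive(node: ChainNode, out: number[] = []): number[] {
  if (node.clickable) out.push(node.label);
  for (const child of node.children) collectRecursive(child, out);
  return out;
}

// Iterative version in the style of getClickableElements: pending nodes live
// in a heap-allocated array, so depth is bounded by memory, not stack size.
function collectIterative(root: ChainNode): number[] {
  const out: number[] = [];
  const stack: ChainNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.clickable) out.push(node.label);
    // Reverse push so the first child is popped first (document order).
    for (let i = node.children.length - 1; i >= 0; i--) stack.push(node.children[i]);
  }
  return out;
}

// Build a single chain `depth` levels deep; the root and every 1000th node are "clickable".
function makeChain(depth: number): ChainNode {
  const root: ChainNode = { label: 0, clickable: true, children: [] };
  let current = root;
  for (let i = 1; i < depth; i++) {
    const next: ChainNode = { label: i, clickable: i % 1000 === 0, children: [] };
    current.children.push(next);
    current = next;
  }
  return root;
}

const deep = makeChain(200000);
console.log(collectIterative(deep).length); // 200 (labels 0, 1000, ..., 199000)
try {
  collectRecursive(deep);
  console.log("recursion survived");
} catch (e) {
  // V8 reports call-stack exhaustion as a RangeError.
  console.log(e instanceof RangeError ? "RangeError: call stack exhausted" : e);
}
```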

Where highlightIndex Comes From

buildDomTree.js (injected into the page) assigns highlightIndex to elements that are:

  1. Visible (offsetParent !== null, dimensions > 0)
  2. Interactive (button, a, input, select, textarea, [role="button"], etc.)
  3. In viewport (with configurable expansion)

Matching elements receive incrementing indices and, when highlighting is enabled (showHighlightElements), numbered overlays on the page. This is the numbering you see in Nanobrowser's side panel.
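The three criteria above can be expressed as a filter that hands out indices in document order. This is a minimal sketch, not the code in buildDomTree.js: it works on plain snapshot records instead of live DOM elements so it runs outside a browser, and the `ElementSnapshot` type, the tag and role lists, and the vertical-only viewport check are simplifying assumptions.

```typescript
// Hypothetical snapshot of what an injected script could read from one element.
interface ElementSnapshot {
  tagName: string;
  role: string | null;
  hasOffsetParent: boolean; // offsetParent !== null
  rect: { top: number; bottom: number; width: number; height: number };
}

// Tags and roles treated as interactive here: an illustrative subset.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);
const INTERACTIVE_ROLES = new Set(["button", "link", "checkbox"]);

function isInteractive(el: ElementSnapshot): boolean {
  return INTERACTIVE_TAGS.has(el.tagName.toLowerCase()) ||
    (el.role !== null && INTERACTIVE_ROLES.has(el.role));
}

function isVisible(el: ElementSnapshot): boolean {
  return el.hasOffsetParent && el.rect.width > 0 && el.rect.height > 0;
}

// The viewport is grown by `expansion` pixels above and below.
function isInExpandedViewport(el: ElementSnapshot, viewportHeight: number, expansion: number): boolean {
  return el.rect.bottom >= -expansion && el.rect.top <= viewportHeight + expansion;
}

// Walk elements in document order; qualifying ones get the next index, others get null.
function assignHighlightIndices(
  elements: ElementSnapshot[],
  viewportHeight: number,
  expansion = 0,
): Array<number | null> {
  let next = 0;
  return elements.map((el) =>
    isVisible(el) && isInteractive(el) && isInExpandedViewport(el, viewportHeight, expansion)
      ? next++
      : null,
  );
}
```

Raising `expansion` lets elements just outside the viewport receive indices too, at the cost of a longer element list for the LLM.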

Step 3: Element Hash Deduplication

browser/dom/clickable/service.ts:50-80

export async function hashDomElement(
  domElement: DOMElementNode
): Promise<string> {
  const parentBranchPath = _getParentBranchPath(domElement);
 
  const [branchPathHash, attributesHash, xpathHash] = await Promise.all([
    _parentBranchPathHash(parentBranchPath),
    _attributesHash(domElement.attributes),
    _xpathHash(domElement.xpath),
  ]);
 
  return _hashString(
    `${branchPathHash}-${attributesHash}-${xpathHash}`
  );
}

Three-Layer Hash

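Layer | What It Hashes | Purpose
------------|-----------------------------------------------------------|---------------------------------
Branch path | Sibling index of each ancestor from root to element | Locate position in tree
Attributes | tagName + key attributes from DEFAULT_INCLUDE_ATTRIBUTES | Identify element characteristics
XPath | Element's XPath expression | Pinpoint DOM node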
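The layering in the table can be sketched with Node's built-in `crypto` module. This is an illustration under assumptions, not Nanobrowser's implementation: SHA-256 is swapped in for whatever `_hashString` actually uses, the hashing is synchronous (the original runs its helpers through Promise.all), the attribute list is a hypothetical subset, and the branch path is modeled as sibling indices, following the table.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input: the three pieces of information the table lists.
interface HashInput {
  branchPath: number[];               // sibling index of each ancestor, root first
  attributes: Record<string, string>; // attribute name -> value
  xpath: string;
}

// SHA-256 hex digest (an assumption; not necessarily Nanobrowser's hash function).
function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Hypothetical subset of attributes, iterated in a fixed order so the
// result does not depend on the order attributes appear on the element.
const HASHED_ATTRIBUTES = ["type", "name", "role", "aria-label", "href"];

function fingerprint(el: HashInput): string {
  const branchPathHash = sha256(el.branchPath.join("-"));
  const attributesHash = sha256(
    HASHED_ATTRIBUTES
      .filter((k) => Object.prototype.hasOwnProperty.call(el.attributes, k))
      .map((k) => `${k}=${el.attributes[k]}`)
      .join(";"),
  );
  const xpathHash = sha256(el.xpath);
  // Combine the three layer hashes and hash once more, mirroring hashDomElement.
  return sha256(`${branchPathHash}-${attributesHash}-${xpathHash}`);
}
```

In this sketch, changing only `class` leaves the fingerprint unchanged because `class` is not in the hashed list, while moving the element to a different XPath produces a different fingerprint.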

Attribute Selection

From views.ts:

export const DEFAULT_INCLUDE_ATTRIBUTES = [
  'title', 'type', 'checked', 'name', 'role',
  'value', 'placeholder', 'data-date-format', 'data-state',
  'alt', 'aria-checked', 'aria-label', 'aria-expanded', 'href',
];

Note: class and id are excluded — they change too frequently to serve as stable identifiers.

Branch Path vs CSS Selectors

CSS selector: .product-card .add-to-cart
  Redesign → class changes → fails
 
Branch path hash: 3-1-5-2
  Redesign → tree position unchanged → still valid

For scrapers: position-based targeting is more robust than class/ID-based targeting when the element's structural position is stable.
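A sibling-index path like the one in the comparison above can be computed by walking parent links up to the root. This is a minimal sketch with a hypothetical `PathNode` type and helper names; it shows both sides of the trade-off: a class rename leaves the path unchanged, while inserting a sibling before the element shifts it.

```typescript
// Hypothetical minimal element with parent links (not Nanobrowser's type).
interface PathNode {
  tag: string;
  className: string;
  parent: PathNode | null;
  children: PathNode[];
}

// Create a node and point its children back at it.
function makeNode(tag: string, className: string, children: PathNode[] = []): PathNode {
  const node: PathNode = { tag, className, parent: null, children };
  for (const child of children) child.parent = node;
  return node;
}

// Sibling-index path from the root down to `target`, e.g. "3-1-5-2".
function siblingIndexPath(target: PathNode): string {
  const indices: number[] = [];
  let current = target;
  while (current.parent !== null) {
    indices.push(current.parent.children.indexOf(current));
    current = current.parent;
  }
  return indices.reverse().join("-");
}

const addToCart = makeNode("button", "add-to-cart");
const card = makeNode("div", "product-card", [makeNode("img", ""), addToCart]);
makeNode("body", "", [makeNode("header", ""), makeNode("main", "", [card])]);

console.log(siblingIndexPath(addToCart)); // 1-0-1

// Redesign that only renames the class: the path is unchanged.
addToCart.className = "btn btn-primary";
console.log(siblingIndexPath(addToCart)); // 1-0-1

// Structural change: a new sibling inserted before the button shifts the path.
const badge = makeNode("span", "badge");
badge.parent = card;
card.children.unshift(badge);
console.log(siblingIndexPath(addToCart)); // 1-0-2
```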

From Detection to LLM Action

The pipeline produces a DOMState:

export interface DOMState {
  elementTree: DOMElementNode;
  selectorMap: Map<number, DOMElementNode>;
}
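The selectorMap is what turns an LLM reply such as "click element 35" back into a concrete node. A minimal sketch, assuming the input is the list produced by a getClickableElements-style traversal; `IndexedNode`, `buildSelectorMap`, and `resolveIndex` are hypothetical names, not Nanobrowser's.

```typescript
// Hypothetical minimal node: only the fields this sketch needs.
interface IndexedNode {
  tagName: string;
  highlightIndex: number | null;
}

// Build highlightIndex -> node from an already-collected list of clickable nodes.
function buildSelectorMap(clickable: IndexedNode[]): Map<number, IndexedNode> {
  const selectorMap = new Map<number, IndexedNode>();
  for (const node of clickable) {
    if (node.highlightIndex !== null) selectorMap.set(node.highlightIndex, node);
  }
  return selectorMap;
}

// Resolve an index chosen by the model, failing loudly on stale or invented indices.
function resolveIndex(selectorMap: Map<number, IndexedNode>, index: number): IndexedNode {
  const node = selectorMap.get(index);
  if (node === undefined) throw new Error(`No element with highlight index ${index}`);
  return node;
}

const selectors = buildSelectorMap([
  { tagName: "div", highlightIndex: 33 },
  { tagName: "button", highlightIndex: 35 },
]);
console.log(resolveIndex(selectors, 35).tagName); // button
```

Throwing on an unknown index matters because the page may have changed since the element list was sent to the model.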

The Navigator agent formats this for the LLM:

[33]<div>User form</div>
\t[35]<button aria-label='Submit form'>Submit</button>

Elements marked with * are new since the last step. Indentation (\t) shows parent-child relationships. This format is defined in prompts/templates/navigator.ts.
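The line format above can be produced with one more depth-tracking traversal. A sketch under assumptions: the `FormatNode` shape, the printed attribute subset, the placement of `*` after the indentation, and the rule that indentation counts only highlighted ancestors are choices made for this illustration, not taken from prompts/templates/navigator.ts.

```typescript
// Hypothetical node shape for the formatter; not Nanobrowser's DOMElementNode.
interface FormatNode {
  tagName: string;
  attributes: Record<string, string>;
  text: string;
  highlightIndex: number | null;
  isNew: boolean;
  children: FormatNode[];
}

// Only these attributes are printed (a subset chosen for illustration).
const PRINTED_ATTRIBUTES = ["aria-label", "placeholder", "type"];

function formatForLlm(root: FormatNode): string {
  const lines: string[] = [];
  // Explicit stack of [node, depth]; depth counts highlighted ancestors only.
  const stack: Array<[FormatNode, number]> = [[root, 0]];
  while (stack.length > 0) {
    const [node, depth] = stack.pop()!;
    let childDepth = depth;
    if (node.highlightIndex !== null) {
      const attrs = PRINTED_ATTRIBUTES
        .filter((k) => Object.prototype.hasOwnProperty.call(node.attributes, k))
        .map((k) => ` ${k}='${node.attributes[k]}'`)
        .join("");
      const marker = node.isNew ? "*" : ""; // "*" flags elements new since the last step
      lines.push(
        `${"\t".repeat(depth)}${marker}[${node.highlightIndex}]<${node.tagName}${attrs}>${node.text}</${node.tagName}>`,
      );
      childDepth = depth + 1;
    }
    // Reverse push keeps document order when popping.
    for (let i = node.children.length - 1; i >= 0; i--) stack.push([node.children[i], childDepth]);
  }
  return lines.join("\n");
}
```

Non-highlighted wrappers such as `<body>` produce no line and add no indentation, so the output stays focused on elements the model can act on.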

Lessons for Scraper Developers

Clickable element detection isn't just for browser agents. The same mechanism can power scrapers:

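# Find "Next page" controls using a simplified version of the same visibility + interactivity checks
# (Playwright async API; assumes `page` is an open playwright.async_api.Page)
elements = await page.evaluate("""
  () => {
    const candidates = document.querySelectorAll('a, button, [role="button"]');
    return Array.from(candidates)
      // Rough visibility check; offsetParent is also null for position: fixed elements
      .filter(el => el.offsetParent !== null)
      .filter(el => (el.textContent || '').includes('Next'))
      // Buttons have no href, so fall back to null explicitly
      .map(el => ({ text: el.textContent.trim(), href: el.href || null }));
  }
""")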

Summary

Three components power Nanobrowser's clickable element system:

  1. buildDomTree — injected JavaScript that constructs the interactive DOM tree
  2. getClickableElements — iterative stack traversal collects all interactive nodes
  3. hashDomElement — three-layer hashing creates unique fingerprints for deduplication

This article showed how Nanobrowser transforms a "page" into an "element list" — the foundation for all downstream agent decisions.

The next article analyzes the multi-agent orchestration loop — how the Executor drives Planner and Navigator through multi-step page scraping tasks.
