Teach Hermes to Drive Any Website — Zero Config

Survey a website once, save the tested browser harness, and let Hermes replay it forever through executors and cron.

Most “AI browser automation” is still expensive improvisation.

You send an agent to a website. It explores. It fails. It tries another tool. The click looks like it worked, but nothing happens. It asks you to start Chrome with debugging enabled even though it has root access. It gets stuck on a dropdown. It forgets what worked last time.

Then next week, you ask it to do the same task again — and it starts from scratch.

That’s dumb.

The better pattern is simple:

Survey the website once. Save what worked. Replay it forever.

That’s what this Hermes workshop teaches.

You give Hermes one prompt. It creates a browser-harness builder, tests which browser engine actually works for the site, maps the flow step by step, documents selectors and fallbacks, verifies the harness with dummy data, and saves the result as a reusable skill.

After that, any Hermes executor can run the harness mechanically.

Builder surveys.
Executor replays.
Cron schedules.
One prompt, full stack.

Any website with a repeatable flow starts to behave like an API.


The core idea

Most consumer websites do not expose useful APIs.

Walmart checkout. DMV appointments. Facebook pages. Flight check-in. Insurance claims. Gym class booking. Restaurant reservations. Doctor scheduling. Bank transfer setup.

They all have the same shape:

  • finite steps
  • clear buttons and fields
  • predictable success criteria
  • annoying UI traps
  • enough state that a blind agent wastes time rediscovering everything

A human can learn these flows once.

Hermes should too.

The goal is not “agent, go browse this website and hope.”

The goal is:

Hermes surveys the website, writes down the exact working path, tests the path, and leaves behind a harness that future agents can replay without thinking.

That’s the shift.


Builder mode vs executor mode

The trick is separating two jobs that should never be mixed.

Builder mode

The builder is exploratory.

It uses a smart model. It clicks around. It tests failure modes. It inspects browser snapshots, AX labels, CSS selectors, console noise, dropdown behavior, reload behavior, and weird React bullshit.

It asks you questions when it needs domain input.

It uses dummy data only.

Its job is not to complete the real task.

Its job is to map the terrain.

Executor mode

The executor is boring.

It follows the harness.

It does not rediscover the website.
It does not improvise unless the harness says to.
It does not ask you questions unless it hits a marked decision point.
It records side effects.
It stops when assumptions break.

Builder mode is reasoning.

Executor mode is replay.

Trying to do both in one agent produces a nervous intern with a browser. Splitting them produces infrastructure.
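
The executor rules above fit in a few lines. This is a sketch only: `Step`, `run_harness`, and the `perform` and `ask_user` callbacks are hypothetical names for illustration, not Hermes APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    decision_point: bool = False  # executor must ask before acting


def run_harness(
    steps: List[Step],
    perform: Callable[[Step], bool],   # True if the step's success condition held
    ask_user: Callable[[Step], bool],  # True if the user approves the step
) -> List[str]:
    """Replay steps in order and stop at the first broken assumption."""
    log: List[str] = []
    for i, step in enumerate(steps, start=1):
        if step.decision_point and not ask_user(step):
            log.append(f"step {i} ({step.name}): stopped at decision point")
            break
        if not perform(step):
            log.append(f"step {i} ({step.name}): assumption broke, stopping")
            break
        log.append(f"step {i} ({step.name}): ok")
    return log
```

Note what is missing: no retries, no exploration, no guessing. Anything smarter belongs in the builder.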


The workshop

Workshop: Teach Hermes to Drive Any Website — Zero Config

You paste one prompt into Hermes.

Hermes then:

  • Creates a dedicated harness-builder profile.
  • Loads the harness-authoring procedure.
  • Tests which browser backend actually works on your target website.
  • Walks the target flow step by step.
  • Records selectors, AX labels, CSS fallbacks, vision anchors, waits, failure modes, and recovery paths.
  • Builds a reusable harness file.
  • Spawns a verifier subagent to replay the harness with dummy data.
  • Marks the harness tested only after verification passes.

After that, you can run the harness manually or schedule it on cron.

The website is no longer an unknown surface.

It becomes a repeatable interface.


Step zero: find the browser that actually works

This part matters more than people think.

Not every website cooperates with every browser tool.

Some sites work fine in Playwright Chromium.
Some silently reject synthetic clicks.
Some require real Chrome.
Some throw bot checks.
Some need a persistent browser profile.
Some only work when driven through the actual desktop.

The builder tests the target site across browser lanes and records the winner.

Examples:

  • Playwright Chromium: good default, but some React-heavy sites silently ignore clicks.
  • Real Chrome via CDP: often needed for Google/Facebook-style properties.
  • Camoufox / nodriver: useful for stealthier sites or JS challenge walls.
  • Computer use / cua-driver: fallback when browser automation fails and Hermes needs to drive the macOS desktop directly.

The point is not to worship one browser backend.

The point is to stop guessing.

If Facebook requires real Chrome, the harness should say:

browser_requirement: real-chrome

Then the executor never wastes time trying the wrong tool again.
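
One way to picture the lane test, as a rough sketch. Only `real-chrome` appears in the harness format above; the other lane labels and the `pick_browser_lane` helper are assumptions made for this example. Each probe is a caller-supplied function that should return True only when a test click had a visible effect, because silent failures raise no error.

```python
from typing import Callable, Dict, Optional

# Lanes in the order listed above. Labels other than "real-chrome" are illustrative.
LANE_ORDER = ["playwright-chromium", "real-chrome", "camoufox", "computer-use"]


def pick_browser_lane(probes: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the first lane whose probe confirms the click actually did something."""
    for lane in LANE_ORDER:
        probe = probes.get(lane)
        if probe is None:
            continue  # lane not available on this machine
        try:
            if probe():
                return lane
        except Exception:
            # A crashing lane counts as a failed lane; keep testing the rest.
            continue
    return None
```

The returned label is what would be written into `browser_requirement`.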


Step one: survey the site

Hermes walks through the website with you.

It asks questions like:

  • “What dummy value should I type here?”
  • “This dropdown has these options — which one should I select?”
  • “I see a notification popup — should I dismiss it?”
  • “This step could submit something real — should I stop before the final click?”

For every step, it records:

  • URL
  • action
  • expected screen
  • AX role and label
  • CSS fallback selectors
  • vision anchor
  • wait condition
  • success condition
  • skip rules
  • decision points
  • failure modes
  • recovery paths
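
A builder could enforce that checklist mechanically. The sketch below assumes each surveyed step is stored as a plain dict. The key names are invented here to mirror the list above (with "AX role and label" split into two keys); they are not a documented Hermes schema.

```python
# Hypothetical key names mirroring the per-step checklist.
REQUIRED_FIELDS = (
    "url", "action", "expected_screen", "ax_role", "ax_label",
    "css_fallbacks", "vision_anchor", "wait_condition", "success_condition",
    "skip_rules", "decision_points", "failure_modes", "recovery_paths",
)


def missing_fields(step: dict) -> list:
    """List checklist fields that are absent or empty in a surveyed step."""
    empty_values = (None, "", [])
    return [
        field for field in REQUIRED_FIELDS
        if field not in step or step[field] in empty_values
    ]
```

A step with anything in `missing_fields` is not ready to go into the harness.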

It also tests the ugly stuff:

  • empty form submission
  • wrong file type
  • expired login
  • browser back
  • dropdown misclick
  • missing field
  • slow load
  • modal interruption

This is where the harness gets valuable.

Not from the happy path.

From the traps.


Step two: verify the harness

An unverified harness is worse than no harness, because the executor will trust it and replay broken steps without questioning them.

After the builder writes the harness, you tell Hermes:

Spawn a subagent to replay every step of this harness using dummy data.
Report every failure with step number and what went wrong.
Never submit real data or create real accounts.
Respond with PASS or FAIL: [list of failures].

If it fails, the builder patches the broken step.

Then it verifies again.

Only after replay passes does the harness get marked tested.

No “looks good.”

No vibes.

PASS or FAIL.
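
The verdict format is strict enough to generate and parse by machine. A minimal sketch, assuming the verifier collects failures as (step number, reason) pairs; `format_verdict` is a made-up helper name.

```python
from typing import List, Tuple


def format_verdict(failures: List[Tuple[int, str]]) -> str:
    """Render the verifier result as PASS or FAIL: [list of failures]."""
    if not failures:
        return "PASS"
    items = "; ".join(f"step {number}: {reason}" for number, reason in sorted(failures))
    return f"FAIL: [{items}]"
```

The harness gets marked tested only when this returns exactly `PASS`.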


Step three: schedule it

Once verified, the harness can run on cron.

Example:

Schedule the Walmart harness to check out with my saved cart every Friday at 9 AM.
Use the harness-executor profile.
Ask me at decision points.

Hermes creates the scheduled job.

The executor follows the harness, checkpoints each step, and surfaces only real decisions.

You sleep.
Cron runs.
Hermes reports when it needs you.

That is the whole point.
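
For reference, "every Friday at 9 AM" is `0 9 * * 5` in standard five-field cron syntax. How Hermes stores the schedule internally is not covered here, so the sketch below only shows the arithmetic: a hypothetical `next_friday_9am` helper that computes the next run time in naive local time.

```python
from datetime import datetime, timedelta

# Equivalent five-field cron expression: "0 9 * * 5"
# (minute, hour, day of month, month, day of week; 5 is Friday).


def next_friday_9am(now: datetime) -> datetime:
    """Return the next Friday 09:00 strictly after `now` (naive local time)."""
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    days_ahead = (4 - now.weekday()) % 7  # Monday is 0, Friday is 4
    candidate += timedelta(days=days_ahead)
    if candidate <= now:
        candidate += timedelta(days=7)  # already past this week's slot
    return candidate
```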


The one prompt

Paste this into Hermes and customize the bracketed parts.

I want to build a browser harness so Hermes can automate [TARGET WEBSITE].

First, set me up:

1. Create a profile called harness-builder:
   - Run: hermes profile create harness-builder --clone
   - Then: hermes model --profile harness-builder [YOUR BEST MODEL]
   - Use the smartest model available. Surveying takes reasoning.

2. Write this system prompt to the harness-builder profile:

"You are surveying, not executing.

Never submit real data.
Never create real accounts.
Never place real orders.

Your job is to map every step of a website flow:
- record AX labels
- record CSS fallbacks
- record vision anchors
- identify browser requirements
- document waits and success criteria
- document failure modes
- document recovery paths

Use dummy data only:
- Test Page
- placeholder text
- fake phone numbers
- fake addresses
- solid-color PNGs
- non-sensitive sample files

Test at least one failure mode per step.

When you finish mapping, distill everything into a harness file in the format defined by the harness-authoring skill."

3. Load the harness-authoring skill.

4. Switch me to the harness-builder profile.

5. Start surveying this URL:

[URL OF PAGE TO MAP]

The goal is not to complete the real task.
The goal is to produce a verified browser harness that an executor profile can replay later.

Ask me only when you need domain input or when a step could cause a real-world side effect.

That’s it.

Hermes creates the builder, configures it, loads the procedure, and starts mapping.


What the harness looks like

A good harness is not “click here, then click there.”

It is a runbook for a website.

Example skeleton:

---
name: mysite-checkout
domain: mysite.com
last_mapped: 2026-06-01
expires: 2026-07-01
tested: false
browser_requirement: real-chrome
prerequisites:
  - Logged-in account on mysite.com
  - Cart contains at least one item
---
# MySite Checkout Harness

## Step 1: Navigate to Cart

- URL: https://mysite.com/cart
- Action: navigate
- Wait for: heading "Your Cart" visible
- AX role: heading
- AX label: "Your Cart"
- CSS fallbacks:
  - h1 (check: document.querySelector('h1')?.textContent.includes('Cart'))
- Vision anchor:
  - Heading "Your Cart" top-left, item list below
- Pitfalls:
  - Empty cart → add items first
  - Not logged in → redirects to login
- Decision point: false
- Skip allowed: false

Every step should tell the executor:

  • what to do
  • what to wait for
  • how to know it worked
  • what can go wrong
  • when to ask the user
  • when to stop

That is what makes it replayable.


Selector priority

When the harness says “click this,” Hermes should try selectors in this order:

1. AX role + label
Usually the most stable. Works well with accessibility snapshots.

2. CSS fallback selectors
Prefer semantic attributes like aria-label, name, data-testid, and stable IDs. Avoid generated class soup when possible.

3. Vision anchor
Last resort. “The blue button in the top-right.” Slower, but useful when the DOM is hostile.

Coordinates are the bottom of the barrel.

Use them only when there is no better path.
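
That ordering is a plain fallback chain. A sketch, assuming each strategy is a zero-argument function that returns an element handle or None; `resolve_target` and the strategy names are illustrative, not part of any browser library.

```python
from typing import Callable, Optional, Sequence, Tuple

Resolver = Callable[[], Optional[object]]  # returns an element handle or None


def resolve_target(strategies: Sequence[Tuple[str, Resolver]]) -> Tuple[str, object]:
    """Try each (name, resolver) in priority order and return the first hit."""
    tried = []
    for name, resolver in strategies:
        element = resolver()
        if element is not None:
            return name, element
        tried.append(name)
    raise LookupError(f"no strategy matched; tried: {', '.join(tried)}")
```

Returning the strategy name matters: if the executor keeps landing on the vision anchor, the harness probably needs a resurvey.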


Real lessons from the Facebook harness

We built an 11-step Facebook Page creation harness.

The useful discoveries were not obvious from the happy path.

1. Playwright Chromium silently failed

The click appeared to work.

No error.

No crash.

Nothing happened.

That is the worst kind of failure because it looks like operator error.

The fix was real Chrome.

The harness now records:

browser_requirement: real-chrome

2. Autocomplete dropdowns had hidden click targets

Clicking the outer option wrapper did nothing.

Clicking the inner generic div with cursor: pointer worked.

That exact target goes in the harness.

Future executors do not have to rediscover it.

3. Browser back destroyed the form

Clicking back reset everything and returned to the homepage.

That is now documented as a hard failure mode.

The executor should never use browser back in that flow.

4. Some console errors were harmless

Errors like:

Unrecognized feature: 'sync-xhr'

look scary but did not affect the flow.

The harness records them as known noise.

That prevents future agents from panicking over irrelevant logs.
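
In practice this can be a simple filter. The sketch assumes the harness stores known noise as substrings and that the executor sees console output as a list of strings; `actionable_errors` is a hypothetical helper.

```python
# The one noise pattern documented above; a real harness would list its own.
KNOWN_NOISE = ("Unrecognized feature: 'sync-xhr'",)


def actionable_errors(console_lines, known_noise=KNOWN_NOISE):
    """Drop console lines that match a documented harmless pattern."""
    return [
        line for line in console_lines
        if not any(pattern in line for pattern in known_noise)
    ]
```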

This is the value of surveying.

You turn hours of frustration into one reusable artifact.


Good harness targets

Harnesses work best when the task has:

  • finite steps
  • low ambiguity
  • clear success criteria
  • repeatable UI
  • limited judgment
  • known decision points

Good targets:

  • Walmart checkout
  • DMV appointment booking
  • flight check-in
  • domain registration
  • Facebook Page creation
  • Indeed job posting
  • insurance claim filing
  • restaurant reservations
  • gym class booking
  • doctor appointment scheduling
  • bank transfer scheduling

Bad targets:

  • customer support chat
  • “find me the best deal”
  • creative design work
  • market research
  • CAPTCHA-heavy sites
  • Ticketmaster-style anti-bot hell
  • Google account creation

The point is not to automate everything.

The point is to automate the flows that deserve to become infrastructure.


The limitation

CAPTCHA is a wall.

MFA is a decision point.

Payments require confirmation.

Destructive actions require confirmation.

Legal, medical, financial, and identity-sensitive workflows need hard stops.

A good harness does not pretend these don’t exist.

It marks them.

The executor should stop and ask when the next action actually matters.

That is not a weakness.

That is how you keep automation from becoming a loaded gun with a browser.
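
One possible encoding of those rules, as a sketch. The category names and the three outcomes are one reading of the list above, not a Hermes feature: CAPTCHA and the sensitive domains halt the run, while MFA, payments, and destructive actions pause for the user.

```python
# Hypothetical action categories, grouped by how the executor should react.
HALT = {"captcha", "legal", "medical", "financial", "identity"}
ASK = {"mfa", "payment", "destructive"}


def gate(action_kind: str) -> str:
    """Decide whether the executor may proceed, must ask, or must halt."""
    if action_kind in HALT:
        return "halt"
    if action_kind in ASK:
        return "ask"
    return "proceed"
```

The builder assigns the category during the survey. The executor only looks it up.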


The operating principle

Every agent run should leave something behind.

If Hermes fixes a bug, it should leave a test.

If Hermes solves a workflow, it should leave a skill.

If Hermes explores a website, it should leave a harness.

That is how agent work compounds.

A one-time browser session is disposable.

A verified harness is an asset.

Survey once.
Document the traps.
Verify the replay.
Run it forever.

That is the bar.