Webseb155Free

browser-automation

Browser automation for E2E testing, visual QA, web interaction, and data extraction. This skill should be used when the user asks to 'open a website', 'test this page', 'take a screenshot', 'fill out a form', 'click a button', 'scrape data', 'automate browser actions', 'run visual QA', 'E2E test', or any task requiring programmatic web interaction. Supports both agent-browser CLI and Claude-in-Chrome MCP.

Repo bundle on Versuzseb155/atlas-plugin336 indexed entries (SKILL.md and CLAUDE.md) from this repository — open the full bundle view.

Open bundle →

View on GitHub ↗</>github.com/seb155/atlas-plugin Yours? Claim it ↗

§ 01 — Stats

Prior1090

Quality—

Score—

Tasks—

§ 02 — Install

Get browser-automation.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$npx versuz@latest install seb155-atlas-plugin-dist-atlas-dev-addon-skills-browser-automation

Or clone the repo

$git clone https://github.com/seb155/atlas-plugin.git

Or copy the SKILL.md manually

$cp atlas-plugin/SKILL.MD ~/.claude/skills/seb155-atlas-plugin-dist-atlas-dev-addon-skills-browser-automation/SKILL.md

More Versuz picks

★ Featured$1.99

vz-bench-debug

Document

★ Featured$0.99

vz-scrape-runner

Web

Got something better ?Submit your skill — it enters tomorrow's cycle. No fee.

Submit yours →

§ 05 — Challenge

Think you can beat it?

$npx versuz challenge seb155-atlas-plugin-dist-atlas-dev-addon-skills-browser-automation↵

Show SKILL.md content (~2.4k tokens)

---
name: browser-automation
description: "Browser automation for E2E testing, visual QA, web interaction, and data extraction. This skill should be used when the user asks to 'open a website', 'test this page', 'take a screenshot', 'fill out a form', 'click a button', 'scrape data', 'automate browser actions', 'run visual QA', 'E2E test', or any task requiring programmatic web interaction. Supports both agent-browser CLI and Claude-in-Chrome MCP."
mode: [personal, all]
effort: low
---

# Browser Automation

Automate browser interactions for E2E testing, visual QA, form filling, data extraction, and web scraping.

## Two Backends

| Backend | When to Use | Tools |
|---------|-------------|-------|
| **agent-browser CLI** | Headless automation, CI, scripted flows | `Bash(agent-browser:*)` |
| **Claude-in-Chrome MCP** | Interactive browser, visual debugging, live pages | `mcp__claude-in-chrome__*` |

Detect which is available. Prefer agent-browser for automated flows, Chrome MCP for interactive/visual work.

## Core Workflow (agent-browser)

Every automation follows: **Navigate → Snapshot → Interact → Re-snapshot**

```bash
agent-browser open https://example.com
agent-browser snapshot -i              # Get element refs (@e1, @e2...)
agent-browser fill @e1 "value"         # Interact using refs
agent-browser click @e2
agent-browser wait --load networkidle
agent-browser snapshot -i              # Fresh refs after page change
```

### Essential Commands

```bash
# Navigation
agent-browser open <url>
agent-browser close

# Snapshot (ALWAYS before interacting)
agent-browser snapshot -i              # Interactive elements with refs
agent-browser snapshot -i -C           # Include cursor-interactive elements

# Interaction (use @refs from snapshot)
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser select @e1 "option"
agent-browser check @e1
agent-browser press Enter
agent-browser scroll down 500

# Information
agent-browser get text @e1
agent-browser get url
agent-browser get title

# Wait
agent-browser wait @e1                 # Wait for element
agent-browser wait --load networkidle  # Wait for network idle
agent-browser wait 2000                # Wait milliseconds

# Capture
agent-browser screenshot
agent-browser screenshot --full
agent-browser pdf output.pdf
```

### Ref Lifecycle (CRITICAL)
Refs (`@e1`, `@e2`) are **invalidated** when the page changes. ALWAYS re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)

## Core Workflow (Chrome MCP)

```
1. tabs_context_mcp → get tab IDs
2. navigate → URL
3. read_page / find → get element refs
4. computer / form_input → interact
5. screenshot → verify
```

## Common Patterns

### Form Submission
```bash
agent-browser open <url>
agent-browser snapshot -i
agent-browser fill @e1 "name"
agent-browser fill @e2 "email"
agent-browser click @e3              # Submit
agent-browser wait --load networkidle
agent-browser screenshot             # Verify result
```

### Authentication + State Persistence
```bash
agent-browser open <login-url>
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json   # Reuse later
```

### Data Extraction
```bash
agent-browser open <url>
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text body > page.txt
```

### Visual QA (Screenshot Comparison)
```bash
agent-browser open <url>
agent-browser screenshot before.png
# ... make changes ...
agent-browser screenshot after.png
# Compare visually or with diff tool
```

## HITL Gates

- Before filling forms with sensitive data → confirm via AskUserQuestion
- Before submitting/posting anything → confirm
- Before login flows → confirm credentials usage
- After screenshot → present for visual approval

## Desktop Use (W5.2 stretch — design)

> **Status**: SCAFFOLDING — design + CLI surface only. Full implementation deferred to **v7.1**
> (requires Anthropic Computer Use beta API access + sandboxed runtime). Do not invoke the
> `atlas desktop` commands below in v7.0 — they intentionally exit with `not-yet-implemented`.

### Why a Desktop Backend (in addition to browser MCP)

Browser MCP (Chrome MCP, Playwright, agent-browser) covers ~95% of Synapse + DevHub QA flows.
The remaining ~5% requires **OS-level pixel-and-keyboard control**:

| Use case | Browser MCP | Desktop Use |
|---|---|---|
| Click button on `synapse.axoiq.com` | Yes (DOM ref) | Overkill |
| File-picker native dialog (PDF import wizard) | No (OS modal) | **Yes** (pixel click) |
| Screenshot validate Konva canvas pixels | Partial (DOM screenshot) | **Yes** (full-screen capture) |
| Drive a desktop app (e.g., Bluebeam, AutoCAD) | No | **Yes** |
| Test PWA install banner / OS notification | No | **Yes** |
| Cross-app paste (clipboard from Excel → web form) | No | **Yes** |

**Reference patterns**:
- **Cognition Devin 2.2** — desktop-use mode for autonomous SWE agents (Q1 2026 release)
- **Anthropic Computer Use** — `computer_20250124` tool family (beta), screenshot + click + type primitives

### Proposed CLI Surface (`atlas desktop ...`)

```bash
# Capture
atlas desktop screenshot [--region X,Y,W,H] [-o out.png]
atlas desktop screen info                    # resolution, scale factor

# Pointer
atlas desktop click <X,Y> [--button left|right|middle] [--double]
atlas desktop hover <X,Y>
atlas desktop drag <X1,Y1> <X2,Y2>
atlas desktop scroll <X,Y> --delta <N>

# Keyboard
atlas desktop type "<text>"
atlas desktop key <combo>                    # e.g. "cmd+a", "Return", "Escape"

# Higher-level helpers (W5.2 v7.1+)
atlas desktop find "Submit button"           # vision-locate via screenshot + LLM
atlas desktop wait-for "Loading complete"    # poll-screenshot until text appears
```

### Architecture (v7.1 target)

```
atlas desktop CLI
   |
   +-- thin wrapper around Anthropic Computer Use beta tool
   |     (claude-3-5-sonnet-20241022 + computer_20250124)
   |
   +-- screenshot capture: scrot / screencapture / Windows.Graphics.Capture
   +-- input injection:    xdotool / cliclick / SendInput
   +-- guardrails:         allow-list of windows, redact PII regions, HITL on irreversible
```

### Synergy with Browser MCP

Desktop use **complements** (does not replace) browser tooling:

1. **Default to browser MCP** when target is a web page — cheaper, deterministic, ref-based
2. **Escalate to desktop** only when crossing the OS boundary (file dialog, native app, cross-app)
3. **Hybrid flows OK**: Chrome MCP drives the page, desktop use handles the file picker that
   pops up when clicking `<input type="file">`

### Trade-offs (why this is stretch, not v7.0)

| Concern | Browser MCP | Desktop Use |
|---|---|---|
| Determinism | High (DOM refs) | Medium (pixel coords drift across resolutions) |
| Cost per action | Free | LLM call per screenshot+decide |
| Latency | <100ms | 1-3s (screenshot + vision) |
| Sandboxing | Tab-isolated | Full OS access — needs explicit allow-list |
| Cross-platform | Same everywhere | Per-OS adapter (xdotool vs cliclick vs Windows) |
| HITL surface | Low (per-form) | **High** — desktop use requires confirm-before-irreversible **per action class** |

### Synapse-specific use cases (v7.1 backlog)

- Validate **Konva interactive diagram** rendering pixel-accurate (canvas pixels not in DOM)
- Drive **Bluebeam Revu** for I&C drawing markup automation (Charles workflow)
- Test **OS clipboard handoff** from Excel templates → Synapse import wizard
- Capture **PWA install** flow on Chrome / Edge / Safari
- Drive **Authentik device-flow** code prompt across multiple browsers / OS dialogs

### HITL Gates (mandatory for desktop use)

Desktop use bypasses the browser sandbox, so HITL is **stricter** than browser MCP:

- **ANY click outside an allow-listed window** → AskUserQuestion before action
- **ANY keyboard input that could trigger a system shortcut** (cmd+Q, alt+F4, etc.) → confirm
- **ANY drag operation crossing window boundaries** → confirm
- **Screenshot containing potential PII** (email client, password manager visible) → blur + confirm before persisting
- **First-use of `atlas desktop` per session** → display capability disclosure + ask for opt-in

### Out-of-scope for v7.0 (explicit non-goals)

- Recording macros / replay (defer to v7.2)
- OCR-based element finding (use vision LLM in v7.1, OCR in v7.2 if cost-prohibitive)
- Multi-monitor coordinate handling (single primary display only in v7.1)
- Headless desktop (X virtual framebuffer) — v7.2 for CI integration

### Verification placeholder (v7.1)

When implemented, every `atlas desktop` command will:
1. Take a screenshot **before** the action
2. Execute the action
3. Take a screenshot **after**
4. Diff the two and log to `~/.atlas/desktop-trace/<session-id>/`
5. On HITL-gated actions, present both screenshots in the AskUserQuestion preview