Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install seb155-atlas-plugin-dist-atlas-dev-addon-skills-browser-automationgit clone https://github.com/seb155/atlas-plugin.gitcp atlas-plugin/SKILL.MD ~/.claude/skills/seb155-atlas-plugin-dist-atlas-dev-addon-skills-browser-automation/SKILL.md--- name: browser-automation description: "Browser automation for E2E testing, visual QA, web interaction, and data extraction. This skill should be used when the user asks to 'open a website', 'test this page', 'take a screenshot', 'fill out a form', 'click a button', 'scrape data', 'automate browser actions', 'run visual QA', 'E2E test', or any task requiring programmatic web interaction. Supports both agent-browser CLI and Claude-in-Chrome MCP." mode: [personal, all] effort: low --- # Browser Automation Automate browser interactions for E2E testing, visual QA, form filling, data extraction, and web scraping. ## Two Backends | Backend | When to Use | Tools | |---------|-------------|-------| | **agent-browser CLI** | Headless automation, CI, scripted flows | `Bash(agent-browser:*)` | | **Claude-in-Chrome MCP** | Interactive browser, visual debugging, live pages | `mcp__claude-in-chrome__*` | Detect which is available. Prefer agent-browser for automated flows, Chrome MCP for interactive/visual work. ## Core Workflow (agent-browser) Every automation follows: **Navigate → Snapshot → Interact → Re-snapshot** ```bash agent-browser open https://example.com agent-browser snapshot -i # Get element refs (@e1, @e2...) agent-browser fill @e1 "value" # Interact using refs agent-browser click @e2 agent-browser wait --load networkidle agent-browser snapshot -i # Fresh refs after page change ``` ### Essential Commands ```bash # Navigation agent-browser open <url> agent-browser close # Snapshot (ALWAYS before interacting) agent-browser snapshot -i # Interactive elements with refs agent-browser snapshot -i -C # Include cursor-interactive elements # Interaction (use @refs from snapshot) agent-browser click @e1 agent-browser fill @e2 "text" agent-browser select @e1 "option" agent-browser check @e1 agent-browser press Enter agent-browser scroll down 500 # Information agent-browser get text @e1 agent-browser get url agent-browser get title # Wait agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait 2000 # Wait milliseconds # Capture agent-browser screenshot agent-browser screenshot --full agent-browser pdf output.pdf ``` ### Ref Lifecycle (CRITICAL) Refs (`@e1`, `@e2`) are **invalidated** when the page changes. ALWAYS re-snapshot after: - Clicking links/buttons that navigate - Form submissions - Dynamic content loading (dropdowns, modals) ## Core Workflow (Chrome MCP) ``` 1. tabs_context_mcp → get tab IDs 2. navigate → URL 3. read_page / find → get element refs 4. computer / form_input → interact 5. screenshot → verify ``` ## Common Patterns ### Form Submission ```bash agent-browser open <url> agent-browser snapshot -i agent-browser fill @e1 "name" agent-browser fill @e2 "email" agent-browser click @e3 # Submit agent-browser wait --load networkidle agent-browser screenshot # Verify result ``` ### Authentication + State Persistence ```bash agent-browser open <login-url> agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json # Reuse later ``` ### Data Extraction ```bash agent-browser open <url> agent-browser snapshot -i agent-browser get text @e5 agent-browser get text body > page.txt ``` ### Visual QA (Screenshot Comparison) ```bash agent-browser open <url> agent-browser screenshot before.png # ... make changes ... agent-browser screenshot after.png # Compare visually or with diff tool ``` ## HITL Gates - Before filling forms with sensitive data → confirm via AskUserQuestion - Before submitting/posting anything → confirm - Before login flows → confirm credentials usage - After screenshot → present for visual approval ## Desktop Use (W5.2 stretch — design) > **Status**: SCAFFOLDING — design + CLI surface only. Full implementation deferred to **v7.1** > (requires Anthropic Computer Use beta API access + sandboxed runtime). Do not invoke the > `atlas desktop` commands below in v7.0 — they intentionally exit with `not-yet-implemented`. ### Why a Desktop Backend (in addition to browser MCP) Browser MCP (Chrome MCP, Playwright, agent-browser) covers ~95% of Synapse + DevHub QA flows. The remaining ~5% requires **OS-level pixel-and-keyboard control**: | Use case | Browser MCP | Desktop Use | |---|---|---| | Click button on `synapse.axoiq.com` | Yes (DOM ref) | Overkill | | File-picker native dialog (PDF import wizard) | No (OS modal) | **Yes** (pixel click) | | Screenshot validate Konva canvas pixels | Partial (DOM screenshot) | **Yes** (full-screen capture) | | Drive a desktop app (e.g., Bluebeam, AutoCAD) | No | **Yes** | | Test PWA install banner / OS notification | No | **Yes** | | Cross-app paste (clipboard from Excel → web form) | No | **Yes** | **Reference patterns**: - **Cognition Devin 2.2** — desktop-use mode for autonomous SWE agents (Q1 2026 release) - **Anthropic Computer Use** — `computer_20250124` tool family (beta), screenshot + click + type primitives ### Proposed CLI Surface (`atlas desktop ...`) ```bash # Capture atlas desktop screenshot [--region X,Y,W,H] [-o out.png] atlas desktop screen info # resolution, scale factor # Pointer atlas desktop click <X,Y> [--button left|right|middle] [--double] atlas desktop hover <X,Y> atlas desktop drag <X1,Y1> <X2,Y2> atlas desktop scroll <X,Y> --delta <N> # Keyboard atlas desktop type "<text>" atlas desktop key <combo> # e.g. "cmd+a", "Return", "Escape" # Higher-level helpers (W5.2 v7.1+) atlas desktop find "Submit button" # vision-locate via screenshot + LLM atlas desktop wait-for "Loading complete" # poll-screenshot until text appears ``` ### Architecture (v7.1 target) ``` atlas desktop CLI | +-- thin wrapper around Anthropic Computer Use beta tool | (claude-3-5-sonnet-20241022 + computer_20250124) | +-- screenshot capture: scrot / screencapture / Windows.Graphics.Capture +-- input injection: xdotool / cliclick / SendInput +-- guardrails: allow-list of windows, redact PII regions, HITL on irreversible ``` ### Synergy with Browser MCP Desktop use **complements** (does not replace) browser tooling: 1. **Default to browser MCP** when target is a web page — cheaper, deterministic, ref-based 2. **Escalate to desktop** only when crossing the OS boundary (file dialog, native app, cross-app) 3. **Hybrid flows OK**: Chrome MCP drives the page, desktop use handles the file picker that pops up when clicking `<input type="file">` ### Trade-offs (why this is stretch, not v7.0) | Concern | Browser MCP | Desktop Use | |---|---|---| | Determinism | High (DOM refs) | Medium (pixel coords drift across resolutions) | | Cost per action | Free | LLM call per screenshot+decide | | Latency | <100ms | 1-3s (screenshot + vision) | | Sandboxing | Tab-isolated | Full OS access — needs explicit allow-list | | Cross-platform | Same everywhere | Per-OS adapter (xdotool vs cliclick vs Windows) | | HITL surface | Low (per-form) | **High** — desktop use requires confirm-before-irreversible **per action class** | ### Synapse-specific use cases (v7.1 backlog) - Validate **Konva interactive diagram** rendering pixel-accurate (canvas pixels not in DOM) - Drive **Bluebeam Revu** for I&C drawing markup automation (Charles workflow) - Test **OS clipboard handoff** from Excel templates → Synapse import wizard - Capture **PWA install** flow on Chrome / Edge / Safari - Drive **Authentik device-flow** code prompt across multiple browsers / OS dialogs ### HITL Gates (mandatory for desktop use) Desktop use bypasses the browser sandbox, so HITL is **stricter** than browser MCP: - **ANY click outside an allow-listed window** → AskUserQuestion before action - **ANY keyboard input that could trigger a system shortcut** (cmd+Q, alt+F4, etc.) → confirm - **ANY drag operation crossing window boundaries** → confirm - **Screenshot containing potential PII** (email client, password manager visible) → blur + confirm before persisting - **First-use of `atlas desktop` per session** → display capability disclosure + ask for opt-in ### Out-of-scope for v7.0 (explicit non-goals) - Recording macros / replay (defer to v7.2) - OCR-based element finding (use vision LLM in v7.1, OCR in v7.2 if cost-prohibitive) - Multi-monitor coordinate handling (single primary display only in v7.1) - Headless desktop (X virtual framebuffer) — v7.2 for CI integration ### Verification placeholder (v7.1) When implemented, every `atlas desktop` command will: 1. Take a screenshot **before** the action 2. Execute the action 3. Take a screenshot **after** 4. Diff the two and log to `~/.atlas/desktop-trace/<session-id>/` 5. On HITL-gated actions, present both screenshots in the AskUserQuestion preview