Use case · Updated 2026-06-02
Browser automation for devs
The browser tool family wraps a real Chromium lifecycle: open sessions, navigate, snapshot accessibility trees, click and type, wait for selectors, extract text, and serve local files. The family is enabled by AGENT_BROWSER and activates lazily unless AGENT_BROWSER_ALWAYS_ACTIVE is set.
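The gating described above can be read as a three-state decision. The sketch below is only an illustration of that reading, not the harness's actual code: the helper names `flag` and `browser_family_state` are hypothetical, and the set of accepted truthy strings is an assumption (the page only shows `AGENT_BROWSER=1`).

```python
from typing import Mapping

# Assumption: which strings count as "set". Only "1" appears in the docs.
TRUTHY = {"1", "true", "yes", "on"}


def flag(env: Mapping[str, str], name: str) -> bool:
    """Return True when the named variable holds a truthy string."""
    return env.get(name, "").strip().lower() in TRUTHY


def browser_family_state(env: Mapping[str, str]) -> str:
    """Classify the browser family as 'disabled', 'lazy', or 'active'.

    Hypothetical helper: mirrors the rule that AGENT_BROWSER gates the
    family and AGENT_BROWSER_ALWAYS_ACTIVE skips lazy activation.
    """
    if not flag(env, "AGENT_BROWSER"):
        return "disabled"
    if flag(env, "AGENT_BROWSER_ALWAYS_ACTIVE"):
        return "active"
    return "lazy"
```

In a real process you would pass `os.environ` as `env`; taking a mapping keeps the helper easy to test.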
Tools in the browser family
- browser_open / browser_close — session lifecycle (AGENT_BROWSER_MAX_SESSIONS caps concurrency).
- browser_navigate — URLs with timeout handling.
- browser_snapshot — accessibility-oriented page state for the model.
- browser_act — clicks, typing, selects driven from snapshot refs.
- browser_wait_for — selectors, network idle, or custom conditions.
- browser_extract — structured text from the DOM.
- browser_serve_file — expose local HTML for testing.
- captcha_solve — optional 2captcha/CapSolver when AGENT_CAPTCHA_KEY is set.
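To show how the tools listed above might chain together, here is a minimal sketch that builds an ordered plan of tool calls for a single click-and-verify flow. The tool names come from the list; everything else is an assumption made for illustration: the argument names (`url`, `timeout_ms`, `action`, `ref`, `selector`), the helper names, and the way the session cap is checked are hypothetical and may not match the real tool schemas.

```python
from typing import Any


def plan_click_and_verify(
    url: str, button_ref: str, ready_selector: str
) -> list[dict[str, Any]]:
    """Return an ordered list of tool calls for one browser session.

    All argument names are hypothetical; only the tool names are taken
    from the documentation.
    """
    return [
        {"tool": "browser_open", "args": {}},
        {"tool": "browser_navigate", "args": {"url": url, "timeout_ms": 15000}},
        # Snapshot first: browser_act is driven by refs found in the snapshot.
        {"tool": "browser_snapshot", "args": {}},
        {"tool": "browser_act", "args": {"action": "click", "ref": button_ref}},
        {"tool": "browser_wait_for", "args": {"selector": ready_selector}},
        {"tool": "browser_extract", "args": {"selector": ready_selector}},
        # Always close, so the session stops counting against the cap.
        {"tool": "browser_close", "args": {}},
    ]


def can_open_session(open_sessions: int, max_sessions: int) -> bool:
    """Hypothetical check of a cap such as AGENT_BROWSER_MAX_SESSIONS."""
    return open_sessions < max_sessions
```

The ordering is the point of the sketch: snapshot before acting, wait before extracting, and close the session at the end.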
Stealth and bot walls
AGENT_BROWSER_STEALTH patches common automation fingerprints. For web_fetch (non-browser), AGENT_WEB_FETCH_403_RETRY retries with alternate user agents when sites return bot-wall 401/403. Use the browser tools when the page needs JavaScript execution; otherwise web_fetch is the lighter choice.
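The retry behavior described above can be approximated as follows. This is a sketch of the general technique (retrying a plain HTTP fetch with a different User-Agent header on 401/403), not the actual web_fetch implementation: the function names and the user-agent strings are illustrative assumptions.

```python
import urllib.error
import urllib.request
from typing import Callable, Sequence

# Illustrative user agents; the ones web_fetch really uses are not documented here.
USER_AGENTS = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
)


def urllib_fetch(url: str, user_agent: str) -> tuple[int, str]:
    """Fetch a URL with the given User-Agent; return (status, body)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise; report the status with an empty body.
        return err.code, ""


def fetch_with_ua_retry(
    url: str,
    fetch: Callable[[str, str], tuple[int, str]] = urllib_fetch,
    user_agents: Sequence[str] = USER_AGENTS,
    retry_statuses: Sequence[int] = (401, 403),
) -> tuple[int, str]:
    """Try each user agent in turn until the status is not a bot-wall code."""
    status, body = 0, ""
    for ua in user_agents:
        status, body = fetch(url, ua)
        if status not in retry_statuses:
            break
    return status, body
```

If every attempt still ends in 401/403, or the returned HTML is an empty shell that needs JavaScript to render, that is the signal to switch to the browser tools.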
Developer workflows
Verify a staging UI after API changes without writing a full Playwright test suite first. Capture repro steps for a bug that appears only when logged in. Scrape documentation behind client-rendered SPAs. Pair with web_search when the task starts from “find official docs” and ends on “click through the admin console.”
Enable in your harness
AGENT_BROWSER=1
# Then in the agent loop: activate_tool_family("browser")
FAQ
Common questions
Headed vs headless?
AGENT_BROWSER_HEADED=1 runs visible Chromium — useful when debugging selectors; default is headless.