Voice-driven browser agent

Say it.
Watch it fall into place.

DOMino turns your voice into browser actions. Hold a key, speak a command, and an AI agent sees your screen, reasons about what to do, and does it: clicking, typing, navigating, and finishing whole tasks. One word topples the whole chain.

See how it works

Hold Enter and speak
Why the name?

DOMino

DOM: the Document Object Model, the structure of every web page, which DOMino reads and operates on directly.

Domino: one spoken command topples a whole chain of actions that fall into place, like a line of dominoes.

How it works

Three tiles. One chain reaction.

No clicking, no typing, no hunting through menus. You speak; DOMino handles the rest in a tight observe → reason → act loop.

1

Speak

Hold Enter and say what you want: “search YouTube and play some lofi” or “compose an email to Sam about the meeting.” Your voice is transcribed in real time.

2

It sees

DOMino captures your screen and reads the page as a numbered map of every button, field, and link. It knows exactly what’s actionable.

3

It acts

The AI picks the next move and executes it on the real page, re-checking after each step, until the whole task is done. Hands-free.
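The three steps above can be sketched as a small loop. This is an illustrative sketch only, not DOMino's actual code: `FakePage`, `scripted_reasoner`, `format_map`, `run_task`, and the `max_steps` cap are hypothetical names and choices, and the scripted reasoner stands in for the real model call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Element:
    index: int  # the number the model uses to refer to this element
    kind: str   # "button", "field", or "link"
    label: str


@dataclass
class Action:
    name: str                     # "type", "click", or "done"
    target: Optional[int] = None  # index into the numbered map
    text: str = ""


class FakePage:
    """Hypothetical in-memory page, here only to make the sketch runnable."""

    def __init__(self) -> None:
        self.query = ""
        self.submitted = False

    def observe(self) -> List[Element]:
        # Observe: list every actionable element as a numbered map.
        return [Element(0, "field", "Search"), Element(1, "button", "Go")]

    def act(self, action: Action) -> None:
        # Act: apply exactly one step to the page.
        if action.name == "type" and action.target == 0:
            self.query = action.text
        elif action.name == "click" and action.target == 1:
            self.submitted = True


def format_map(elements: List[Element]) -> str:
    # One line per element, e.g. "[0] field: Search".
    return "\n".join(f"[{e.index}] {e.kind}: {e.label}" for e in elements)


def scripted_reasoner(command: str, elements: List[Element],
                      history: List[Action]) -> Action:
    """Stand-in for the model: type the command, click the button, then stop."""
    field = next(e for e in elements if e.kind == "field")
    button = next(e for e in elements if e.kind == "button")
    if not history:
        return Action("type", field.index, command)
    if history[-1].name == "type":
        return Action("click", button.index)
    return Action("done")


Reasoner = Callable[[str, List[Element], List[Action]], Action]


def run_task(command: str, page: FakePage, reason: Reasoner,
             max_steps: int = 10) -> List[Action]:
    history: List[Action] = []
    for _ in range(max_steps):  # assumed safety cap on the number of steps
        elements = page.observe()                     # observe
        action = reason(command, elements, history)   # reason
        if action.name == "done":
            break
        page.act(action)                              # act, then re-observe
        history.append(action)
    return history
```

Calling `run_task("lofi", FakePage(), scripted_reasoner)` walks the fake page through a type step and a click step, re-observing before each one. A real agent would replace the scripted reasoner with a model call that receives the screenshot and the text produced by `format_map`.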

What it does

An agent for the whole web.

DOMino isn’t scripted for a handful of sites. It works on virtually any website because it acts on the page itself.

🗣️

Voice to action

Hold a key, speak, release. Natural commands become real browser actions, with no syntax to learn.

👁️

Sees your screen

Multimodal: a screenshot plus a structured map of every interactive element. It reads the page like you do.

⚙️

Autonomous loop

Re-observes after every action and decides the next step on its own, through to the end of a multi-step task.

🌐

Any website

Search, shop, fill forms, compose email, navigate between sites. If it’s on the web, DOMino can drive it.

🔊

Speaks back

Natural voice readback tells you what it’s doing and answers questions about the page out loud.

💬

Multi-turn

Ask follow-ups, correct it, or chain requests; DOMino keeps the context per tab.
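One way to picture per-tab context is a separate conversation history for each tab. The sketch below is an assumption about how that could be done, not DOMino's implementation: the `TabContexts` class, its methods, and the `max_turns` limit are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Turn = Tuple[str, str]  # (role, text), e.g. ("user", "play the first one")


class TabContexts:
    """Hypothetical store that keeps a separate conversation per browser tab."""

    def __init__(self, max_turns: int = 20) -> None:
        # Each tab gets its own bounded history; the oldest turns drop off first.
        self._turns: Dict[int, Deque[Turn]] = defaultdict(
            lambda: deque(maxlen=max_turns)
        )

    def add(self, tab_id: int, role: str, text: str) -> None:
        self._turns[tab_id].append((role, text))

    def history(self, tab_id: int) -> List[Turn]:
        # Reading an unknown tab returns an empty history without creating one.
        return list(self._turns.get(tab_id, ()))

    def close(self, tab_id: int) -> None:
        # Forget everything about a tab once it is closed.
        self._turns.pop(tab_id, None)
```

With a store like this, a follow-up such as “play the first one” is resolved against the turns recorded for the tab it was spoken in, and a request made in another tab never leaks into it.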

Under the hood

Powered by best-in-class AI.

A multimodal brain for reasoning, plus specialized engines for voice and page understanding. The reasoning model is pluggable: swap it with one setting.

Google Gemini via Vertex AI: reasoning + vision

ElevenLabs: voice in & out

Firecrawl: page understanding

Chrome MV3: runs on any site

🔒 Runs locally: your keys stay on your machine
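A pluggable reasoning model can be as simple as a lookup keyed by one setting. The following is a minimal sketch under stated assumptions: the setting name `REASONING_MODEL`, the registry, and both stub functions are hypothetical, and no real provider API is called.

```python
from __future__ import annotations

from typing import Callable, Dict, Mapping

Reasoner = Callable[[str], str]


def gemini_stub(prompt: str) -> str:
    # Placeholder: a real backend would call its configured model here.
    return f"[gemini] {prompt}"


def echo_stub(prompt: str) -> str:
    # Second placeholder engine, used to show the swap.
    return f"[echo] {prompt}"


# Hypothetical registry: setting value -> reasoning function.
REASONERS: Dict[str, Reasoner] = {
    "gemini": gemini_stub,
    "echo": echo_stub,
}


def load_reasoner(settings: Mapping[str, str]) -> Reasoner:
    """Pick the reasoning engine from a single (assumed) setting."""
    name = settings.get("REASONING_MODEL", "gemini")
    try:
        return REASONERS[name]
    except KeyError:
        known = ", ".join(sorted(REASONERS))
        raise ValueError(f"unknown reasoning model {name!r}; expected one of: {known}")
```

The rest of the agent only ever calls the function returned by `load_reasoner`, so changing the one setting swaps the engine without touching the loop that uses it.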

Tip the first domino.

DOMino runs entirely on your own machine, as a local backend plus the extension in your browser, so your keys and data never leave your computer. Install it, open any website, hold Enter, and speak.

Get DOMino on GitHub →