# ScreenHand
Give AI eyes and hands on your desktop. Open-source MCP server for desktop automation — screenshots, UI control, browser automation, OCR. Works with Claude, Cursor, and any MCP client. macOS + Windows.
Configuration — add the following to `.claude/settings.json` under the `mcpServers` key:

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["-y", "screenhand"]
    }
  }
}
```
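Other MCP clients accept the same server entry. For example, Cursor reads it from `.cursor/mcp.json` in the project root (a sketch, assuming Cursor's default MCP config location):

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["-y", "screenhand"]
    }
  }
}
```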
<div align="center">
Let AI control your desktop — click buttons, fill forms, automate workflows in ~50ms with zero extra AI calls.
An open-source MCP server for macOS and Windows. Works with Claude, Cursor, Codex CLI, and any MCP-compatible client.
[License](LICENSE) [npm](https://www.npmjs.com/package/screenhand) [CI](https://github.com/manushi4/screenhand/actions/workflows/ci.yml)
Quick Start | What It Does | Example | All 111 Tools | Architecture | Website
</div>
---
<!-- TODO: Add demo GIF here — 15 sec showing Claude controlling a real app -->
AI assistants can write code but can't use your computer. Every click requires a screenshot → LLM interpretation → coordinate guess — 3-5 seconds and an API call per action.
ScreenHand gives AI direct access to native OS APIs. No screenshots needed for clicks. No AI calls for button presses.
| | Without ScreenHand | With ScreenHand |
|---|---|---|
| Click a button | Screenshot → LLM → coordinate click (~3-5s) | Native Accessibility API (~50ms) |
| Cost per action | 1 LLM API call | 0 LLM calls |
| Accuracy | Coordinate guessing; misses on layout shift | Exact element targeting by role/name |
| Browser control | Needs focus, screenshot per action | CDP in background (~10ms), no focus needed |
| Works across apps | One app at a time | Cross-app workflows, multi-agent coordination |
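Under MCP, an element-targeted click is an ordinary `tools/call` JSON-RPC request. A sketch of what one might look like (the tool name `click_element` and its argument shape are illustrative, not ScreenHand's actual schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "click_element",
    "arguments": { "role": "button", "name": "Submit" }
  }
}
```

Because the element is addressed by accessibility role and name rather than screen coordinates, the same call keeps working when the layout shifts.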
<details open> <summ