VibeCodingNights/agent-harnesses
The pattern isn't about code — it's about closing loops
Platform-specific configuration:
{
"mcpServers": {
"agent-harnesses": {
"command": "npx",
"args": [
"-y",
"agent-harnesses"
]
}
}
}Add the config above to .claude/settings.json under the mcpServers key.
Coding agents are the status quo. Stripe ships 1,300 PRs a week with zero hand-typed code. OpenAI built a million-line codebase with three engineers and an agent harness. Claude Code scored 78% on CORE-Bench where the same model in a different scaffold scored 42%. The question of whether agents can write and run code is settled.
The question that isn't settled: can they do anything else?
Almost every harness that exists today is for software engineering — and for good reason. Code is uniquely verifiable. You write a function, run the tests, get a binary signal: it works or it doesn't. That tight feedback loop is what makes harness engineering possible. The agent acts, the environment responds with ground truth, and the harness accumulates the difference as institutional knowledge.
But the domains that would benefit most from agency are the ones where that loop doesn't close cleanly. A self-driving lab at Berkeley ran for 17 continuous days synthesizing new materials — but 42% of experiments in Sakana's AI Scientist failed due to errors the agent couldn't catch. PagerDuty's incident response agents cut resolution times in half — but Gartner predicts 40% of agentic AI projects will be canceled by 2027 because the feedback loops outside code are slower, noisier, and more expensive to close. Harvey AI reached $100M ARR writing legal briefs — but no one has built the equivalent of pytest for a contract clause.
The harness pattern works. The open problem is verification in domains where there is no compiler.
Tonight you learn the pattern by building it for code — then you push it somewhere the loop hasn't been closed yet.
---
In February 2026, Mitchell Hashimoto named the practice: every time an agent fails, you engineer the environment so it can never fail that way again. Days later, OpenAI [published the playbook](https://openai.com/index/harness-engin
Loading reviews...