# agent-config-arena
Neutral benchmark arena that tests CLAUDE.md, AGENTS.md, and agent configs against each other on real coding tasks.
Platform-specific configuration:

```json
{
  "mcpServers": {
    "agent-config-arena": {
      "command": "npx",
      "args": ["-y", "agent-config-arena"]
    }
  }
}
```

Add the config above to `.claude/settings.json` under the `mcpServers` key.
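If `.claude/settings.json` already contains other settings, the server entry nests alongside them under `mcpServers` rather than replacing the file — a minimal sketch, where the `permissions` block is only an illustrative pre-existing setting:

```json
{
  "permissions": {
    "allow": ["Bash(npm test)"]
  },
  "mcpServers": {
    "agent-config-arena": {
      "command": "npx",
      "args": ["-y", "agent-config-arena"]
    }
  }
}
```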
> Everyone shares their CLAUDE.md. Nobody benchmarks them. Until now.
| Config | Pass Rate | Avg Tokens | Avg Time | Avg Cost | Score |
|--------|-----------|------------|----------|----------|-------|
| token-efficient | 88% | 208k | 73.7s | $0.28 | 44 |
| workflow-heavy | 86% | 205k | 90.0s | $0.31 | 40 |
| baseline (no config) | 88% | 201k | 112.9s | $0.33 | 36 |
> Tested on 8 real coding tasks (REST API, refactoring, bug fix, CLI tool, data pipeline, test coverage, TS migration, performance optimization). Full results in LEADERBOARD.md.
---
A neutral, open-source benchmark that tests different coding agent configurations on the same real coding tasks -- then publishes the results.
- **Not a model benchmark.** SWE-bench tests models. We test *configs*.
- **Not a tool collection.** awesome-claude-code collects tools. We *evaluate* them.
- **Not a single config.** claude-token-efficient ships one config. We pit configs *against each other*.