Tsubaki414/mcp-tool-selection-benchmark-v2
Benchmark measuring agent tool-selection failure rates across 1,817 real MCP tools. It tests Claude Sonnet 4 and GPT-4o, using a diagnosable failure taxonomy and targeted fixes.
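To show what a per-model failure rate broken down by taxonomy category can look like, the Python sketch below tallies trial records into an overall failure rate and per-category shares. The `Trial` record shape, the `failure_rates` helper, and any category labels are hypothetical illustrations. This page does not describe the benchmark's actual data format or taxonomy names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """One hypothetical tool-selection attempt (record shape is an assumption)."""
    model: str
    failure: Optional[str]  # None = correct tool chosen; otherwise a taxonomy label


def failure_rates(trials):
    """Return {model: (overall_failure_rate, {category: share_of_all_trials})}."""
    totals = Counter()                 # trials per model
    failures = defaultdict(Counter)    # failure counts per model and category
    for t in trials:
        totals[t.model] += 1
        if t.failure is not None:
            failures[t.model][t.failure] += 1

    result = {}
    for model, n in totals.items():
        # Each category is divided by the model's total trials, not by its failures.
        by_category = {cat: k / n for cat, k in failures[model].items()}
        overall = sum(failures[model].values()) / n
        result[model] = (overall, by_category)
    return result
```

The function divides each category count by the model's total number of trials, not by its failures. As a result, the per-category shares add up to the overall failure rate, so the breakdown can be read directly against the headline number.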
Platform-specific configuration:
{
  "mcpServers": {
    "mcp-tool-selection-benchmark-v2": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-tool-selection-benchmark-v2"
      ]
    }
  }
}

Add the config above to .claude/settings.json under the mcpServers key.
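You can also script this step instead of editing the file by hand. The Python sketch below merges the entry into a settings file without overwriting other keys or other servers. It assumes the file is plain JSON (no comments) and creates the file and its parent directory if they are missing. `add_mcp_server` is a hypothetical helper and is not part of the package.

```python
import json
from pathlib import Path

# Server entry copied from the configuration above.
SERVER_NAME = "mcp-tool-selection-benchmark-v2"
SERVER_ENTRY = {
    "command": "npx",
    "args": ["-y", "mcp-tool-selection-benchmark-v2"],
}


def add_mcp_server(settings_path: Path) -> dict:
    """Merge SERVER_ENTRY into settings_path under the "mcpServers" key.

    Existing top-level keys and other servers are preserved. The file
    (and its parent directory) is created if it does not exist yet.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)  # assumes plain JSON without comments

    servers = settings.setdefault("mcpServers", {})
    servers[SERVER_NAME] = SERVER_ENTRY  # add or overwrite only this server

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Run `add_mcp_server(Path(".claude") / "settings.json")` from the project root. It adds or updates only the mcp-tool-selection-benchmark-v2 entry and leaves the rest of the file unchanged.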