vexp-swe-bench

MCP Tool

Vexp-ai/vexp-swe-bench

Open benchmark for AI coding agents on SWE-bench Verified. Compare resolution rates, cost, and unique wins.

Install

$ npx loaditout add Vexp-ai/vexp-swe-bench

Platform-specific configuration:

.claude/settings.json

{
  "mcpServers": {
    "vexp-swe-bench": {
      "command": "npx",
      "args": [
        "-y",
        "vexp-swe-bench"
      ]
    }
  }
}

Add the config above to .claude/settings.json under the mcpServers key.
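
If your settings file already lists other MCP servers, add the new entry next to them rather than replacing the whole `mcpServers` object. The sketch below shows that merge in TypeScript. `addVexpServer` is a hypothetical helper, not part of loaditout or vexp. It assumes the settings file has already been parsed into a plain object.

```typescript
// Shape of one MCP server entry, matching the config above.
interface McpServerEntry {
  command: string;
  args: string[];
}

// Only `mcpServers` is typed; every other top-level key is carried over unchanged.
interface Settings {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a NEW settings object with the vexp-swe-bench
// server added. Existing servers and top-level keys are kept, and the input is not mutated.
function addVexpServer(settings: Settings): Settings {
  return {
    ...settings,
    mcpServers: {
      ...(settings.mcpServers ?? {}),
      "vexp-swe-bench": { command: "npx", args: ["-y", "vexp-swe-bench"] },
    },
  };
}

// Example: a settings object that already has another server configured.
const existing: Settings = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};
console.log(JSON.stringify(addVexpServer(existing), null, 2));
```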

About

vexp-swe-bench

The open benchmark for AI coding agents — compare resolution rates, cost, and speed on real-world GitHub issues from SWE-bench Verified.

Benchmark any coding agent (Claude Code, Codex, Cursor, Augment, Windsurf, OpenHands, and more) on a curated 100-task subset of SWE-bench Verified. Each run records the pass@1 resolution rate, cost per task, duration, and token usage.
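
With one attempt per task, pass@1 is the fraction of tasks resolved on that attempt. The other metrics here are per-task averages. The sketch below shows how these four numbers could be aggregated. The `TaskResult` field names are assumptions for illustration and do not describe the CLI's actual output schema.

```typescript
// Hypothetical per-task record; field names are illustrative, not the CLI's schema.
interface TaskResult {
  instanceId: string;
  resolved: boolean; // did the single attempt make the task's tests pass?
  costUsd: number;
  durationSec: number;
  totalTokens: number;
}

interface Summary {
  tasks: number;
  passAt1: number; // fraction of tasks resolved on the first (only) attempt
  costPerTask: number; // mean USD per task
  meanDurationSec: number;
  meanTokens: number;
}

function summarize(results: TaskResult[]): Summary {
  const n = results.length;
  if (n === 0) throw new Error("no results to summarize");
  // Sum one numeric field across all tasks.
  const sum = (f: (r: TaskResult) => number) => results.reduce((acc, r) => acc + f(r), 0);
  return {
    tasks: n,
    passAt1: results.filter((r) => r.resolved).length / n,
    costPerTask: sum((r) => r.costUsd) / n,
    meanDurationSec: sum((r) => r.durationSec) / n,
    meanTokens: sum((r) => r.totalTokens) / n,
  };
}

// Made-up sample data, only to exercise the function.
const sample: TaskResult[] = [
  { instanceId: "task-1", resolved: true, costUsd: 0.5, durationSec: 120, totalTokens: 40000 },
  { instanceId: "task-2", resolved: false, costUsd: 1.0, durationSec: 300, totalTokens: 90000 },
  { instanceId: "task-3", resolved: true, costUsd: 0.25, durationSec: 60, totalTokens: 20000 },
  { instanceId: "task-4", resolved: true, costUsd: 0.25, durationSec: 80, totalTokens: 30000 },
];
console.log(summarize(sample));
```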

Default configuration: Claude Code + [vexp](https://vexp.dev) — context-aware code intelligence that delivers the highest resolution rate at the lowest cost per task.

Results

Evaluated on a 100-task subset of SWE-bench Verified. All agents use Claude Opus 4.5 for a fair, apples-to-apples comparison.

| Agent | Pass@1 | $/task | Unique Wins |
|-------|--------|--------|-------------|
| vexp + Claude Code | 73.0% | $0.67 | 7–10 |
| Live-SWE-Agent | 72.0% | $0.86 | — |
| OpenHands | 70.0% | $1.77 | — |
| Sonar Foundation | 70.0% | $1.98 | — |

> vexp resolves the most issues (73.0% pass@1) at the lowest cost per task: $0.67 versus $0.86 for Live-SWE-Agent, the next best agent, making it about 22% cheaper.
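
The "22% cheaper" figure is the relative saving against the next best agent's cost: (0.86 − 0.67) / 0.86 ≈ 0.221. The sketch below recomputes it from the table, along with the saving against each of the other listed agents. Only the numbers come from the table. The code itself is illustrative.

```typescript
interface Row {
  agent: string;
  passAt1: number;
  costPerTask: number; // USD, from the Results table
}

const vexp: Row = { agent: "vexp + Claude Code", passAt1: 0.73, costPerTask: 0.67 };
const others: Row[] = [
  { agent: "Live-SWE-Agent", passAt1: 0.72, costPerTask: 0.86 },
  { agent: "OpenHands", passAt1: 0.7, costPerTask: 1.77 },
  { agent: "Sonar Foundation", passAt1: 0.7, costPerTask: 1.98 },
];

// How much cheaper `cost` is than `baseline`, as a fraction of the baseline.
function relativeSaving(cost: number, baseline: number): number {
  return (baseline - cost) / baseline;
}

for (const o of others) {
  const pct = (relativeSaving(vexp.costPerTask, o.costPerTask) * 100).toFixed(1);
  console.log(`${o.agent}: vexp is ${pct}% cheaper per task`);
}
```

For Live-SWE-Agent the first line prints 22.1%, which matches the rounded claim above.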

Generate comparison charts: node dist/cli.js compare results/swebench-2026-03-22.jsonl
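
The results file uses the `.jsonl` extension, which conventionally means JSON Lines: one JSON object per line. If you want to inspect a results file yourself, a minimal parser is sketched below. The `ResultLine` fields (`instance_id`, `resolved`, `cost_usd`) are assumptions, not the CLI's documented schema, so check a real results file before relying on them.

```typescript
// Hypothetical line shape; the real JSONL written by the CLI may differ.
interface ResultLine {
  instance_id: string;
  resolved: boolean;
  cost_usd: number;
}

// Parse JSON Lines text: blank lines are skipped, every other line must be a JSON object.
function parseJsonl(text: string): ResultLine[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line, i) => {
      try {
        return JSON.parse(line) as ResultLine;
      } catch {
        // `i` counts non-empty lines only.
        throw new Error(`invalid JSON on non-empty line ${i + 1}`);
      }
    });
}

// Illustrative input with placeholder task IDs.
const sample = [
  '{"instance_id":"example__repo-1","resolved":true,"cost_usd":0.52}',
  '{"instance_id":"example__repo-2","resolved":false,"cost_usd":0.81}',
  "",
].join("\n");

const rows = parseJsonl(sample);
console.log(`resolved ${rows.filter((r) => r.resolved).length} of ${rows.length}`);
```

In Node you would read the text with `readFileSync(path, "utf8")` from `node:fs` before passing it to `parseJsonl`.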

External resolution data sourced from swe-bench/experiments. Cost data sourced from each agent's published benchmarks (see data sources below).
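
One plausible reading of "unique wins" is the set of tasks that one agent resolves and none of the compared agents resolve. That definition is an assumption here, and the sketch does not explain the 7–10 range reported in the table. It assumes each agent's per-task outcomes have already been reduced to a set of resolved task IDs, for example from the per-instance results published in swe-bench/experiments.

```typescript
// Assumed definition: tasks `agent` resolved that no other agent in the map resolved.
function uniqueWins(resolvedByAgent: Map<string, Set<string>>, agent: string): string[] {
  const mine = resolvedByAgent.get(agent);
  if (!mine) throw new Error(`unknown agent: ${agent}`);

  // Union of every task resolved by any other agent.
  const resolvedByOthers = new Set<string>();
  resolvedByAgent.forEach((ids, name) => {
    if (name === agent) return;
    ids.forEach((id) => resolvedByOthers.add(id));
  });

  return Array.from(mine)
    .filter((id) => !resolvedByOthers.has(id))
    .sort();
}

// Toy data with placeholder agent and task names.
const resolved = new Map<string, Set<string>>([
  ["agent-a", new Set(["t1", "t2", "t3"])],
  ["agent-b", new Set(["t2"])],
  ["agent-c", new Set(["t3", "t4"])],
]);
console.log(uniqueWins(resolved, "agent-a"));
```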

Quick Start

git clone https://github.com/Vexp-ai/vexp-swe-bench.git
cd vexp-swe-bench

# One command setup (Python >= 3.10, Node >= 18, Git required)
./setup.sh

# Run the benchmark
source .venv/bin/activate
node dist/cli.js run

The setup script handles Node dependencies, Python venv, pip packages, SWE-bench Verified dataset download, 100-task subset generation, and TypeScript build.
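
Setup requires Python >= 3.10 and Node >= 18. When checking versions like these, compare each numeric part separately. A plain string comparison gets `3.9` versus `3.10` wrong. The helper below is a hypothetical sketch of such a check and is not taken from `setup.sh`. It ignores pre-release tags.

```typescript
// True if dotted version `actual` is >= `minimum`, comparing each part as a number.
// A leading "v" (as in Node's "v18.19.0") is stripped; non-numeric parts count as 0.
function meetsMinimum(actual: string, minimum: string): boolean {
  const parse = (v: string) =>
    v.replace(/^v/, "").split(".").map((p) => parseInt(p, 10) || 0);
  const a = parse(actual);
  const m = parse(minimum);
  for (let i = 0; i < Math.max(a.length, m.length); i++) {
    const x = a[i] ?? 0; // missing parts are treated as 0, so "3.10" == "3.10.0"
    const y = m[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true; // all parts equal
}

console.log(meetsMinimum("3.10.4", "3.10")); // true
console.log(meetsMinimum("3.9.18", "3.10")); // false, although "3.9" > "3.10" as strings
```

In Node, `process.versions.node` gives the running Node version as a string such as `"18.19.0"`, which can be passed straight to this check.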

> Note: A vexp Pro or Team plan is required to run the benchmark with vexp enabled. The CLI prompts you to activate a license on first run. Use code BENCHMARK at vexp.dev/#pricing f


Quality Signals

Last updated: 26 days ago

Security: A (README)

Safety

Risk Level: medium

Data Access: read

Network Access: none

Details

Source: github-crawl

Last commit: 3/22/2026

Embed Badge

[![Loaditout](https://loaditout.ai/api/badge/Vexp-ai/vexp-swe-bench)](https://loaditout.ai/skills/Vexp-ai/vexp-swe-bench)