gangj277/hack-your-agent
The native red-team skill for Codex and Claude Code. Finds prompt injection, MCP poisoning, memory poisoning, and concealment bugs with forensic evidence.
Platform-specific configuration:
{
  "mcpServers": {
    "hack-your-agent": {
      "command": "npx",
      "args": [
        "-y",
        "hack-your-agent"
      ]
    }
  }
}
Add the config above to .claude/settings.json under the mcpServers key.
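If the settings file already holds other keys, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, not part of the project: the helper name add_mcp_server is hypothetical, and it assumes the settings file is plain JSON with no comments.

```python
import json
from pathlib import Path

# The server entry from the config above, expressed as a Python dict.
SERVER_ENTRY = {"command": "npx", "args": ["-y", "hack-your-agent"]}


def add_mcp_server(settings_path, name="hack-your-agent", entry=SERVER_ENTRY):
    """Hypothetical helper: merge one MCP server entry into a JSON settings file.

    Existing top-level keys and other mcpServers entries are left untouched.
    """
    path = Path(settings_path)
    # Start from the existing settings if the file is present, else from an empty object.
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings.setdefault("mcpServers", {})[name] = dict(entry)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling add_mcp_server(".claude/settings.json") from the project root would write the entry to the path named above; back up the file first if it was edited by hand.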
Red-team Codex and Claude Code agents for prompt injection, MCP poisoning, memory poisoning, and concealed side effects.
HackYourAgent is a manual-use skill bundle for coding agents. It teaches an agent to map an authorized AI system, generate paired control and attack trials, inspect outputs one by one, and leave behind evidence, regressions, and hardening actions a builder can actually commit.
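To make the idea of paired control and attack trials concrete, here is a rough sketch. It is an illustration only: the Trial fields, the paired_trials and to_csv helpers, and the CSV column names are all assumptions, not the skill's actual schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Trial:
    # All field names here are hypothetical, chosen for illustration.
    pair_id: str      # shared by the control and attack rows of one pair
    arm: str          # "control" or "attack"
    surface: str      # where the input enters, e.g. a retrieved document
    input_text: str   # what the agent is shown


def paired_trials(pair_id, surface, benign_text, injected_payload):
    """Return a control trial and an attack trial that differ only by the payload."""
    control = Trial(pair_id, "control", surface, benign_text)
    attack = Trial(pair_id, "attack", surface, benign_text + "\n" + injected_payload)
    return [control, attack]


def to_csv(trials):
    """Serialize trials to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=[f.name for f in fields(Trial)], lineterminator="\n"
    )
    writer.writeheader()
    for trial in trials:
        writer.writerow(asdict(trial))
    return buf.getvalue()
```

Because the two rows share everything except the injected payload, any difference in the agent's behavior between them can be attributed to the payload rather than to the benign input.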
Most AI security tooling still looks like one of these:
HackYourAgent is the narrow wedge for builders using coding agents. It is designed to run inside Codex and Claude Code workflows, inspect repo-local trust boundaries, and tell you where prompt injection, tool poisoning, memory poisoning, approval confusion, or concealment still work.
redteam/
Install the skill:
python3 scripts/install_skill.py both
Pick a seeded example:
examples/vulnerable-rag-agent
examples/vulnerable-mcp-agent
examples/vulnerable-concealment-agent
Invoke the skill:
Use $hack-your-agent on examples/vulnerable-rag-agent.
Write only to redteam/ artifacts. Build a paired control/attack trial matrix,
inspect outputs one by one, and leave minimal repros and regressions.
Expected outcome:
redteam/trials/trial-matrix.csv
redteam/trials/
redteam/evidence/
redteam/findings/
redteam/hardening-plan.md
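After a run, a quick check can confirm that the expected redteam/ artifacts were written. This is a small sketch under stated assumptions: the missing_artifacts helper is hypothetical, and which directory the redteam/ paths are relative to is not stated here, so the root is passed in as a parameter.

```python
from pathlib import Path

# Paths listed under "Expected outcome", relative to an assumed project root.
EXPECTED = [
    "redteam/trials/trial-matrix.csv",
    "redteam/trials/",
    "redteam/evidence/",
    "redteam/findings/",
    "redteam/hardening-plan.md",
]


def missing_artifacts(project_root, expected=EXPECTED):
    """Hypothetical helper: return the expected paths that do not exist yet."""
    root = Path(project_root)
    missing = []
    for rel in expected:
        target = root / rel
        # Entries ending in "/" are expected to be directories, the rest files.
        present = target.is_dir() if rel.endswith("/") else target.is_file()
        if not present:
            missing.append(rel)
    return missing
```

An empty list from missing_artifacts(".") means every listed path is present; it says nothing about whether the contents are complete.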