chenglin1112/AgentTrust
Real-time trustworthiness evaluation and safety interception for AI agents. Semantic analysis, safe alternative suggestions, multi-step attack chain detection, and LLM-as-Judge.
Platform-specific configuration:

```json
{
  "mcpServers": {
    "AgentTrust": {
      "command": "npx",
      "args": [
        "-y",
        "AgentTrust"
      ]
    }
  }
}
```

Add the config above to `.claude/settings.json` under the `mcpServers` key.
<div align="center">
Real-time trustworthiness evaluation and safety interception for AI agents.
The first framework that understands, judges, suggests, and tracks agent actions — before they execute.
[Python](https://www.python.org/downloads/) [License](LICENSE) [CI](https://github.com/chenglin1112/AgentTrust/actions) [GitHub](https://github.com/chenglin1112/AgentTrust)
42 risk patterns | 21 policy rules | 37 SafeFix rules | 7 chain detectors | 300 benchmark scenarios | 95 tests | < 1ms latency
Quick Start | Architecture | SafeFix | RiskChain | Benchmark | Docs
</div>
---
AI agents execute real-world actions: file operations, shell commands, API calls, database queries. A single misjudged action — an accidental `rm -rf /`, an exposed API key, or silent data exfiltration through a benign-looking HTTP call — can cause irreversible damage.
Existing solutions fall short:
```mermaid
graph LR
    A["Post-hoc Benchmarks<br/>(AgentHarm, TrustBench)"] -.->|"Too late<br/>Damage already done"| X["GAP"]
    B["Rule-based Guardrails<br/>(Invariant, NeMo)"] -.->|"Too shallow<br/>Miss semantic context"| X
    C["Infrastructure Sandboxes<br/>(OpenShell)"] -.->|"Too low-level<br/>Don't understand intent"| X
    X ==>|"AgentTrust fills this"| D["Real-time<br/>Semantic<br/>Explainable"]
    style X fill:#ff6b6b,stroke:#c0392b,color:#fff
    style D fill:#2ecc71,stroke:#27ae60,color:#fff
```

AgentTrust provides **real-time, semantic-level safety interception** for agent actions.
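To make the interception idea concrete, here is a minimal, hypothetical sketch of pre-execution risk evaluation with safe-alternative suggestions. This is an illustration only, not AgentTrust's actual API: the `RISK_PATTERNS` table, `evaluate` function, and verdict fields are invented for this example, and a real deployment would use semantic analysis rather than two regexes.

```python
import re

# Hypothetical illustration (NOT the real AgentTrust API): check a proposed
# shell action against risk patterns before it executes, and when a pattern
# matches, block the action and suggest a safer alternative ("SafeFix" idea).
RISK_PATTERNS = [
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"),
     "recursive delete of the filesystem root",
     "target an explicit sub-directory instead, e.g. `rm -rf ./build`"),
    (re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
     "possible API-key exposure in a command line",
     "load the key from an environment variable instead of inlining it"),
]

def evaluate(action: str) -> dict:
    """Return a verdict for a proposed action: allowed or blocked with a fix."""
    for pattern, reason, safe_fix in RISK_PATTERNS:
        if pattern.search(action):
            return {"allowed": False, "reason": reason, "safe_fix": safe_fix}
    return {"allowed": True, "reason": None, "safe_fix": None}
```

The key design point this sketches is that the check runs *before* execution and returns an explainable verdict with a suggested alternative, rather than scoring the action after the damage is done.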