nicofains1/agentwatch
Observability for multi-agent systems. Track heartbeats, trace cross-agent actions, detect cascade failures, and replay what went wrong.
Built for teams running fleets of AI agents (CrewAI, AutoGen, LangGraph, PocketFlow, custom) who need to understand why Agent B failed after Agent A timed out.
No install needed. Run this and see a full cascade failure traced across 5 agents:
npx @nicofains1/agentwatch demo
Output:
AgentWatch Fleet Dashboard
============================================================
Agents: 5 total | 3 healthy | 1 degraded | 1 error | 0 offline
Cascade Failure (4 steps, root cause: scheduler/dispatch-batch)
============================================================
[ROOT] scheduler/dispatch-batch [ok] 15ms
|
[ 1 ] fetcher/call-api [error] 30000ms
TIMEOUT after 30000ms
|
[ 2 ] processor/transform [error] 120ms
Error: input is null - expected array from fetcher
|
[FAIL] notifier/send-alert [error] 8ms
Error: no processed data to report
npm install @nicofains1/agentwatch
import { AgentWatch } from '@nicofains1/agentwatch';
const aw = new AgentWatch(); // creates agentwatch.db
// 1. Report heartbeats from your agents
aw.report('agent-a', 'healthy');
aw.report('agent-b', 'healthy');
// 2. Trace actions across agents
const traceId = aw.createTraceId();
const e1 = aw.trace(traceId, 'agent-a', 'fetch-data',
  'url=https://api.example.com', 'rows=150');
const e2 = aw.trace(traceId, 'agent-b', 'process',
  JSON.stringify({ rows: 150 }), 'Error: out of memory', {
    parentEventId: e1.id,
    status: 'error',
    durationMs: 4200,
  });
// 3. Find the root cause
const chain = aw.correlate(e2.id);
console.log(chain?.root_cause);
// -> { agent: 'agent-a', action: 'fetch-data', ... }
// 4. Fleet dashboard
console.log(aw.dashboardText());
**Heartbeat registrati
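To make the root-cause step concrete, here is a minimal, self-contained sketch of how a cascade could be reconstructed by following parent links from a failing event back to the event with no parent. It is not AgentWatch's implementation: the `TraceEvent` shape and the `walkToRoot` helper are hypothetical names chosen for illustration, and only the `parentEventId` field and the agent/action names are taken from the examples above.

```typescript
// Hypothetical event shape for illustration; not AgentWatch's actual schema.
interface TraceEvent {
  id: number;
  agent: string;
  action: string;
  status: "ok" | "error";
  parentEventId?: number;
}

// Follow parent links from a failing event up to the event that has no parent.
// Returns the chain ordered root first, failing event last.
// An unknown id yields an empty chain; a cycle in the links stops the walk.
function walkToRoot(events: TraceEvent[], failedId: number): TraceEvent[] {
  const byId = new Map<number, TraceEvent>();
  for (const e of events) byId.set(e.id, e);

  const chain: TraceEvent[] = [];
  const seen = new Set<number>();
  let current: TraceEvent | undefined = byId.get(failedId);
  while (current !== undefined && !seen.has(current.id)) {
    seen.add(current.id);
    chain.unshift(current);
    current =
      current.parentEventId === undefined
        ? undefined
        : byId.get(current.parentEventId);
  }
  return chain;
}

// Events mirroring the demo output: each step is the child of the previous one.
const events: TraceEvent[] = [
  { id: 1, agent: "scheduler", action: "dispatch-batch", status: "ok" },
  { id: 2, agent: "fetcher", action: "call-api", status: "error", parentEventId: 1 },
  { id: 3, agent: "processor", action: "transform", status: "error", parentEventId: 2 },
  { id: 4, agent: "notifier", action: "send-alert", status: "error", parentEventId: 3 },
];

const cascade = walkToRoot(events, 4);
console.log(cascade.map((e) => `${e.agent}/${e.action}`).join(" -> "));
// prints "scheduler/dispatch-batch -> fetcher/call-api -> processor/transform -> notifier/send-alert"
```

In this sketch the root is simply the topmost ancestor, which matches the demo output where the root step has status ok while its descendants fail. Whether the library applies further heuristics when choosing a root cause is not stated here.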