riya0920/self-healing-rl-pipeline
Platform-specific configuration — add the following to `.claude/settings.json` under the `mcpServers` key:

```json
{
  "mcpServers": {
    "self-healing-rl-pipeline": {
      "command": "npx",
      "args": [
        "-y",
        "self-healing-rl-pipeline"
      ]
    }
  }
}
```
A reinforcement learning content recommendation system that autonomously detects when its recommendations start failing, diagnoses the root cause, retrains itself, and verifies the fix, using MCP for tool access and A2A for multi-agent coordination.
RL recommendation agents are trained on historical user behavior. When user preferences shift (new topics trend, seasonal changes occur, or the content distribution changes), the agent's learned policy becomes stale. In production, this means bad recommendations, dropping engagement, and lost revenue.
Most systems rely on humans to notice the degradation, diagnose the issue, and manually retrain. This system does it autonomously.
A Deep Q-Network (DQN) learns which content categories to recommend to maximize user engagement. It is trained on Reddit post data from subreddits such as r/technology, r/sports, r/politics, and r/science.
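The serving loop can be sketched without the neural network: one Q-value per content category, updated toward observed engagement reward, with epsilon-greedy selection. This is a minimal tabular simplification, not the project's actual DQN; the category names, learning rate, and reward values are illustrative.

```python
import random

# Illustrative category set mirroring the training subreddits.
CATEGORIES = ["technology", "sports", "politics", "science"]

class CategoryRecommender:
    """Epsilon-greedy recommender with one Q-value per content category.
    A tabular stand-in for the DQN: same objective (maximize engagement
    reward), minus the neural network."""

    def __init__(self, alpha=0.1, epsilon=0.1, seed=0):
        self.q = {c: 0.0 for c in CATEGORIES}
        self.alpha = alpha        # learning rate
        self.epsilon = epsilon    # exploration rate
        self.rng = random.Random(seed)

    def recommend(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(CATEGORIES)   # explore
        return max(self.q, key=self.q.get)       # exploit

    def update(self, category, reward):
        # Move the category's Q-value toward the observed engagement reward.
        self.q[category] += self.alpha * (reward - self.q[category])

agent = CategoryRecommender()
# Simulated feedback: engagement is high for "science", low elsewhere.
for _ in range(50):
    for c in CATEGORIES:
        agent.update(c, 1.0 if c == "science" else 0.2)
print(max(agent.q, key=agent.q.get))  # science
```

The real system replaces the Q-table with a network over post features, but the update rule and the explore/exploit split are the same idea.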
When the content stream shifts to domains the agent has never seen (r/cooking, r/fitness, r/legaladvice), the agent's recommendations become irrelevant. Reward drops. Relevance tanks.
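This failure mode can be quantified directly as the share of incoming posts whose subreddit never appeared in training. A minimal sketch of that check (the training set and the stream contents are illustrative, not the project's actual data):

```python
# Subreddits present in the training data (from the description above).
TRAINING_SUBREDDITS = {"technology", "sports", "politics", "science"}

def ood_fraction(posts):
    """Fraction of posts from subreddits absent from the training set."""
    if not posts:
        return 0.0
    ood = sum(1 for p in posts if p["subreddit"] not in TRAINING_SUBREDDITS)
    return ood / len(posts)

# A shifted stream: mostly domains the agent has never seen.
stream = (
    [{"subreddit": "cooking"}] * 4
    + [{"subreddit": "fitness"}] * 2
    + [{"subreddit": "technology"}] * 2
)
print(f"{ood_fraction(stream):.0%}")  # 75%
```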
The pipeline:

```
Live Reddit Posts → RL Agent serves recommendations
        ↓ all interactions logged to SQLite
Monitor Agent → watches reward curves, detects engagement drops
        ↓ (drift detected)
Diagnostics Agent → analyzes logs via MCP: "75% OOD posts detected"
        ↓ (root cause identified)
Repair Agent → retrains DQN on clean training data, deploys new version
        ↓ (fix applied)
Verification Agent → validates new model, approves or sends back
```

Example drift alert from the Monitor Agent:

```
🚨 Monitor detected 5 drift signals:
⚠️ Reward dropped by 0.3600 (threshold: 0.25)
⚠️ Relevance rate: 12.5% (threshold: 30.0%)
⚠️ 90% low-reward recommendations
⚠️ 75% out-of-domain posts from non-training subreddits
```
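The Monitor Agent's first two checks can be sketched as window comparisons over the logged interactions, using the thresholds from the alert above. The window sizes, field layout, and function names here are assumptions for illustration, not the project's actual API:

```python
from statistics import mean

REWARD_DROP_THRESHOLD = 0.25   # from the alert above
RELEVANCE_THRESHOLD = 0.30

def detect_drift(baseline_rewards, recent_rewards, recent_relevant):
    """Compare a recent window of logged interactions against a healthy
    baseline window. Returns the list of triggered drift signals."""
    signals = []
    drop = mean(baseline_rewards) - mean(recent_rewards)
    if drop > REWARD_DROP_THRESHOLD:
        signals.append(f"Reward dropped by {drop:.4f}")
    relevance = sum(recent_relevant) / len(recent_relevant)
    if relevance < RELEVANCE_THRESHOLD:
        signals.append(f"Relevance rate: {relevance:.1%}")
    return signals

baseline = [0.8, 0.7, 0.75, 0.85]   # healthy reward window
recent   = [0.4, 0.35, 0.45, 0.4]   # degraded reward window
relevant = [1, 0, 0, 0, 0, 0, 0, 1] # 2 of 8 recommendations relevant
print(detect_drift(baseline, recent, relevant))
```

When any signal fires, the Monitor hands off to the Diagnostics Agent, which inspects the same logs (e.g. computing the out-of-domain post share) to identify the root cause.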