JingbiaoMei/ATM-Bench
ATM-Bench: A benchmark for long-term personalized memory QA spanning ~4 years of multimodal data (images, videos, emails). Features referential queries, evidence-grounded answering, and multi-source reasoning. Paper: "According to Me: Long-Term Personalized Referential Memory QA"
Platform-specific configuration:
{
  "mcpServers": {
    "ATM-Bench": {
      "command": "npx",
      "args": [
        "-y",
        "ATM-Bench"
      ]
    }
  }
}

Add the config above to .claude/settings.json under the mcpServers key.
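Merging the entry by hand works, but it is easy to overwrite other keys already in settings.json. The sketch below merges only the ATM-Bench entry under mcpServers and leaves everything else in the file untouched. It assumes the file path given above (.claude/settings.json, relative to the project directory) and that any existing file holds a JSON object. The function name add_mcp_server is hypothetical. The sketch does not check whether the npx package actually resolves.

```python
import json
from pathlib import Path
from typing import Optional

# Server entry copied verbatim from the configuration above.
ATM_BENCH_ENTRY = {"command": "npx", "args": ["-y", "ATM-Bench"]}


def add_mcp_server(
    settings_path: Path,
    name: str = "ATM-Bench",
    entry: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: merge one MCP server entry into a JSON settings file.

    Other top-level keys and other servers under "mcpServers" are preserved;
    an existing entry with the same name is replaced.
    """
    if entry is None:
        entry = dict(ATM_BENCH_ENTRY)

    # Start from the existing settings if the file exists and is non-empty.
    settings: dict = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)

    # Create the "mcpServers" object if missing, then add or replace the entry.
    servers = settings.setdefault("mcpServers", {})
    servers[name] = entry

    # Ensure the parent directory (e.g. .claude/) exists before writing.
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling add_mcp_server(Path(".claude") / "settings.json") from the project root would create the file if it is absent, or add the entry alongside existing settings. Running it twice is harmless because the named entry is simply replaced.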