Reproducing results across AI agent memory systems is hard: different LLMs, embeddings, token budgets, and scoring methods make comparisons almost meaningless.
We built MemEval, an open-source benchmark that evaluates memory systems under standardized conditions and tracks token efficiency. While benchmarking, we discovered recurring failure modes, which led to PropMem, a factual memory system designed to address them efficiently.
Both projects are open source and ready for evaluation, extension, or collaboration.
Try it out:
- Blog: https://medium.com/prosus-ai-tech-blog/memeval-benchmarking-memory-for-ai-agents-932d3fd9f3b4
- Code: https://github.com/ProsusAI/MemEval (benchmark suite for evaluating agent and LLM memory systems)
We would love to hear how the community benchmarks or improves agent memory systems!