MemEval & PropMem: Standardizing Agent Memory Benchmarks

Reproducing results across AI agent memory systems is hard: different LLMs, embeddings, token budgets, and scoring methods make comparisons almost meaningless.

We built MemEval, an open-source benchmark that evaluates memory systems under standardized conditions and tracks token efficiency. While benchmarking, we discovered recurring failure modes, which led us to build PropMem, a factual memory system designed to address them efficiently.
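For anyone curious what "standardized conditions" can look like in practice, here is a minimal, hypothetical sketch of such a harness. The `MemorySystem` interface, `run_benchmark` loop, and toy `ListMemory` baseline below are illustrative names of ours, not MemEval's actual API: the point is only that every system replays the same fixed dialogue and is scored under the same per-query token budget, with token consumption tracked alongside accuracy.

```python
from dataclasses import dataclass
from typing import Protocol


class MemorySystem(Protocol):
    """Hypothetical interface every system under test implements."""

    def write(self, turn: str) -> int:
        """Ingest one conversation turn; return tokens consumed."""
        ...

    def read(self, query: str, budget: int) -> tuple[str, int]:
        """Retrieve context for a query within a token budget;
        return (context, tokens consumed)."""
        ...


@dataclass
class EvalResult:
    correct: int = 0
    total: int = 0
    tokens_used: int = 0

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def run_benchmark(
    system: MemorySystem,
    dialogue: list[str],
    qa_pairs: list[tuple[str, str]],
    token_budget: int = 2048,
) -> EvalResult:
    """Replay one fixed dialogue into the memory system, then score
    retrieval on held-out QA pairs under a fixed per-query budget."""
    result = EvalResult()
    for turn in dialogue:
        result.tokens_used += system.write(turn)
    for question, expected in qa_pairs:
        context, spent = system.read(question, token_budget)
        result.tokens_used += spent
        result.total += 1
        # Naive substring scoring; a real harness would use normalized
        # answer matching or an LLM judge with a pinned model version.
        if expected.lower() in context.lower():
            result.correct += 1
    return result


class ListMemory:
    """Toy baseline: stores turns verbatim, retrieves any turn sharing
    a word with the query. No embeddings, purely for illustration."""

    def __init__(self) -> None:
        self.turns: list[str] = []

    def write(self, turn: str) -> int:
        self.turns.append(turn)
        return len(turn.split())  # crude whitespace token proxy

    def read(self, query: str, budget: int) -> tuple[str, int]:
        words = set(query.lower().split())
        hits = [t for t in self.turns if words & set(t.lower().split())]
        context = " ".join(hits)[: budget * 4]  # ~4 chars per token
        return context, len(context.split())


dialogue = ["Alice moved to Berlin in 2021.", "Bob prefers tea over coffee."]
qa = [("Where did alice move?", "Berlin")]
res = run_benchmark(ListMemory(), dialogue, qa, token_budget=256)
print(f"accuracy={res.accuracy:.2f}, tokens={res.tokens_used}")
```

Because every system is charged for both write-time and read-time tokens under the same budget, accuracy numbers become directly comparable with efficiency, which is exactly the kind of apples-to-apples comparison that ad-hoc setups make impossible.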

Both projects are open source and ready for evaluation, extension, or collaboration.

Try it out:

We would love to hear how the community benchmarks or improves agent memory systems!
