InnoGym: Benchmarking the Innovation Potential of AI Agents • Paper • 2512.01822 • Published 9 days ago • 33
LightMem: Lightweight and Efficient Memory-Augmented Generation • Paper • 2510.18866 • Published Oct 21 • 110
OceanGym: A Benchmark Environment for Underwater Embodied Agents • Paper • 2509.26536 • Published Sep 30 • 34
Towards Personalized Deep Research: Benchmarks and Evaluations • Paper • 2509.25106 • Published Sep 29 • 29
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures • Paper • 2506.19676 • Published Jun 24
ReCode: Updating Code API Knowledge with Reinforcement Learning • Paper • 2506.20495 • Published Jun 25 • 9
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study • Paper • 2506.19794 • Published Jun 24 • 8
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality • Paper • 2506.19807 • Published Jun 24 • 7
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science • Paper • 2506.10974 • Published Jun 12 • 19
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark • Paper • 2506.10960 • Published Jun 12 • 12
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms • Paper • 2505.20322 • Published May 23 • 14
BiasEdit: Debiasing Stereotyped Language Models via Model Editing • Paper • 2503.08588 • Published Mar 11 • 7