VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation • Paper • 2604.21375 • Published 9 days ago • 17
Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows • Paper • 2604.20200 • Published 10 days ago • 5
Target-Oriented Pretraining Data Selection via Neuron-Activated Graph • Paper • 2604.15706 • Published 15 days ago • 10
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw • Paper • 2604.04759 • Published 26 days ago • 24
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild • Paper • 2603.17187 • Published Mar 17 • 139
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought • Paper • 2511.02779 • Published Nov 4, 2025 • 60
AHELM: A Holistic Evaluation of Audio-Language Models • Paper • 2508.21376 • Published Aug 29, 2025 • 9