TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents Paper • 2602.02196 • Published 17 days ago • 33
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 14 days ago • 57
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20, 2025 • 47
A Semantic Mention Graph Augmented Model for Document-Level Event Argument Extraction Paper • 2403.09721 • Published Mar 12, 2024 • 1
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning Paper • 2507.21892 • Published Jul 29, 2025 • 3
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding Paper • 2505.19076 • Published May 25, 2025
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 72
$A^3$-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation Paper • 2601.09274 • Published Jan 14 • 84
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent Paper • 2601.07779 • Published Jan 12 • 28
SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization Paper • 2601.22491 • Published 21 days ago • 12
Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text Paper • 2301.02983 • Published Jan 8, 2023
$\phi$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17, 2025 • 51