Papers to Read
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper
• 2501.00192
• Published • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper
• 2501.00958
• Published • 109
Xmodel-2 Technical Report
Paper
• 2412.19638
• Published • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
• 2412.18925
• Published • 107
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper
• 2501.01257
• Published • 51
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published • 302
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
• 2501.09686
• Published • 41
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper
• 2501.10120
• Published • 55
GuardReasoner: Towards Reasoning-based LLM Safeguards
Paper
• 2501.18492
• Published • 88
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
Paper
• 2501.18511
• Published • 20
LIMO: Less is More for Reasoning
Paper
• 2502.03387
• Published • 62
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper
• 2502.06703
• Published • 153
Expect the Unexpected: FailSafe Long Context QA for Finance
Paper
• 2502.06329
• Published • 133
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
Paper
• 2502.07870
• Published • 45
LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters!
Paper
• 2502.07374
• Published • 40
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
Paper
• 2502.08127
• Published • 59
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Paper
• 2502.07346
• Published • 53
TransMLA: Multi-head Latent Attention Is All You Need
Paper
• 2502.07864
• Published • 57
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Paper
• 2502.10248
• Published • 57
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Paper
• 2502.12115
• Published • 46
Magma: A Foundation Model for Multimodal AI Agents
Paper
• 2502.13130
• Published • 58
Qwen2.5-VL Technical Report
Paper
• 2502.13923
• Published • 217
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper
• 2502.14499
• Published • 195
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Paper
• 2502.14786
• Published • 161
S*: Test Time Scaling for Code Generation
Paper
• 2502.14382
• Published • 63
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Paper
• 2502.14739
• Published • 110
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper
• 2503.07365
• Published • 61
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper
• 2503.04130
• Published • 96
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
• 2503.05132
• Published • 57
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
• 2503.01785
• Published • 86
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Paper
• 2503.01743
• Published • 89
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper
• 2503.07536
• Published • 88
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Paper
• 2503.07920
• Published • 101
Unified Reward Model for Multimodal Understanding and Generation
Paper
• 2503.05236
• Published • 124
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Paper
• 2503.11579
• Published • 21
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper
• 2503.10639
• Published • 53
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper
• 2503.10615
• Published • 17
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Paper
• 2503.10291
• Published • 36
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
Paper
• 2503.13399
• Published • 22
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Paper
• 2503.11495
• Published • 14
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Paper
• 2503.13444
• Published • 20
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Paper
• 2503.14478
• Published • 48
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Paper
• 2503.12797
• Published • 32
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Paper
• 2503.06053
• Published • 138
TULIP: Towards Unified Language-Image Pretraining
Paper
• 2503.15485
• Published • 49
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Paper
• 2503.16419
• Published • 77
Video-T1: Test-Time Scaling for Video Generation
Paper
• 2503.18942
• Published • 90
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models
Paper
• 2503.18923
• Published • 14
Reasoning to Learn from Latent Thoughts
Paper
• 2503.18866
• Published • 13
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Paper
• 2503.19990
• Published • 35
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published • 172
Scaling Vision Pre-Training to 4K Resolution
Paper
• 2503.19903
• Published • 41
CoLLM: A Large Language Model for Composed Image Retrieval
Paper
• 2503.19910
• Published • 15
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Paper
• 2503.19622
• Published • 31
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper
• 2503.19325
• Published • 73
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
Paper
• 2503.13964
• Published • 20
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Paper
• 2503.19855
• Published • 29
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Paper
• 2503.18931
• Published • 30
Defeating Prompt Injections by Design
Paper
• 2503.18813
• Published • 24
Wan: Open and Advanced Large-Scale Video Generative Models
Paper
• 2503.20314
• Published • 60
Gemini Robotics: Bringing AI into the Physical World
Paper
• 2503.20020
• Published • 31
Video-R1: Reinforcing Video Reasoning in MLLMs
Paper
• 2503.21776
• Published • 79
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper
• 2503.21460
• Published • 83
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition
Paper
• 2503.21248
• Published • 21
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Paper
• 2503.21696
• Published • 23
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper
• 2503.21614
• Published • 43
Your ViT is Secretly an Image Segmentation Model
Paper
• 2503.19108
• Published • 25
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper
• 2503.24235
• Published • 55
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper
• 2503.24290
• Published • 62
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
Paper
• 2503.24388
• Published • 29
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
Paper
• 2503.24379
• Published • 76
JudgeLRM: Large Reasoning Models as a Judge
Paper
• 2504.00050
• Published • 62
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
Paper
• 2503.24376
• Published • 38
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
Paper
• 2504.00595
• Published • 37
Z1: Efficient Test-time Scaling with Code
Paper
• 2504.00810
• Published • 27
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
Paper
• 2503.24377
• Published • 18
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Paper
• 2504.00883
• Published • 67
Understanding R1-Zero-Like Training: A Critical Perspective
Paper
• 2503.20783
• Published • 59
PaperBench: Evaluating AI's Ability to Replicate AI Research
Paper
• 2504.01848
• Published • 37
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
• 2504.01990
• Published • 305
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Paper
• 2504.02782
• Published • 57
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Paper
• 2504.02587
• Published • 32
MedSAM2: Segment Anything in 3D Medical Images and Videos
Paper
• 2504.03600
• Published • 10
SmolVLM: Redefining small and efficient multimodal models
Paper
• 2504.05299
• Published • 207
One-Minute Video Generation with Test-Time Training
Paper
• 2504.05298
• Published • 110
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
• Published • 80
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)
Paper
• 2504.03151
• Published • 15
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper
• 2504.05599
• Published • 85
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published • 77
OmniCaptioner: One Captioner to Rule Them All
Paper
• 2504.07089
• Published • 20
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Paper
• 2504.06958
• Published • 13
Are We Done with Object-Centric Learning?
Paper
• 2504.07092
• Published • 6
Paper
• 2504.07491
• Published • 138
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published • 87
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Paper
• 2504.07956
• Published • 46
MM-IFEngine: Towards Multimodal Instruction Following
Paper
• 2504.07957
• Published • 35
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Paper
• 2504.08685
• Published • 130
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Paper
• 2504.08736
• Published • 46
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Paper
• 2504.09925
• Published • 39
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper
• 2504.10479
• Published • 308
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper
• 2504.08837
• Published • 44
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Paper
• 2504.09641
• Published • 16
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published • 85
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Paper
• 2504.08672
• Published • 55
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Paper
• 2504.10465
• Published • 27
Efficient Reasoning Models: A Survey
Paper
• 2504.10903
• Published • 21
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper
• 2504.13161
• Published • 97
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
Paper
• 2504.13122
• Published • 20
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published • 63
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published • 49
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Paper
• 2504.15279
• Published • 78
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published • 121
Describe Anything: Detailed Localized Image and Video Captioning
Paper
• 2504.16072
• Published • 64
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
Paper
• 2504.15271
• Published • 67
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper
• 2504.17192
• Published • 124
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
Paper
• 2504.15415
• Published • 23
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Paper
• 2504.15521
• Published • 64
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper
• 2504.20571
• Published • 98
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Paper
• 2505.02835
• Published • 28
RM-R1: Reward Modeling as Reasoning
Paper
• 2505.02387
• Published • 81
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Paper
• 2505.00703
• Published • 44
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
Paper
• 2505.00551
• Published • 36
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Paper
• 2504.21233
• Published • 49
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Paper
• 2505.03318
• Published • 94
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published • 191
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Paper
• 2505.04588
• Published • 65
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
Paper
• 2505.11423
• Published
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
Paper
• 2505.16938
• Published • 121
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Paper
• 2505.15966
• Published • 53
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Paper
• 2505.16854
• Published • 11
GRIT: Teaching MLLMs to Think with Images
Paper
• 2505.15879
• Published • 13
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Paper
• 2503.20752
• Published • 1
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper
• 2504.11468
• Published • 30
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
Paper
• 2505.14677
• Published • 15
Emerging Properties in Unified Multimodal Pretraining
Paper
• 2505.14683
• Published • 134
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
Paper
• 2505.14231
• Published • 53
Visual Agentic Reinforcement Fine-Tuning
Paper
• 2505.14246
• Published • 32
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
Paper
• 2505.14460
• Published • 33
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Paper
• 2505.11049
• Published • 61
Visual Planning: Let's Think Only with Images
Paper
• 2505.11409
• Published • 57
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Paper
• 2505.08617
• Published • 42
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper
• 2505.10554
• Published • 120
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper
• 2505.09568
• Published • 99
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Paper
• 2505.04410
• Published • 44
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
Paper
• 2505.05464
• Published • 11
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
Paper
• 2505.04921
• Published • 187
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
Paper
• 2503.16252
• Published • 31
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Paper
• 2505.23747
• Published • 69
Table-R1: Inference-Time Scaling for Table Reasoning
Paper
• 2505.23621
• Published • 93
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper
• 2505.22617
• Published • 132
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Paper
• 2505.22759
• Published • 19
D-AR: Diffusion via Autoregressive Models
Paper
• 2505.23660
• Published • 34
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Paper
• 2505.22618
• Published • 45
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Paper
• 2505.22651
• Published • 48
Skywork Open Reasoner 1 Technical Report
Paper
• 2505.22312
• Published • 54
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper
• 2505.22453
• Published • 46
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
Paper
• 2505.22334
• Published • 36
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
Paper
• 2505.21327
• Published • 83
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Paper
• 2505.21497
• Published • 109
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Paper
• 2505.19897
• Published • 104
MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks
Paper
• 2505.16459
• Published • 45
BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
Paper
• 2505.19457
• Published • 64
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper
• 2505.17667
• Published • 88
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper
• 2505.18129
• Published • 62
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Paper
• 2505.16410
• Published • 58
Paper
• 2506.03569
• Published • 80
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
Paper
• 2506.04207
• Published • 48
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper
• 2505.24726
• Published • 279
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments
Paper
• 2506.02387
• Published • 58
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Paper
• 2505.24714
• Published • 37
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
Paper
• 2505.19443
• Published • 15
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Paper
• 2506.04308
• Published • 43
Reinforcement Pre-Training
Paper
• 2506.08007
• Published • 265
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Paper
• 2506.08672
• Published • 30
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models
Paper
• 2506.06751
• Published • 71
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Paper
• 2506.09513
• Published • 102
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
Paper
• 2506.10521
• Published • 73
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
Paper
• 2506.14028
• Published • 93
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Paper
• 2506.09985
• Published • 31
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper
• 2506.16406
• Published • 132
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper
• 2506.06395
• Published • 135
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
Paper
• 2506.09736
• Published • 9
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Paper
• 2506.10960
• Published • 12
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper
• 2507.00432
• Published • 79
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
Paper
• 2507.03483
• Published • 24
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
Paper
• 2506.23918
• Published • 90
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published • 165
Scaling RL to Long Videos
Paper
• 2507.07966
• Published • 162
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
Paper
• 2507.07104
• Published • 46
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Paper
• 2507.05255
• Published • 75
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
Paper
• 2507.08441
• Published • 62
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published • 263
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Paper
• 2507.13348
• Published • 79
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
Paper
• 2507.16815
• Published • 42
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Paper
• 2507.14683
• Published • 136
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper
• 2507.16746
• Published • 34
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published • 123
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Paper
• 2507.22607
• Published • 47
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Paper
• 2507.21046
• Published • 85
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published • 160
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
Paper
• 2507.20984
• Published • 58
Group Sequence Policy Optimization
Paper
• 2507.18071
• Published • 320
Captain Cinema: Towards Short Movie Generation
Paper
• 2507.18634
• Published • 42
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
Paper
• 2507.16812
• Published • 64
Intern-S1: A Scientific Multimodal Foundation Model
Paper
• 2508.15763
• Published • 272
A Survey on Large Language Model Benchmarks
Paper
• 2508.15361
• Published • 19
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper
• 2508.07407
• Published • 99
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published • 207