PeterLee6094's Collections: HF Daily
Each entry: paper title • arXiv ID • Published • upvotes
Large Language Diffusion Models • 2502.09992 • Published • 126
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment • 2502.10391 • Published • 34
Diverse Inference and Verification for Advanced Reasoning • 2502.09955 • Published • 18
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models • 2502.08130 • Published • 9
Jailbreaking to Jailbreak • 2502.09638 • Published • 6
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • 2502.11089 • Published • 168
ReLearn: Unlearning via Learning for Large Language Models • 2502.11190 • Published • 30
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training • 2502.11196 • Published • 23
CRANE: Reasoning with Constrained LLM Generation • 2502.09061 • Published • 21
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs • 2502.10454 • Published • 7
Dyve: Thinking Fast and Slow for Dynamic Process Verification • 2502.11157 • Published • 7
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking • 2502.09083 • Published • 4
Continuous Diffusion Model for Language Modeling • 2502.11564 • Published • 53
Rethinking Diverse Human Preference Learning through Principal Component Analysis • 2502.13131 • Published • 37
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models • 2502.12464 • Published • 28
Revisiting the Test-Time Scaling of o1-like Models: Do They Truly Possess Test-Time Scaling Capabilities? • 2502.12215 • Published • 16
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading • 2502.12574 • Published • 13
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 • 2502.12659 • Published • 7
Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey • 2502.10708 • Published • 4
Qwen2.5-VL Technical Report • 2502.13923 • Published • 215
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective • 2502.14296 • Published • 45
Small Models Struggle to Learn from Strong Reasoners • 2502.12143 • Published • 39
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization • 2502.13922 • Published • 27
MLGym: A New Framework and Benchmark for Advancing AI Research Agents • 2502.14499 • Published • 194
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models • 2502.14802 • Published • 13
RLPR: Extrapolating RLVR to General Domains without Verifiers • 2506.18254 • Published • 32
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models • 2506.18369 • Published • 2
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning • 2506.18841 • Published • 56
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset • 2506.18851 • Published • 30
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs • 2506.18896 • Published • 29
Robust Reward Modeling via Causal Rubrics • 2506.16507 • Published • 9
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning • 2506.19767 • Published • 15
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling • 2506.20512 • Published • 47
ReCode: Updating Code API Knowledge with Reinforcement Learning • 2506.20495 • Published • 10
MMSearch-R1: Incentivizing LMMs to Search • 2506.20670 • Published • 64
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge • 2506.21506 • Published • 52
Deep Researcher with Test-Time Diffusion • 2507.16075 • Published • 68
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents • 2507.19478 • Published • 33
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning • 2507.19457 • Published • 30
Agentic Reinforced Policy Optimization • 2507.19849 • Published • 158
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence • 2507.21046 • Published • 84
Geometric-Mean Policy Optimization • 2507.20673 • Published • 32
Goal Alignment in LLM-Based User Simulators for Conversational AI • 2507.20152 • Published • 5
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty • 2507.16806 • Published • 7
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge • 2507.21183 • Published • 15
Persona Vectors: Monitoring and Controlling Character Traits in Language Models • 2507.21509 • Published • 33
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning • 2507.22607 • Published • 47
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE • 2507.21802 • Published • 19