-
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 18 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 41 -
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
Paper • 2412.15443 • Published • 10
Collections
Discover the best community collections!
Collections including paper arxiv:2504.10559
-
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 93 -
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper • 2404.10667 • Published • 23 -
Instruction-tuned Language Models are Better Knowledge Learners
Paper • 2402.12847 • Published • 26 -
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper • 2402.09353 • Published • 30
-
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 75 -
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Paper • 2504.11393 • Published • 18 -
Efficient Process Reward Model Training via Active Learning
Paper • 2504.10559 • Published • 13 -
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper • 2504.13161 • Published • 93
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 24 -
Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 48 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 117
-
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 18 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 41 -
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
Paper • 2412.15443 • Published • 10
-
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 75 -
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Paper • 2504.11393 • Published • 18 -
Efficient Process Reward Model Training via Active Learning
Paper • 2504.10559 • Published • 13 -
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper • 2504.13161 • Published • 93
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 93 -
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper • 2404.10667 • Published • 23 -
Instruction-tuned Language Models are Better Knowledge Learners
Paper • 2402.12847 • Published • 26 -
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper • 2402.09353 • Published • 30
-
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 24 -
Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 48 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 117