Collections
Discover the best community collections!

Collections including paper arxiv:2412.09871

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 87
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 83
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 29

- Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
  Paper • 2509.15591 • Published • 45
- A Survey on Latent Reasoning
  Paper • 2507.06203 • Published • 93
- Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost
  Paper • 2602.03120 • Published • 1
- TADA! Tuning Audio Diffusion Models through Activation Steering
  Paper • 2602.11910 • Published • 2

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
  Paper • 2501.04519 • Published • 288
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 161
- Apollo: An Exploration of Video Understanding in Large Multimodal Models
  Paper • 2412.10360 • Published • 147

- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 161
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 377
- Are Your LLMs Capable of Stable Reasoning?
  Paper • 2412.13147 • Published • 93
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108