DySCO: Dynamic Attention-Scaling Decoding for Long-Context Language Models (arXiv:2602.22175, published 11 days ago, 1 upvote)
Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion (arXiv:2604.05688, published 20 days ago, 1 upvote)
NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training (arXiv:2603.03597, published Mar 4, 1 upvote)
Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers (arXiv:2602.06079, published Feb 4, 20 upvotes)
RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs (arXiv:2602.05367, published Feb 5, 8 upvotes)
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math (arXiv:2602.06291, published Feb 6, 24 upvotes)
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization (arXiv:2506.13331, published Jun 16, 2025, 2 upvotes)
The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling (arXiv:2603.07461, published Mar 8, 2 upvotes)
JEPA-Reasoner: Decoupling Latent Reasoning from Token Generation (arXiv:2512.19171, published Dec 22, 2025, 3 upvotes)
A Neuroscience-Inspired Dual-Process Model of Compositional Generalization (arXiv:2507.18868, published Jul 25, 2025, 2 upvotes)
Embarrassingly Simple Self-Distillation Improves Code Generation (arXiv:2604.01193, published 25 days ago, 46 upvotes)
H2LooP Spark Preview: Continual Pretraining of Large Language Models for Low-Level Embedded Systems Code (arXiv:2603.11139, published Mar 13, 1 upvote)
Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi (arXiv:2603.03508, published Mar 3, 4 upvotes)