Towards a Unified View of Large Language Model Post-Training • Paper • 2509.04419 • Published Sep 4
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • Paper • 2504.20966 • Published Apr 29
A Refined Analysis of Massive Activations in LLMs • Paper • 2503.22329 • Published Mar 28