MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU • Paper • 2604.05091 • Published about 1 month ago • 45
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression • Paper • 2604.04921 • Published about 1 month ago • 112