Article Room360: Video-to-3D Spatial Reconstruction Platform build-small-hackathon • 5 days ago • 3
Article Taking Alpamayo to New Heights with Driving Foundation Models and Closed-Loop Training drmapavone • 11 days ago • 13
Article Scaling Mixture of Experts: Architecture Search for Billion-Parameter Language Models kshitijthakkar • Feb 9 • 2
Article Systematic Architecture Search for Mobile-Optimized Mixture of Experts Language Models kshitijthakkar • Feb 6 • 2
Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 164
Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 168
Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective LinkedIn • Jan 27 • 77
Parallel Sentences Datasets Collection These datasets all have "english" and "non_english" columns. They can be used to make embedding models multilingual. • 14 items • Updated Dec 10, 2025 • 23
Article The Optimal Architecture for Small Language Models codelion • Dec 26, 2025 • 121
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 59