Ministral 3 Collection A collection of edge models, with Base, Instruct, and Reasoning variants, in 3 different sizes: 3B, 8B, and 14B. All with vision capabilities. • 9 items • Updated 5 days ago • 115
Article Transformers v5: Simple model definitions powering the AI ecosystem 7 days ago • 224
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 10 days ago • 152
Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix Nov 3 • 47
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Paper • 2411.05007 • Published Nov 7, 2024 • 22
Article From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels Aug 18 • 88
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14 • 144
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models Paper • 2508.04324 • Published Aug 6 • 11
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14 • 60
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models Paper • 2508.09834 • Published Aug 13 • 53
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21 • 398
Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks Aug 11 • 75
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 192