DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published Mar 26, 2025
SuperBPE Collection SuperBPE tokenizers and models trained with them • 9 items