Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18, 2025 • 19
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025 • 10
Survivor Library Books - OCR Collection Books from the Survivor Library (mostly ~1920s & earlier) OCR'd with recent VLMs • 2 items • Updated Jul 14, 2025 • 5
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Paper • 2507.02778 • Published Jul 3, 2025 • 9